INFERRING SOCIAL CONTEXTS FROM AUDIO RECORDINGS USING DEEP NEURAL NETWORKS

Meysam Asgari; Izhak Shafran; Alireza Bayestehtashk

doi:10.1109/MLSP.2014.6958853

IEEE Int Workshop Mach Learn Signal Process. 2014 Sep:2014:10.1109/MLSP.2014.6958853. doi: 10.1109/MLSP.2014.6958853. Epub 2014 Nov 20.

Authors

Meysam Asgari¹, Izhak Shafran¹, Alireza Bayestehtashk¹

Affiliation

¹ Center for Spoken Language Understanding, Oregon Health & Science University.

Abstract

In this paper, we investigate the problem of detecting social contexts from audio recordings of everyday life, such as those captured in life-logs. Unlike standard corpora of telephone speech or broadcast news, these recordings contain a wide variety of background noise. In such applications it is inherently difficult to collect and label representative samples of all the noise conditions needed to learn models in a fully supervised manner, and the amount of labeled data that can be expected is small relative to the amount of available recordings. This setting lends itself naturally to unsupervised feature extraction using sparse auto-encoders, followed by supervised learning of a classifier for social contexts. We investigate different strategies for training these models and report results on a real-world application.
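The two-stage idea in the abstract (unsupervised feature learning on plentiful unlabeled audio, then a classifier trained on a small labeled subset) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic 20-dimensional vectors standing in for acoustic features, the auto-encoder is a single sigmoid hidden layer with a linear decoder and a KL-divergence sparsity penalty, and the classifier is one independent logistic output per label (binary relevance) to reflect the multi-label setting. All names (`SparseAutoencoder`, `fit_multilabel_logreg`) and hyperparameters (`rho`, `beta`, `lam`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SparseAutoencoder:
    """Hypothetical sketch: one sigmoid hidden layer, linear decoder, KL sparsity penalty."""

    def __init__(self, n_in, n_hidden, rho=0.05, beta=3.0, lam=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        r = np.sqrt(6.0 / (n_in + n_hidden + 1))
        self.W1 = rng.uniform(-r, r, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-r, r, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.rho, self.beta, self.lam = rho, beta, lam

    def encode(self, X):
        return sigmoid(X @ self.W1 + self.b1)

    def _loss_and_grads(self, X):
        m = X.shape[0]
        H = self.encode(X)
        err = H @ self.W2 + self.b2 - X  # reconstruction error
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)  # mean activation per hidden unit
        rho = self.rho
        kl = np.sum(rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
        loss = (0.5 * np.sum(err ** 2) / m
                + 0.5 * self.lam * (np.sum(self.W1 ** 2) + np.sum(self.W2 ** 2))
                + self.beta * kl)
        # Backpropagation through the linear decoder and sigmoid encoder.
        d_out = err / m
        gW2 = H.T @ d_out + self.lam * self.W2
        gb2 = d_out.sum(axis=0)
        d_kl = self.beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / m
        d_hid = (d_out @ self.W2.T + d_kl) * H * (1 - H)
        gW1 = X.T @ d_hid + self.lam * self.W1
        gb1 = d_hid.sum(axis=0)
        return loss, (gW1, gb1, gW2, gb2)

    def fit(self, X, lr=0.05, epochs=300):
        """Full-batch gradient descent on unlabeled data; returns the loss history."""
        history = []
        for _ in range(epochs):
            loss, grads = self._loss_and_grads(X)
            for p, g in zip((self.W1, self.b1, self.W2, self.b2), grads):
                p -= lr * g  # in-place update of each parameter array
            history.append(loss)
        return history


def fit_multilabel_logreg(F, Y, lr=0.3, epochs=500, lam=1e-3):
    """Hypothetical supervised stage: one independent sigmoid output per label."""
    W = np.zeros((F.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        G = (sigmoid(F @ W + b) - Y) / F.shape[0]
        W -= lr * (F.T @ G + lam * W)
        b -= lr * G.sum(axis=0)
    return W, b


def predict(F, W, b):
    return (sigmoid(F @ W + b) >= 0.5).astype(int)


# Synthetic stand-in data: 3 hypothetical binary context labels, 20-dim features.
rng = np.random.default_rng(1)
Y_all = rng.integers(0, 2, size=(1200, 3))
A = rng.normal(size=(3, 20))
X_all = 2.0 * Y_all @ A + rng.normal(size=(1200, 20))
mu, sd = X_all[:1000].mean(axis=0), X_all[:1000].std(axis=0)
X_all = (X_all - mu) / sd

# Stage 1: unsupervised; the labels of these 1000 rows are never used.
sae = SparseAutoencoder(n_in=20, n_hidden=32)
history = sae.fit(X_all[:1000])

# Stage 2: supervised, on a much smaller labeled subset (100 rows), then held-out test.
W, b = fit_multilabel_logreg(sae.encode(X_all[1000:1100]), Y_all[1000:1100])
Y_pred = predict(sae.encode(X_all[1100:]), W, b)
accuracy = (Y_pred == Y_all[1100:]).mean()
```

The split mirrors the abstract's premise that labeled data are scarce relative to recordings: the auto-encoder sees ten times more examples than the classifier. Whether a single layer, the KL penalty, or binary relevance matches the paper's training strategies is not stated here and is an assumption of this sketch.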

Keywords: Deep neural networks; Harmonic model; Multi-label classification.

Grants and funding

K25 AG033723/AG/NIA NIH HHS/United States