Multi-level Attention network using text, audio and video for Depression Prediction

Ray, Anupama; Kumar, Siddharth; Reddy, Rutvik; Mukherjee, Prerana; Garg, Ritu

arXiv:1909.01417v1 (cs)

[Submitted on 3 Sep 2019]

Abstract: Depression has been the leading cause of mental-health illness worldwide. Major depressive disorder (MDD) is a common mental health disorder that affects people both psychologically and physically and can lead to loss of life. Due to the lack of diagnostic tests and the subjectivity involved in detecting depression, there is growing interest in using behavioural cues to automate depression diagnosis and stage prediction. The absence of labelled behavioural datasets for such problems and the wide variation possible in behaviour make the problem more challenging. This paper presents a novel multi-level attention-based network for multi-modal depression prediction that fuses features from the audio, video and text modalities while learning intra- and inter-modality relevance. The multi-level attention reinforces overall learning by selecting the most influential features within each modality for decision making. We perform exhaustive experimentation to create different regression models for the audio, video and text modalities. Several fusion models with different configurations are constructed to understand the impact of each feature and modality. We outperform the current baseline by 17.52% in terms of root mean squared error.
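
The two ideas in the abstract, attention within each modality followed by attention across modalities, and evaluation by root mean squared error, can be illustrated with a small sketch. This is not the authors' architecture: the function names (`attend`, `fuse`, `relative_improvement`), the fixed query vectors, and the toy feature values are all hypothetical, and in a real network the queries would be learned parameters. The last helper assumes the reported 17.52% is a relative reduction in RMSE, which the abstract does not state explicitly.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors, query):
    """Score each vector by its dot product with `query`, then return the
    attention-weighted sum of the vectors together with the weights."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def fuse(modalities, feature_query, modality_query):
    """Two attention levels: first over the features inside each modality,
    then over the pooled per-modality vectors (hypothetical design)."""
    names = list(modalities)
    pooled = [attend(modalities[name], feature_query)[0] for name in names]
    fused, weights = attend(pooled, modality_query)
    return fused, dict(zip(names, weights))

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to a baseline (assumed reading)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Toy 2-dimensional feature vectors per modality; the values are illustrative only.
toy_modalities = {
    "audio": [[0.2, 0.1], [0.4, 0.3]],
    "video": [[0.9, 0.5], [0.1, 0.7]],
    "text": [[0.6, 0.6], [0.3, 0.8]],
}
fused_vector, modality_weights = fuse(toy_modalities, [1.0, 0.0], [0.0, 1.0])
```

Here `modality_weights` maps each modality name to its share of the fused representation, and the shares sum to one. A regression head on `fused_vector` would then produce the depression score that `rmse` evaluates.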

Comments:	in Proceedings of the 9th International Workshop on Audio/Visual Emotion Challenge, AVEC 2019, ACM Multimedia Workshop, Nice, France
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1909.01417 [cs.CV]
	(or arXiv:1909.01417v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1909.01417

Submission history

From: Prerana Mukherjee
[v1] Tue, 3 Sep 2019 19:40:38 UTC (342 KB)
