Harnessing multimodal approaches for depression detection using large language models and facial expressions

Sadeghi M, Richer R, Egger B, Schindler-Gmelch L, Rupp L, Rahimi F, Berking M, Eskofier B (2024)

Publication Language: English

Publication Type: Journal article, Original article

Publication year: 2024

Journal

npj Mental Health Research (Springer Nature)

Book Volume: 3

Pages Range: 66

Journal Issue: 1

URI: https://www.nature.com/articles/s44184-024-00112-8

DOI: 10.1038/s44184-024-00112-8

Open Access Link: https://www.nature.com/articles/s44184-024-00112-8

Abstract

Detecting depression is a critical component of mental health diagnosis, and accurate assessment is essential for effective treatment. This study introduces a novel, fully automated approach to predicting depression severity using the E-DAIC dataset. We employ Large Language Models (LLMs) to extract depression-related indicators from interview transcripts, utilizing the Patient Health Questionnaire-8 (PHQ-8) score to train the prediction model. Additionally, facial data extracted from video frames is integrated with textual data to create a multimodal model for depression severity prediction. We evaluate three approaches: text-based features, facial features, and a combination of both. Our findings show the best results are achieved by enhancing text data with speech quality assessment, with a mean absolute error of 2.85 and root mean square error of 4.02. This study underscores the potential of automated depression detection, showing text-only models as robust and effective while paving the way for multimodal analysis.

Authors with CRIS profile

Misha Sadeghi, Sonderforschungsbereich 1483/1 - Empathokinästhetische Sensorik - Sensortechniken und Datenanalyseverfahren zur empathokinästhetischen Modellbildung und Zustandsbestimmung (SFB EmpkinS)
Robert Richer, Sonderforschungsbereich 1483/1 - Empathokinästhetische Sensorik - Sensortechniken und Datenanalyseverfahren zur empathokinästhetischen Modellbildung und Zustandsbestimmung (SFB EmpkinS)
Bernhard Egger, Juniorprofessur für Cognitive Computer Vision (Stiftungsprofessur)
Lena Marie Gmelch, Lehrstuhl für Klinische Psychologie und Psychotherapie (KliPs)
Lydia Rupp, Lehrstuhl für Klinische Psychologie und Psychotherapie (KliPs)
Farnaz Rahimi, Machine Learning and Data Analytics Lab
Matthias Berking, Lehrstuhl für Klinische Psychologie und Psychotherapie (KliPs)
Björn Eskofier, Lehrstuhl für Informatik 14 (Machine Learning and Data Analytics)

Related research project(s)

Empathokinästhetische Sensorik für Biofeedback bei depressiven Patienten (SFB 1483 EmpkinS D02)
Empatho-Kinaesthetic Sensor Technology (SFB 1483 EmpkinS) July 1, 2021 - June 30, 2025

How to cite

APA:

Sadeghi, M., Richer, R., Egger, B., Schindler-Gmelch, L., Rupp, L., Rahimi, F., ... Eskofier, B. (2024). Harnessing multimodal approaches for depression detection using large language models and facial expressions. npj Mental Health Research, 3(1), 66. https://doi.org/10.1038/s44184-024-00112-8

MLA:

Sadeghi, Misha, et al. "Harnessing multimodal approaches for depression detection using large language models and facial expressions." npj Mental Health Research 3.1 (2024): 66.
