Detecting and tracking depression through temporal topic modeling of tweets: insights from a 180-day study

Npj Ment Health Res. 2024 Dec 6;3(1):62. doi: 10.1038/s44184-024-00107-5.

Abstract

Depression affects over 280 million people globally, yet many cases remain undiagnosed or untreated due to stigma and lack of awareness. Social media platforms like X (formerly Twitter) offer a way to monitor and analyze depression markers. This study analyzes Twitter data 90 days before and 90 days after a self-disclosed clinical diagnosis. We gathered 246,637 tweets from 229 diagnosed users. CorEx topic modeling identified seven themes: causes, physical symptoms, mental symptoms, swear words, treatment, coping/support mechanisms, and lifestyle, and conditional logistic regression assessed the odds of these themes occurring post-diagnosis. A control group of healthy users (284,772 tweets) was used to develop and evaluate machine learning classifiers-support vector machines, naive Bayes, and logistic regression-to distinguish between depressed and non-depressed users. Logistic regression and SVM performed best. These findings show the potential of Twitter data for tracking depression and changes in symptoms, coping mechanisms, and treatment use.