Measuring algorithmic bias to analyze the reliability of AI tools that predict depression risk using smartphone sensed-behavioral data

Daniel A Adler; Caitlin A Stamatis; Jonah Meyerhoff; David C Mohr; Fei Wang; Gabriel J Aranovich; Srijan Sen; Tanzeem Choudhury

doi:10.21203/rs.3.rs-3044613/v1

Res Sq [Preprint]. 2024 Apr 22:rs.3.rs-3044613. doi: 10.21203/rs.3.rs-3044613/v1.

Abstract

AI tools aim to transform mental healthcare by providing remote estimates of depression risk from behavioral data collected by sensors embedded in smartphones. While these tools accurately predict elevated symptoms in small, homogeneous populations, recent studies show that they are less accurate in larger, more diverse populations. In this work, we show that accuracy is reduced because sensed-behaviors are unreliable predictors of depression across individuals: the sensed-behaviors that predict depression risk are inconsistent across demographic and socioeconomic subgroups. We first identified subgroups in which a developed AI tool underperformed by measuring algorithmic bias, defined here as cases where subgroups with depression were incorrectly predicted to be at lower risk than healthier subgroups. We then found inconsistencies in the sensed-behaviors predictive of depression across these subgroups. Our findings suggest that researchers developing AI tools that predict mental health from behavior should think critically about the generalizability of these tools and consider tailored solutions for targeted populations.
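
As an illustration only, the kind of subgroup check described in the abstract might be sketched as follows. This is not the authors' method or code. It assumes a simplified reading in which each record carries a subgroup label, a model-predicted risk score, and a binary indicator of elevated symptoms, and it flags pairs of subgroups where the one with higher observed prevalence receives the lower mean predicted risk. The function names, the record layout, and the example values are all hypothetical.

```python
from itertools import combinations


def subgroup_summary(records):
    """Summarize hypothetical (subgroup, predicted_risk, elevated) records.

    Returns, per subgroup, the mean predicted risk and the observed
    prevalence of elevated symptoms.
    """
    totals = {}
    for group, risk, elevated in records:
        n, risk_sum, positives = totals.get(group, (0, 0.0, 0))
        totals[group] = (n + 1, risk_sum + risk, positives + int(elevated))
    return {
        group: {"mean_risk": risk_sum / n, "prevalence": positives / n}
        for group, (n, risk_sum, positives) in totals.items()
    }


def inverted_pairs(summary):
    """Flag subgroup pairs whose risk ordering contradicts prevalence.

    Each flagged tuple is (higher_prevalence_group, lower_prevalence_group),
    where the first group nonetheless has the lower mean predicted risk.
    """
    flagged = []
    for a, b in combinations(sorted(summary), 2):
        d_prev = summary[a]["prevalence"] - summary[b]["prevalence"]
        d_risk = summary[a]["mean_risk"] - summary[b]["mean_risk"]
        # Opposite signs mean the sicker subgroup was ranked as lower risk.
        if d_prev * d_risk < 0:
            flagged.append((a, b) if d_prev > 0 else (b, a))
    return flagged


# Made-up example data: subgroup "A" has more elevated symptoms than "B"
# but receives lower predicted risk scores.
example = [
    ("A", 0.2, 1), ("A", 0.3, 1), ("A", 0.1, 0),
    ("B", 0.6, 0), ("B", 0.7, 1), ("B", 0.5, 0),
]
print(inverted_pairs(subgroup_summary(example)))
```

A check like this compares subgroup averages only; it ignores sample size and uncertainty, so a real analysis would need statistical testing before treating any flagged pair as evidence of bias.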

Publication types

Preprint

Grants and funding

R01 MH101459/MH/NIMH NIH HHS/United States