Dimensional Measures of Psychopathology in Children and Adolescents Using Large Language Models

Thomas H McCoy Jr; Roy H Perlis

doi:10.1016/j.biopsych.2024.05.008

Biol Psychiatry. 2024 Dec 15;96(12):940-947. doi: 10.1016/j.biopsych.2024.05.008. Epub 2024 Jun 10.

Authors

Thomas H McCoy Jr¹, Roy H Perlis²

Affiliations

¹ Center for Quantitative Health and Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts; Department of Psychiatry, Harvard Medical School, Boston, Massachusetts.
² Center for Quantitative Health and Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts; Department of Psychiatry, Harvard Medical School, Boston, Massachusetts.

PMID: 38866172

Abstract

Background: To enable greater use of National Institute of Mental Health Research Domain Criteria (RDoC) in real-world settings, we applied large language models (LLMs) to estimate dimensional psychopathology from narrative clinical notes.

Methods: We conducted a cohort study using health records from individuals age ≤18 years evaluated in the psychiatric emergency department of a large academic medical center between November 2008 and March 2015. Outcomes were hospital admission and length of emergency department stay. RDoC domains were estimated using a Health Insurance Portability and Accountability Act-compliant LLM (gpt-4-1106-preview) and compared with a previously validated token-based approach.

Results: The cohort included 3059 individuals (median age 16 years [interquartile range, 13-18]; 1580 [52%] female, 1479 [48%] male; 105 [3.4%] identified as Asian, 329 [11%] as Black, 288 [9.4%] as Hispanic, 474 [15%] as other race, and 1863 [61%] as White), of whom 1695 (55%) were admitted. Correlation between LLM-extracted RDoC scores and the token-based scores ranged from small to medium as assessed by Kendall's tau (0.14-0.22). In logistic regression models adjusting for sociodemographic and clinical features, admission likelihood was associated with greater scores on all domains, with the exception of the sensorimotor domain, which was inversely associated (p < .001 for all adjusted associations). Tests for bias suggested modest but statistically significant differences in positive valence scores by race (p < .05 for Asian, Black, and Hispanic individuals).
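
To illustrate the agreement statistic reported above, the following is a minimal sketch of Kendall's tau-b (the tie-corrected form) in plain Python. The function name `kendall_tau_b` and the example scores are hypothetical and are not taken from the study; the abstract does not describe the authors' analysis code or say which variant of tau they used.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation for two equal-length sequences.

    Counts concordant and discordant pairs and corrects for ties.
    O(n^2) pairwise loop: fine for illustration, slow for large cohorts.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from all counts
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1

    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical per-note scores for one RDoC domain (made-up values).
llm_scores = [0, 1, 1, 2, 3, 3]
token_scores = [0.1, 0.4, 0.2, 0.3, 0.9, 0.5]
tau = kendall_tau_b(llm_scores, token_scores)
print(f"Kendall's tau-b: {tau:.2f}")
```

A rank-based measure such as tau suits this comparison because the two scoring methods need not share a scale; only the ordering of notes matters. For real data, an established routine such as `scipy.stats.kendalltau` would typically be preferable to this quadratic loop.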

Conclusions: An LLM extracted estimates of 6 RDoC domains in an explainable manner, which were associated with clinical outcomes. This approach can contribute to a new generation of prediction models or biological investigations based on dimensional psychopathology.

Keywords: Anxiety; Artificial intelligence; Deep learning; Depression; Machine learning; Research domain criteria.

MeSH terms

Adolescent
Child
Cohort Studies
Emergency Service, Hospital
Female
Hospitalization / statistics & numerical data
Humans
Language
Length of Stay / statistics & numerical data
Male
Mental Disorders* / diagnosis
National Institute of Mental Health (U.S.)
Psychopathology
United States

Grants and funding

U01 MH136059/MH/NIMH NIH HHS/United States