Scalable incident detection via natural language processing and probabilistic language models

Colin G Walsh; Drew Wilimitis; Qingxia Chen; Aileen Wright; Jhansi Kolli; Katelyn Robinson; Michael A Ripperger; Kevin B Johnson; David Carrell; Rishi J Desai; Andrew Mosholder; Sai Dharmarajan; Sruthi Adimadhyam; Daniel Fabbri; Danijela Stojanovic; Michael E Matheny; Cosmin A Bejan

doi:10.1038/s41598-024-72756-7

Sci Rep. 2024 Oct 8;14(1):23429. doi: 10.1038/s41598-024-72756-7.

Authors

Colin G Walsh^{1,2,3,4}, Drew Wilimitis⁵, Qingxia Chen^{5,6}, Aileen Wright⁵, Jhansi Kolli⁵, Katelyn Robinson⁵, Michael A Ripperger⁵, Kevin B Johnson^{7,8,9}, David Carrell¹⁰, Rishi J Desai¹¹, Andrew Mosholder^{12,13}, Sai Dharmarajan^{12,14}, Sruthi Adimadhyam¹⁵, Daniel Fabbri⁵, Danijela Stojanovic^{12,13}, Michael E Matheny⁵, Cosmin A Bejan⁵

Affiliations

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
² Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
³ Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.
⁴ Vanderbilt University Medical Center, Nashville, USA.
⁵ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
⁶ Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
⁷ Department of Biostatistics, Epidemiology and Informatics, and Pediatrics, University of Pennsylvania, Pennsylvania, USA.
⁸ Department of Computer and Information Science, Bioengineering, University of Pennsylvania, Pennsylvania, USA.
⁹ Department of Science Communication, University of Pennsylvania, Pennsylvania, USA.
¹⁰ Washington Health Research Institute, Kaiser Permanente Washington, Washington, USA.
¹¹ Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA.
¹² Center for Drug Evaluation and Research, United States Food and Drug Administration, Maryland, USA.
¹³ Office of Surveillance and Epidemiology, United States Food and Drug Administration, Maryland, USA.
¹⁴ Office of Translational Science, United States Food and Drug Administration, Maryland, USA.
¹⁵ Department of Population Medicine, Harvard Medical School, Harvard Pilgrim Health Care Institute, Boston, USA.

Abstract

Postmarketing safety surveillance depends in part on the ability to detect concerning clinical events at scale. Spontaneous reporting might be an effective component of safety surveillance, but it requires awareness and understanding among healthcare professionals to achieve its potential. Reliance on readily available structured data such as diagnostic codes risks under-coding and imprecision. Clinical textual data might bridge these gaps, and natural language processing (NLP) has been shown to aid scalable phenotyping across healthcare records in multiple clinical domains. In this study, we developed and validated a novel incident phenotyping approach that uses unstructured clinical text and is agnostic to the Electronic Health Record (EHR) system and note type. It is based on a published, validated approach (PheRe) used to ascertain social determinants of health and suicidality across entire healthcare records. To demonstrate generalizability, we validated this approach on two separate phenotypes that share common challenges for accurate ascertainment: (1) suicide attempt and (2) sleep-related behaviors. Using samples of 89,428 and 35,863 records for suicide attempt and sleep-related behaviors, respectively, we conducted silver-standard (diagnostic coding) and gold-standard (manual chart review) validation. The area under the precision-recall curve (AUPR) was ~0.77 (95% CI 0.75-0.78) for suicide attempt and ~0.31 (95% CI 0.28-0.34) for sleep-related behaviors. We also evaluated performance by coded race and found that differences in performance by race varied across phenotypes. Scalable phenotyping models, like most healthcare AI, require algorithmovigilance and debiasing prior to implementation.
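The abstract reports discrimination as area under the precision-recall curve (AUPR) with 95% confidence intervals but does not state here how either was computed. As a hedged illustration only, the sketch below computes AUPR as step-wise average precision over records ranked by model score, plus a percentile-bootstrap 95% CI. The function names `average_precision` and `bootstrap_ci`, the tie handling (input order), and the bootstrap settings are assumptions for illustration, not the authors' pipeline. Note that a no-skill ranking scores roughly the positive-class prevalence on this metric.

```python
import random
from typing import List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPR (average precision): mean precision at each true positive.

    Records are ranked by descending score; ties keep input order because
    Python's sort is stable (a simplification, not a tie-aware estimator).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPR is undefined when there are no positive labels")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this cutoff
    return precision_sum / n_pos


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap CI for AUPR, resampling records with replacement."""
    if sum(labels) == 0:
        raise ValueError("need at least one positive label")
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        if sum(boot_labels) == 0:
            continue  # skip resamples with no positives (AUPR undefined)
        stats.append(average_precision(boot_labels, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * (n_boot - 1))]
    hi = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = event confirmed, score = model output.
    labels = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    print(round(average_precision(labels, scores), 3))  # 0.722
    print(bootstrap_ci(labels, scores))
```

On the toy data, the positives sit at ranks 1, 3, and 6, so the precisions are 1, 2/3, and 1/2, and their mean is 13/18 ≈ 0.722. Library implementations such as scikit-learn's `average_precision_score` handle tied scores by threshold rather than by input order, so results can differ slightly when ties are present.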

MeSH terms

Adult
Electronic Health Records*
Female
Humans
Male
Middle Aged
Models, Statistical
Natural Language Processing*
Suicide, Attempted
