Enhancing Precision in Detecting Severe Immune-Related Adverse Events: Comparative Analysis of Large Language Models and International Classification of Disease Codes in Patient Records

Virginia H Sun; Julius C Heemelaar; Ibrahim Hadzic; Vineet K Raghu; Chia-Yun Wu; Leyre Zubiri; Azin Ghamari; Nicole R LeBoeuf; Osama Abu-Shawer; Kenneth L Kehl; Shilpa Grover; Prabhsimranjot Singh; Giselle A Suero-Abreu; Jessica Wu; Ayo S Falade; Kelley Grealish; Molly F Thomas; Nora Hathaway; Benjamin D Medoff; Hannah K Gilman; Alexandra-Chloe Villani; Jor Sam Ho; Meghan J Mooradian; Meghan E Sise; Daniel A Zlotoff; Steven M Blum; Michael Dougan; Ryan J Sullivan; Tomas G Neilan; Kerry L Reynolds

doi:10.1200/JCO.24.00326

J Clin Oncol. 2024 Dec 10;42(35):4134-4144. doi: 10.1200/JCO.24.00326. Epub 2024 Sep 3.

Authors

Virginia H Sun¹,²; Julius C Heemelaar¹,²,³; Ibrahim Hadzic¹,⁴,⁵,⁶; Vineet K Raghu¹,²; Chia-Yun Wu⁷,⁸; Leyre Zubiri¹,⁷; Azin Ghamari¹,²; Nicole R LeBoeuf¹,⁹,¹⁰; Osama Abu-Shawer¹¹; Kenneth L Kehl¹,¹²,¹³; Shilpa Grover¹,¹⁴; Prabhsimranjot Singh¹,¹³; Giselle A Suero-Abreu¹,²,¹⁵; Jessica Wu¹,²; Ayo S Falade¹⁶; Kelley Grealish⁷; Molly F Thomas¹⁷,¹⁸,¹⁹; Nora Hathaway⁷; Benjamin D Medoff¹,²⁰; Hannah K Gilman¹,²; Alexandra-Chloe Villani¹,²¹,²²; Jor Sam Ho¹,²; Meghan J Mooradian¹,⁷; Meghan E Sise¹,²³; Daniel A Zlotoff¹,¹⁵; Steven M Blum¹,⁷,²¹,²²; Michael Dougan¹,²⁴; Ryan J Sullivan¹,⁷; Tomas G Neilan¹,²,¹⁵; Kerry L Reynolds¹,⁷

Affiliations

¹ Harvard Medical School, Boston, MA.
² Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA.
³ Leiden University Medical Center, Leiden, the Netherlands.
⁴ Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Boston, MA.
⁵ Brigham and Women's Hospital, Boston, MA.
⁶ Maastricht University, Maastricht, the Netherlands.
⁷ Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA.
⁸ Far Eastern Memorial Hospital, New Taipei City, Taiwan.
⁹ Department of Dermatology, Brigham and Women's Hospital, Boston, MA.
¹⁰ Center for Cutaneous Oncology, Dana-Farber Cancer Institute, Boston, MA.
¹¹ Department of Internal Medicine, Cleveland Clinic, Cleveland, OH.
¹² Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA.
¹³ Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA.
¹⁴ Division of Gastroenterology, Hepatology, and Endoscopy, Brigham and Women's Hospital, Boston, MA.
¹⁵ Division of Cardiology, Massachusetts General Hospital, Boston, MA.
¹⁶ Internal Medicine Department, Massachusetts General Brigham Salem Hospital, Salem, MA.
¹⁷ Division of Gastroenterology, Oregon Health and Science University, Portland, OR.
¹⁸ Department of Medicine, Oregon Health and Science University, Portland, OR.
¹⁹ Department of Cell, Developmental, and Cancer Biology, Oregon Health and Science University, Portland, OR.
²⁰ Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA.
²¹ Center for Immunology and Inflammatory Diseases (CIID), Massachusetts General Hospital Krantz Family Center for Cancer Research, Boston, MA.
²² Broad Institute of MIT and Harvard, Cambridge, MA.
²³ Division of Nephrology, Massachusetts General Hospital, Boston, MA.
²⁴ Division of Gastroenterology, Massachusetts General Hospital, Boston, MA.

PMID: 39226489
DOI: 10.1200/JCO.24.00326

Abstract

Purpose: Current approaches to accurately identify immune-related adverse events (irAEs) in large retrospective studies are limited. Large language models (LLMs) offer a potential solution to this challenge, given their high performance in natural language comprehension tasks. Therefore, we investigated the use of an LLM to identify irAEs among hospitalized patients, comparing its performance with manual adjudication and International Classification of Disease (ICD) codes.

Methods: Hospital admissions of patients receiving immune checkpoint inhibitor (ICI) therapy at a single institution from February 5, 2011, to September 5, 2023, were individually reviewed and adjudicated for the presence of irAEs. ICD codes and an LLM with retrieval-augmented generation were applied to detect frequent irAEs (ICI-induced colitis, hepatitis, and pneumonitis) and the most fatal irAE (ICI-myocarditis) from electronic health records. The performance of ICD codes and the LLM was compared using sensitivity and specificity (α = .05), with manual adjudication as the gold standard. External validation was performed using a data set of hospital admissions from June 1, 2018, to May 31, 2019, from a second institution.
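
As an illustration of the comparison described in the Methods, the following minimal sketch computes sensitivity and specificity for two detection methods against a gold-standard label. The function name and the toy label lists are hypothetical and are not drawn from the study; the abstract does not specify the statistical test used, so none is shown here.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `predicted` relative to `gold`."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")

    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: True means the admission was labeled as an irAE.
gold = [True, True, True, False, False, False, False, False]  # manual adjudication
llm = [True, True, True, True, False, False, False, False]    # assumed LLM output
icd = [True, False, True, False, False, False, False, True]   # assumed ICD-code output

llm_sens, llm_spec = sensitivity_specificity(gold, llm)
icd_sens, icd_spec = sensitivity_specificity(gold, icd)
print(f"LLM: sensitivity={llm_sens:.3f}, specificity={llm_spec:.3f}")
print(f"ICD: sensitivity={icd_sens:.3f}, specificity={icd_spec:.3f}")
```

In this toy example the two methods have equal specificity but differ in sensitivity, which is the pattern of difference the study reports at a much larger scale.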

Results: Of the 7,555 admissions for patients on ICI therapy in the initial cohort, 2.0% were adjudicated to be due to ICI-colitis, 1.1% ICI-hepatitis, 0.7% ICI-pneumonitis, and 0.8% ICI-myocarditis. The LLM demonstrated higher sensitivity than ICD codes (94.7% v 68.7%), achieving significance for ICI-hepatitis (P < .001), myocarditis (P < .001), and pneumonitis (P = .003) while yielding similar specificities (93.7% v 92.4%). The LLM spent an average of 9.53 seconds/chart in comparison with an estimated 15 minutes for adjudication. In the validation cohort (N = 1,270), the mean LLM sensitivity and specificity were 98.1% and 95.7%, respectively.
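
To put the reported per-chart times in perspective, the sketch below scales them to the full initial cohort. It assumes one chart per admission and that the 15-minute adjudication estimate applies uniformly; both are simplifying assumptions made for illustration, not figures reported by the authors, and the variable names are hypothetical.

```python
# Figures taken from the Results paragraph.
ADMISSIONS = 7_555                # admissions in the initial cohort
LLM_SECONDS_PER_CHART = 9.53      # average LLM time per chart
MANUAL_MINUTES_PER_CHART = 15     # estimated manual adjudication time per chart


def total_hours(n_charts: int, seconds_per_chart: float) -> float:
    """Total review time in hours, assuming one chart per admission."""
    return n_charts * seconds_per_chart / 3600


llm_hours = total_hours(ADMISSIONS, LLM_SECONDS_PER_CHART)
manual_hours = total_hours(ADMISSIONS, MANUAL_MINUTES_PER_CHART * 60)
speedup = manual_hours / llm_hours

print(f"LLM total:    {llm_hours:.1f} hours")
print(f"Manual total: {manual_hours:.1f} hours")
print(f"Ratio:        {speedup:.1f}x")
```

Under these assumptions the LLM would process the cohort in roughly 20 hours, compared with nearly 1,900 hours of manual review.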

Conclusion: LLMs are a useful tool for the detection of irAEs, outperforming ICD codes in sensitivity and manual adjudication in efficiency.

Publication types

Comparative Study

MeSH terms

Aged
Electronic Health Records
Female
Humans
Immune Checkpoint Inhibitors* / adverse effects
International Classification of Diseases*
Male
Middle Aged
Natural Language Processing
Neoplasms / drug therapy
Neoplasms / immunology
Retrospective Studies

Substances

Immune Checkpoint Inhibitors

Grants and funding

K08 DK127246/DK/NIDDK NIH HHS/United States