Outcome Prediction Using Multi-Modal Information: Integrating Large Language Model-Extracted Clinical Information and Image Analysis

Di Sun; Lubomir Hadjiiski; John Gormley; Heang-Ping Chan; Elaine Caoili; Richard Cohan; Ajjai Alva; Grace Bruno; Rada Mihalcea; Chuan Zhou; Vikas Gulani

doi:10.3390/cancers16132402

Outcome Prediction Using Multi-Modal Information: Integrating Large Language Model-Extracted Clinical Information and Image Analysis

Cancers (Basel). 2024 Jun 29;16(13):2402. doi: 10.3390/cancers16132402.

Authors

Affiliations

¹ Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA.
² Department of Internal Medicine-Hematology/Oncology, University of Michigan, Ann Arbor, MI 48109, USA.
³ Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA.

Abstract

Survival prediction post-cystectomy is essential for the follow-up care of bladder cancer patients. This study aimed to evaluate artificial intelligence (AI)-large language models (LLMs) for extracting clinical information and improving image analysis, with an initial application involving predicting five-year survival rates of patients after radical cystectomy for bladder cancer. Data were retrospectively collected from medical records and CT urograms (CTUs) of bladder cancer patients between 2001 and 2020. Of 781 patients, 163 underwent chemotherapy, had pre- and post-chemotherapy CTUs, underwent radical cystectomy, and had an available post-surgery five-year survival follow-up. Five AI-LLMs (Dolly-v2, Vicuna-13b, Llama-2.0-13b, GPT-3.5, and GPT-4.0) were used to extract clinical descriptors from each patient's medical records. As a reference standard, clinical descriptors were also extracted manually. Radiomics and deep learning descriptors were extracted from CTU images. The developed multi-modal predictive model, CRD, was based on the clinical (C), radiomics (R), and deep learning (D) descriptors. The LLM retrieval accuracy was assessed. The performances of the survival predictive models were evaluated using AUC and Kaplan-Meier analysis. For the 163 patients (mean age 64 ± 9 years; M:F 131:32), the LLMs achieved extraction accuracies of 74%~87% (Dolly), 76%~83% (Vicuna), 82%~93% (Llama), 85%~91% (GPT-3.5), and 94%~97% (GPT-4.0). For a test dataset of 64 patients, the CRD model achieved AUCs of 0.89 ± 0.04 (manually extracted information), 0.87 ± 0.05 (Dolly), 0.83 ± 0.06~0.84 ± 0.05 (Vicuna), 0.81 ± 0.06~0.86 ± 0.05 (Llama), 0.85 ± 0.05~0.88 ± 0.05 (GPT-3.5), and 0.87 ± 0.05~0.88 ± 0.05 (GPT-4.0). This study demonstrates the use of LLM model-extracted clinical information, in conjunction with imaging analysis, to improve the prediction of clinical outcomes, with bladder cancer as an initial example.

Keywords: bladder cancer; deep learning; large language models; radiomics; survival prediction.

Grants and funding

U01-CA232931/GF/NIH HHS/United States