Rationale and objectives: We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports to highlight oncological issues requiring further clinical attention.
Materials and methods: This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task for the LLMs to analyze report findings. To establish a ground truth, two radiologists reached consensus on a six-level categorization: tumor findings (classified as improved, stable, or aggravated), "benign," "no tumor description," and "other malignancy." The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between the two radiological reports and to accurately reflect these categories.
Results: The proportion of correctly matched findings between serial reports was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P < 0.01). For identification of oncological issues, precision, recall, and F1-score for tumor-related findings were 0.68 vs 0.63 (P = 0.006), 0.91 vs 0.80 (P < 0.001), and 0.78 vs 0.70 for GPT-4 and Gemini, respectively. GPT-4 was also more accurate than Gemini in determining the correct tumor status for tumor-related findings (P < 0.001).
Conclusion: This study demonstrated the potential of LLM-assisted analysis of serial radiology reports, using a carefully engineered prompt, to enhance oncological surveillance. GPT-4 outperformed Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.
Keywords: Artificial Intelligence; Large Language Model; Multidetector Computed Tomography; Oncology; Radiology Report.
Copyright © 2024 The Association of University Radiologists. All rights reserved.