Collaborative Large Language Models for Automated Data Extraction in Living Systematic Reviews

Muhammad Ali Khan; Umair Ayub; Syed Arsalan Ahmed Naqvi; Kaneez Zahra Rubab Khakwani; Zaryab Bin Riaz Sipra; Ammad Raina; Sihan Zou; Huan He; Seyyed Amir Hossein; Bashar Hasan; R Bryan Rumble; Danielle S Bitterman; Jeremy L Warner; Jia Zou; Amye J Tevaarwerk; Konstantinos Leventakos; Kenneth L Kehl; Jeanne M Palmer; M Hassan Murad; Chitta Baral; Irbaz Bin Riaz

doi:10.1101/2024.09.20.24314108

medRxiv [Preprint]. 2024 Sep 23:2024.09.20.24314108. doi: 10.1101/2024.09.20.24314108.

Authors

Muhammad Ali Khan¹, Umair Ayub¹, Syed Arsalan Ahmed Naqvi¹, Kaneez Zahra Rubab Khakwani², Zaryab Bin Riaz Sipra³, Ammad Raina⁴, Sihan Zou¹, Huan He⁵, Seyyed Amir Hossein¹,⁶, Bashar Hasan⁷, R Bryan Rumble⁸, Danielle S Bitterman⁹, Jeremy L Warner¹⁰,¹¹,¹², Jia Zou⁶, Amye J Tevaarwerk¹³, Konstantinos Leventakos⁷, Kenneth L Kehl¹⁴, Jeanne M Palmer¹, M Hassan Murad⁷, Chitta Baral⁶, Irbaz Bin Riaz¹

Affiliations

¹ Department of Medicine, Mayo Clinic, Phoenix, United States of America.
² Department of Medicine, University of Arizona, Tucson, United States of America.
³ Department of Medicine and Surgery, Rashid Latif Medical College, Lahore, Pakistan.
⁴ Department of Medicine, Canyon Vista Hospital, Sierra Vista, United States of America.
⁵ Department of Biomedical Informatics and Data Science, Yale University, New Haven, United States of America.
⁶ Department of Computing and Augmented Intelligence, Arizona State University, Tempe, United States of America.
⁷ Department of Medicine, Mayo Clinic, Rochester, United States of America.
⁸ American Society of Clinical Oncology, Alexandria, United States of America.
⁹ Department of Radiation Oncology, Dana-Farber Cancer Institute, Boston, United States of America.
¹⁰ Departments of Medicine and Biostatistics, Brown University, Providence, United States of America.
¹¹ Rhode Island Hospital, Providence, United States of America.
¹² Center for Clinical Cancer Informatics and Data Science, Legorreta Cancer Center, Brown University, Providence, United States of America.
¹³ Department of Oncology, Mayo Clinic, Rochester, United States of America.
¹⁴ Department of Medicine, Dana-Farber Cancer Institute, Boston, United States of America.

Abstract

Objective: Data extraction from the published literature is the most laborious step in conducting living systematic reviews (LSRs). We aim to build a generalizable, automated data extraction workflow leveraging large language models (LLMs) that mimics the real-world two-reviewer process.

Materials and methods: A dataset of 10 clinical trials (22 publications) from a published LSR was used, focusing on 23 variables related to trial, population, and outcomes data. The dataset was split into prompt development (n=5) and held-out test sets (n=17). GPT-4-turbo and Claude-3-Opus were used for data extraction. Responses from the two LLMs were compared for concordance. In instances with discordance, original responses from each LLM were provided to the other LLM for cross-critique. Evaluation metrics, including accuracy, were used to assess performance against the manually curated gold standard.
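
The two-reviewer workflow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`collaborative_extract`, `normalize`), the callable signatures, and the use of normalized string equality as the concordance test are all assumptions, and the two models are represented by plain Python callables rather than real API calls.

```python
from typing import Callable, Dict, List

# Hypothetical signatures: each "model" is any callable the caller supplies.
Extractor = Callable[[str, str], str]        # (publication_text, variable) -> answer
Critic = Callable[[str, str, str, str], str]  # (text, variable, own, peer) -> revised answer


def normalize(answer: str) -> str:
    """Assumed concordance rule: compare answers ignoring case and extra spaces."""
    return " ".join(answer.lower().split())


def collaborative_extract(
    text: str,
    variables: List[str],
    extract_a: Extractor,
    extract_b: Extractor,
    critique_a: Critic,
    critique_b: Critic,
) -> Dict[str, Dict[str, str]]:
    results: Dict[str, Dict[str, str]] = {}
    for var in variables:
        a = extract_a(text, var)
        b = extract_b(text, var)
        if normalize(a) == normalize(b):
            results[var] = {"status": "concordant", "a": a, "b": b}
            continue
        # Discordant: each model sees the other's original response and may revise.
        a_revised = critique_a(text, var, a, b)
        b_revised = critique_b(text, var, b, a)
        agreed = normalize(a_revised) == normalize(b_revised)
        status = "concordant_after_critique" if agreed else "discordant"
        results[var] = {"status": status, "a": a_revised, "b": b_revised}
    return results


# Toy stand-ins for the two LLMs (purely illustrative values).
def model_a(text: str, var: str) -> str:
    return {"phase": "Phase 3", "n": "120"}[var]

def model_b(text: str, var: str) -> str:
    return {"phase": "phase  3", "n": "102"}[var]

def critic_a(text: str, var: str, own: str, peer: str) -> str:
    return own   # keeps its own answer

def critic_b(text: str, var: str, own: str, peer: str) -> str:
    return peer  # adopts the peer's answer

demo = collaborative_extract("...", ["phase", "n"], model_a, model_b, critic_a, critic_b)
```

In practice, responses that remain discordant after cross-critique would presumably be the ones routed to a human reviewer; the sketch only labels them.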

Results: In the prompt development set, 110 (96%) responses were concordant, achieving an accuracy of 0.99 against the gold standard. In the test set, 342 (87%) responses were concordant. The accuracy of the concordant responses was 0.94. The accuracy of the discordant responses was 0.41 for GPT-4-turbo and 0.50 for Claude-3-Opus. Of the 49 discordant responses, 25 (51%) became concordant after cross-critique, with an increase in accuracy to 0.76.
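
The counts reported above are mutually consistent if each publication contributes one response per variable. That per-publication, per-variable structure is an assumption inferred from the numbers, not something the abstract states. A short check:

```python
VARIABLES = 23
DEV_PUBLICATIONS, TEST_PUBLICATIONS = 5, 17

# Assumed: one response per (publication, variable) pair.
dev_total = DEV_PUBLICATIONS * VARIABLES      # 115
test_total = TEST_PUBLICATIONS * VARIABLES    # 391

dev_concordant, test_concordant = 110, 342    # reported counts
test_discordant = test_total - test_concordant  # 49, matching the reported figure
resolved_by_critique = 25                     # reported count


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


print(pct(dev_concordant, dev_total))              # 96
print(pct(test_concordant, test_total))            # 87
print(pct(resolved_by_critique, test_discordant))  # 51
```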

Discussion: Concordant responses by the LLMs are likely to be accurate. In instances of discordant responses, cross-critique can further increase the accuracy.

Conclusion: Large language models, when used in a collaborative workflow that simulates the two-reviewer process, can extract data with reasonable performance, enabling truly 'living' systematic reviews.

Keywords: Data Extraction; Large Language Models; Meta-Analysis; Natural Language Processing; Systematic Review.

Publication types

Preprint

Grants and funding

U24 CA265879/CA/NCI NIH HHS/United States