Evaluating the capability of ChatGPT in predicting drug-drug interactions: Real-world evidence using hospitalized patient data

Ramya Padmavathy Radha Krishnan; Euniss Hinyo Hung; Megan Ashford; Clark Ethan Edillo; Charlise Gardner; Hector Blake Hatrick; Byungjun Kim; Angel Wing Yan Lai; Xinran Li; Yvonne Xinyi Zhao; Jacques Eugene Raubenheimer

doi:10.1111/bcp.16275

Br J Clin Pharmacol. 2024 Dec;90(12):3361-3366. doi: 10.1111/bcp.16275. Epub 2024 Oct 2.

Affiliations

¹ Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia.
² Faculty of Science, University of Sydney, Sydney, New South Wales, Australia.

Abstract

Drug-drug interactions (DDIs) present a significant health burden, compounded by clinician time constraints and poor patient health literacy. We assessed the ability of ChatGPT (a generative artificial intelligence-based large language model) to predict DDIs in a real-world setting. Demographics, diagnoses and prescribed medicines for 120 hospitalized patients were input to ChatGPT version 3.5 through three standardized prompts, and its output was compared against pharmacist DDI evaluation to estimate diagnostic accuracy. The area under the receiver operating characteristic curve and inter-rater reliability (Cohen's and Fleiss' kappa coefficients) were calculated. ChatGPT's responses differed with prompt wording, with higher sensitivity for prompts mentioning 'drug interaction'. Confusion matrices showed low true positive and high true negative rates, and agreement between ChatGPT and pharmacists was minimal (Cohen's kappa values 0.077-0.143). The low sensitivity values indicate that ChatGPT was largely unsuccessful at identifying DDIs, and further development is required before it can reliably assess potential DDIs in real-world scenarios.
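The accuracy measures named in the abstract (sensitivity, specificity and Cohen's kappa) can all be derived from a 2x2 confusion matrix in which the pharmacist evaluation serves as the reference. The sketch below is illustrative only: the function name `diagnostic_metrics` and the counts passed to it are hypothetical and are not taken from the study, whose analysis code is not given here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and Cohen's kappa from a 2x2 confusion matrix.

    tp/fp/fn/tn count the model's calls against the reference rater
    (here assumed to be the pharmacist evaluation).
    """
    n = tp + fp + fn + tn

    # Proportion of reference-positive cases the model also flagged.
    sensitivity = tp / (tp + fn)
    # Proportion of reference-negative cases the model also cleared.
    specificity = tn / (tn + fp)

    # Observed agreement: both raters gave the same label.
    observed = (tp + tn) / n
    # Agreement expected by chance, from each rater's marginal rates.
    p_both_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_both_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_both_yes + p_both_no

    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts chosen for illustration; NOT data from the study.
example = diagnostic_metrics(tp=5, fp=5, fn=20, tn=70)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, a rater that rarely flags an interaction can show high specificity and high raw agreement while kappa stays low, because most of that agreement is what chance alone would produce.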

Keywords: ChatGPT; artificial intelligence (AI); drug–drug interaction (DDI); large language model (LLM); patient health literacy.

MeSH terms

Adult
Aged
Aged, 80 and over
Artificial Intelligence
Drug Interactions*
Female
Hospitalization / statistics & numerical data
Humans
Male
Middle Aged
Pharmacists / organization & administration
ROC Curve
Reproducibility of Results