Evaluating the capability of ChatGPT in predicting drug-drug interactions: Real-world evidence using hospitalized patient data

Br J Clin Pharmacol. 2024 Dec;90(12):3361-3366. doi: 10.1111/bcp.16275. Epub 2024 Oct 2.

Abstract

Drug-drug interactions (DDIs) present a significant health burden, compounded by clinician time constraints and poor patient health literacy. We assessed the ability of ChatGPT (a generative artificial intelligence-based large language model) to predict DDIs in a real-world setting. Demographics, diagnoses and prescribed medicines for 120 hospitalized patients were input through three standardized prompts to ChatGPT version 3.5 and compared against pharmacist DDI evaluation to estimate diagnostic accuracy. The area under the receiver operating characteristic curve and inter-rater reliability (Cohen's and Fleiss' kappa coefficients) were calculated. ChatGPT's responses differed based on prompt wording style, with higher sensitivity for prompts mentioning 'drug interaction'. Confusion matrices displayed low true positive and high true negative rates, and there was minimal agreement between ChatGPT and pharmacists (Cohen's kappa values 0.077-0.143). Low sensitivity values suggest a lack of success in identifying DDIs by ChatGPT, and further development is required before it can reliably assess potential DDIs in real-world scenarios.
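The diagnostic-accuracy metrics named in the abstract (sensitivity from a confusion matrix, Cohen's kappa for agreement between ChatGPT and pharmacists) can be sketched as follows. This is a minimal illustration using made-up binary labels, not the study's data; the function names are assumptions for demonstration.

```python
# Illustration of the metrics reported in the study: sensitivity from a
# confusion matrix and Cohen's kappa for inter-rater agreement.
# The label vectors below are hypothetical, NOT data from the paper.

def confusion(reference, predicted):
    """Count TP/FP/FN/TN, treating 1 = 'DDI present'.
    The reference rater here is the pharmacist evaluation."""
    tp = sum(r == 1 and p == 1 for r, p in zip(reference, predicted))
    fp = sum(r == 0 and p == 1 for r, p in zip(reference, predicted))
    fn = sum(r == 1 and p == 0 for r, p in zip(reference, predicted))
    tn = sum(r == 0 and p == 0 for r, p in zip(reference, predicted))
    return tp, fp, fn, tn

def sensitivity(tp, fn):
    # True positive rate: share of pharmacist-flagged DDIs the model caught.
    return tp / (tp + fn)

def cohens_kappa(reference, predicted):
    # Kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    tp, fp, fn, tn = confusion(reference, predicted)
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n                         # observed agreement
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)   # chance agreement on 'DDI'
    p_no = ((tn + fp) / n) * ((tn + fn) / n)    # chance agreement on 'no DDI'
    p_e = p_yes + p_no
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: a model that misses most true DDIs, echoing the
# low-true-positive / high-true-negative pattern the abstract describes.
pharmacist = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
chatgpt    = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion(pharmacist, chatgpt)
print(f"sensitivity = {sensitivity(tp, fn):.2f}")            # → 0.25
print(f"kappa = {cohens_kappa(pharmacist, chatgpt):.3f}")    # → 0.091
```

With these toy labels the kappa falls near the 0.077-0.143 range the study reports, which by common interpretation guidelines (e.g., Landis and Koch) indicates only slight agreement.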

Keywords: ChatGPT; artificial intelligence (AI); drug–drug interaction (DDI); large language model (LLM); patient health literacy.

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Artificial Intelligence
  • Drug Interactions*
  • Female
  • Hospitalization / statistics & numerical data
  • Humans
  • Male
  • Middle Aged
  • Pharmacists / organization & administration
  • ROC Curve
  • Reproducibility of Results