Aims: The aim of this study was to compare the clinical decision-making for benzodiazepine deprescribing between a healthcare provider (HCP) and an artificial intelligence (AI) chatbot GPT4 (ChatGPT-4).
Methods: We analysed real-world data from a Croatian cohort of community-dwelling benzodiazepine patients (n = 154) within the EuroAgeism H2020 ESR 7 project. HCPs evaluated the data using pre-established deprescribing criteria to assess benzodiazepine discontinuation potential. The research team devised and tested AI prompts to ensure consistency with HCP judgements. An independent researcher employed ChatGPT-4 with predetermined prompts to simulate clinical decisions for each patient case. Data derived from human-HCP and ChatGPT-4 decisions were compared for agreement rates and Cohen's kappa.
Results: Both HPC and ChatGPT identified patients for benzodiazepine deprescribing (96.1% and 89.6%, respectively), showing an agreement rate of 95% (κ = .200, P = .012). Agreement on four deprescribing criteria ranged from 74.7% to 91.3% (lack of indication κ = .352, P < .001; prolonged use κ = .088, P = .280; safety concerns κ = .123, P = .006; incorrect dosage κ = .264, P = .001). Important limitations of GPT-4 responses were identified, including 22.1% ambiguous outputs, generic answers and inaccuracies, posing inappropriate decision-making risks.
Conclusions: While AI-HCP agreement is substantial, sole AI reliance poses a risk for unsuitable clinical decision-making. This study's findings reveal both strengths and areas for enhancement of ChatGPT-4 in the deprescribing recommendations within a real-world sample. Our study underscores the need for additional research on chatbot functionality in patient therapy decision-making, further fostering the advancement of AI for optimal performance.
Keywords: ChatGPT-4; artificial intelligence (AI); benzodiazepines; chatbot; deprescribing.
© 2023 British Pharmacological Society.