Evaluation of ChatGPT in Predicting 6-Month Outcomes After Traumatic Brain Injury

Clement Gakuba; Charlene Le Barbey; Alexandre Sar; Gregory Bonnet; Damiano Cerasuolo; Mikhael Giabicani; Jean-Denis Moyer

doi:10.1097/CCM.0000000000006236

Crit Care Med. 2024 Jun 1;52(6):942-950. doi: 10.1097/CCM.0000000000006236. Epub 2024 Mar 6.

Authors

Clement Gakuba¹ ², Charlene Le Barbey¹, Alexandre Sar¹, Gregory Bonnet³, Damiano Cerasuolo⁴ ⁵, Mikhael Giabicani⁶, Jean-Denis Moyer¹

Affiliations

¹ CHU de Caen Normandie, Department of Anesthesiology and Critical Care Medicine, Caen, France.
² Normandie Univ, UNICAEN, INSERM, U1237, PhIND "Physiopathology and imaging of Neurological Disorders", Institut Blood and Brain @ Caen-Normandie, Cyceron, Caen, France.
³ Normandie Univ, UNICAEN, ENSICAEN, CNRS, Department of Groupe de Recherche en Informatique, Image, et Instrumentation de Caen (GREYC), Caen, France.
⁴ CHU de Caen Normandie, Department of Public Health, Caen, France.
⁵ Normandie Univ, UNICAEN, INSERM U1086, ANTICIPE, Caen, France.
⁶ Department of Anaesthesiology and Critical Care, Beaujon Hospital, DMU Parabol, AP-HP Nord, Paris, France.

PMID: 38445975
DOI: 10.1097/CCM.0000000000006236

Abstract

Objectives: To evaluate the capacity of ChatGPT, a widely accessible and uniquely popular artificial intelligence-based chatbot, in predicting the 6-month outcome following moderate-to-severe traumatic brain injury (TBI).

Design: Single-center observational retrospective study.

Setting: Data are from the neuro-ICU of a level 1 trauma center.

Patients: All TBI patients admitted to the ICU between September 2021 and October 2022 were included in a prospective database.

Interventions: None.

Measurements and main results: Clinical vignettes were built from anonymized clinical, imaging, and biological information available at the patients' hospital admission and extracted from the database, then retrospectively submitted to ChatGPT for prediction of patients' outcomes. The predictions of two intensivists (one neurointensivist and one non-neurointensivist), both from another level 1 trauma center (Beaujon Hospital), were also collected, as was the International Mission on Prognosis and Analysis of Clinical Trials in Traumatic Brain Injury (IMPACT) score. Each intensivist, as well as ChatGPT, made prognostic evaluations independently, without knowledge of the others' predictions or of the patients' actual management and outcome. The intensivists and ChatGPT were given access to exactly the same set of information. The main outcome was 6-month functional status, dichotomized into favorable (Glasgow Outcome Scale Extended [GOSE] ≥ 5) versus poor (GOSE < 5). Predictions of intracranial hypertension management, pulmonary infectious risk, and removal of life-sustaining therapies were also investigated as secondary outcomes. Eighty consecutive moderate-to-severe TBI patients were included. For the 6-month outcome prognosis, the area under the receiver operating characteristic curve (AUC-ROC) for ChatGPT, the neurointensivist, the non-neurointensivist, and IMPACT was, respectively, 0.62 (0.50-0.74), 0.70 (0.59-0.82), 0.71 (0.59-0.82), and 0.81 (0.72-0.91). ChatGPT had the highest sensitivity (100%) but the lowest specificity (26%). For secondary outcomes, ChatGPT's prognoses were generally less accurate than clinicians' prognoses, with lower AUC values for most outcomes.
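As an illustration of the metrics reported above, the following minimal Python sketch dichotomizes GOSE at the stated threshold and computes sensitivity, specificity, and AUC-ROC for a binary predictor. It is not the authors' analysis code. The function names and the toy GOSE values and predictions are hypothetical, and treating the poor outcome (GOSE < 5) as the positive class is an assumption, since the abstract does not say which class was considered positive.

```python
from typing import Sequence, Tuple


def dichotomize_gose(gose: int) -> int:
    """Return 1 for poor outcome (GOSE < 5), 0 for favorable (GOSE >= 5).

    Assumption: poor outcome is the positive class.
    """
    if not 1 <= gose <= 8:
        raise ValueError("GOSE must be between 1 and 8")
    return 1 if gose < 5 else 0


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute sensitivity and specificity from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one.

    Ties count as one half (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_gose = [1, 3, 4, 7, 8, 6, 5, 2]
y_true = [dichotomize_gose(g) for g in toy_gose]
# A predictor that calls almost every case "poor outcome".
y_pred = [1, 1, 1, 1, 1, 1, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auc_roc(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.3f}")
```

In this toy case the predictor flags nearly every patient as a poor outcome, so it misses no true positives but produces many false positives. For a purely binary prediction, the AUC computed this way equals the mean of sensitivity and specificity, which is one way to see how perfect sensitivity combined with low specificity can still yield an AUC not far above 0.5.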

Conclusions: This study does not support the use of ChatGPT for prediction of outcomes after TBI.

Publication types

Observational Study

MeSH terms

Adult
Aged
Artificial Intelligence
Brain Injuries, Traumatic* / therapy
Female
Humans
Intensive Care Units / statistics & numerical data
Male
Middle Aged
Prognosis
Retrospective Studies