Assessing the Clinical Appropriateness and Practical Utility of ChatGPT as an Educational Resource for Patients Considering Minimally Invasive Spine Surgery

Cureus. 2024 Oct 8;16(10):e71105. doi: 10.7759/cureus.71105. eCollection 2024 Oct.

Abstract

Introduction: Minimally invasive spine surgery (MISS) has evolved over the last three decades as a less invasive alternative to traditional spine surgery, offering benefits such as smaller incisions, faster recovery, and lower complication rates. Because patients frequently seek information about MISS online, the comprehensibility and accuracy of that information are crucial. Recent studies have shown that much of the online material on spine surgery exceeds recommended readability levels, making it difficult for patients to understand. This study evaluates the clinical appropriateness and readability of responses generated by Chat Generative Pre-Trained Transformer (ChatGPT) to frequently asked questions (FAQs) about MISS.

Methods: A set of 15 FAQs was formulated based on clinical expertise and the existing literature on MISS. Each question was independently entered into ChatGPT five times, and the generated responses were evaluated for clinical appropriateness by three attending neurosurgeons. Appropriateness was judged on accuracy, readability, and patient accessibility. Readability was assessed with seven standardized readability tests, including the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores. Statistical analysis compared readability scores across preoperative, postoperative, and intraoperative/technical question categories.

Results: The mean readability scores for preoperative, postoperative, and intraoperative/technical questions were 15±2.8, 16±3, and 15.7±3.2, respectively, significantly exceeding the sixth- to eighth-grade reading level recommended for patient education (p=0.017). Differences in readability across individual questions were also statistically significant (p<0.001). All responses required a reading level above the 11th grade, and the majority required college-level comprehension. Although preoperative and postoperative questions generally elicited clinically appropriate responses, 50% of intraoperative/technical questions yielded "inappropriate" or "unreliable" responses, particularly for inquiries about radiation exposure and the use of lasers in MISS.

Conclusions: While ChatGPT can provide clinically appropriate responses to certain FAQs about MISS, it frequently produces responses that exceed the recommended readability level for patient education. This limitation suggests that its utility may be confined to highly educated patients, potentially exacerbating existing disparities in patient comprehension. Future AI-based patient education tools must prioritize clear, accessible communication, with oversight from medical professionals to ensure accuracy and appropriateness. Further research comparing ChatGPT's performance with other AI models could enhance its application in patient education across medical specialties.
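For context, the two readability indices named above are computed from sentence length and syllable counts using standard published formulas (the abstract does not specify the study's exact implementation, so these are given for reference only):

FRE  = 206.835 − 1.015 × (total words / total sentences) − 84.6 × (total syllables / total words)
FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59

Higher FRE scores indicate easier text, while FKGL directly approximates the U.S. school grade level needed to understand the passage, which is why mean scores of 15-16 correspond to college-level reading.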

Keywords: ai; chatgpt; minimally invasive spine surgery; patient education; readability.