Evaluating ChatGPT as a Patient Education Tool for COVID-19-Induced Olfactory Dysfunction

Elliott M Sina; Daniel J Campbell; Alexander Duffy; Shreya Mandloi; Peter Benedict; Douglas Farquhar; Aykut Unsal; Gurston Nyquist

doi:10.1002/oto2.70011

OTO Open. 2024 Sep 15;8(3):e70011. doi: 10.1002/oto2.70011. eCollection 2024 Jul-Sep.

Authors

Elliott M Sina¹, Daniel J Campbell², Alexander Duffy², Shreya Mandloi², Peter Benedict², Douglas Farquhar², Aykut Unsal², Gurston Nyquist²

Affiliations

¹ Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania, USA.
² Department of Otolaryngology, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania, USA.

Abstract

Objective: While most patients with COVID-19-induced olfactory dysfunction (OD) recover spontaneously, those with persistent OD face significant physical and psychological sequelae. ChatGPT, an artificial intelligence chatbot, is increasingly used as a tool for patient education. This study seeks to evaluate the quality of ChatGPT-generated responses to questions about COVID-19-induced OD.

Study design: Quantitative observational study.

Setting: Publicly available online website.

Methods: ChatGPT (GPT-4) was queried 4 times with the same 30 questions. Before each round of questioning, ChatGPT was either prompted to respond (1) to a patient, (2) to an eighth grader, or (3) with references, or (4) given no prompt. Answer accuracy was independently scored by 4 rhinologists using the Global Quality Score (GCS, range: 1-5). Proportions of responses at incremental score thresholds were compared using χ² analysis. The Flesch-Kincaid grade level was calculated for each answer. The relationship between prompt type and grade level was assessed via analysis of variance.
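
For readers unfamiliar with the readability metric named above, the following is a minimal Python sketch of the standard Flesch-Kincaid grade level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The function names and the vowel-group syllable heuristic are assumptions made for illustration; the abstract does not state which software the authors used, and dedicated readability tools count syllables more carefully.

```python
import re


def fk_grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the authors' method):
    count groups of consecutive vowels and drop a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def fk_grade(text: str) -> float:
    """Estimate the grade level of a passage using the heuristic above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)
```

Because the formula rewards short sentences and short words, a chatbot answer can score above a twelfth-grade level even when its content is accurate, which is why grade level is reported separately from the quality score.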

Results: Across all graded responses (n = 480), 364 responses (75.8%) were "at least good" (GCS ≥ 4). The proportions of responses that were "at least good" (P < .0001) or "excellent" (GCS = 5) (P < .0001) differed by prompt; the proportion of "at least moderate" (GCS ≥ 3) responses did not (P = .687). Eighth-grade level (14.06 ± 2.3) and patient-friendly (14.33 ± 2.0) responses had significantly lower mean grade levels than responses generated with no prompting (P < .0001).
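
The threshold comparison reported above can be illustrated with a short Python sketch of a Pearson χ² test of independence on a prompt-by-outcome count table. Only the overall total (364 of 480) comes from the abstract. The per-prompt counts below are hypothetical placeholders, and the even split of 120 graded responses per prompt is an assumption (30 questions × 4 raters); the abstract does not report the per-prompt breakdown. The function name is likewise illustrative.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an
    r x c table of counts (every row and column total must be nonzero)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Reported in the abstract: 364 of 480 graded responses had GCS >= 4.
overall_pct = round(100 * 364 / 480, 1)  # matches the reported 75.8%

# HYPOTHETICAL per-prompt counts of "at least good" responses, chosen only
# so that they sum to 364; they are NOT the study's data.
good_counts = [100, 96, 78, 90]
per_prompt = 120  # assumed: 30 questions x 4 raters per prompt condition

# Rows are prompt conditions; columns are (GCS >= 4, GCS < 4).
table = [[g, per_prompt - g] for g in good_counts]
stat, df = chi_square_independence(table)
```

With four prompt conditions and a binary outcome the test has 3 degrees of freedom; the statistic is then compared against the χ² distribution to obtain a P value such as those quoted in the Results.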

Conclusion: ChatGPT provides appropriate answers to most questions on COVID-19 OD regardless of prompting. However, prompting influences response quality and grade level. ChatGPT responds at grade levels above accepted recommendations for presenting medical information to patients. Currently, ChatGPT offers significant potential for patient education as an adjunct to the conventional patient-physician relationship.

Keywords: AI hallucination; COVID‐19; ChatGPT; Flesch‐Kincaid grade level; anosmia; artificial intelligence; chatbot; olfactory dysfunction; patient education; prompting.