Evaluation of Oropharyngeal Cancer Information from Revolutionary Artificial Intelligence Chatbot

Ryan J Davis; Oluwatobiloba Ayo-Ajibola; Matthew E Lin; Mark S Swanson; Tamara N Chambers; Daniel I Kwon; Niels C Kokot

doi:10.1002/lary.31191

Laryngoscope. 2024 May;134(5):2252-2257. doi: 10.1002/lary.31191. Epub 2023 Nov 20.

Authors

Ryan J Davis¹, Oluwatobiloba Ayo-Ajibola¹, Matthew E Lin¹, Mark S Swanson², Tamara N Chambers², Daniel I Kwon², Niels C Kokot²

Affiliations

¹ Keck School of Medicine of the University of Southern California, Los Angeles, California, USA.
² Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA.

PMID: 37983846

Abstract

Objective: With the burgeoning popularity of artificial intelligence-based chatbots, oropharyngeal cancer patients now have access to a novel source of medical information. Because chatbot information is not reviewed by experts, we sought to evaluate the accuracy of an artificial intelligence-based chatbot's oropharyngeal cancer-related information.

Methods: Fifteen oropharyngeal cancer-related questions were developed and input into ChatGPT version 3.5. Four physician-graders independently assessed accuracy, comprehensiveness, and similarity to a physician response using 5-point Likert scales. Responses graded lower than three were then critiqued by physician-graders. Critiques were analyzed using inductive thematic analysis. Readability of responses was assessed using Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKRGL) scales.
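The two readability scales named in the Methods are defined by standard published formulas over sentence, word, and syllable counts. The abstract does not say which tool the authors used, so the following is only a minimal sketch of those formulas, not the study's implementation. The function names and the vowel-group syllable counter are assumptions made for illustration; dedicated readability tools count syllables more carefully, so scores from this sketch will differ somewhat from theirs.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """FRE: higher scores mean easier text."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    """FKRGL: approximate US school grade level needed to read the text."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

Both formulas penalize long sentences and polysyllabic words, so responses dense with clinical terminology tend to score as harder to read.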

Results: Average accuracy, comprehensiveness, and similarity to a physician response scores were 3.88 (SD = 0.99), 3.80 (SD = 1.14), and 3.67 (SD = 1.08), respectively. Posttreatment-related questions were most accurate, comprehensive, and similar to a physician response, followed by treatment-related, then diagnosis-related questions. Posttreatment-related questions scored significantly higher than diagnosis-related questions in all three domains (p < 0.01). Two themes of the physician critiques were identified: suboptimal education value and potential to misinform patients. The mean FRE and FKRGL scores both indicated greater than an 11th grade readability level, which is higher than the 6th grade level recommended for patients.

Conclusion: ChatGPT responses may not educate patients to an appropriate degree, could outright misinform them, and read at a more difficult grade level than is recommended for patient material. As oropharyngeal cancer patients represent a vulnerable population facing complex, life-altering diagnoses and treatments, they should be cautious when consuming chatbot-generated medical information.

Level of evidence: NA. Laryngoscope, 134:2252-2257, 2024.

Keywords: ChatGPT; artificial intelligence; communication; head and neck cancer.

MeSH terms

Artificial Intelligence
Educational Status
Humans
Laryngoscopes*
Oropharyngeal Neoplasms* / diagnosis
Oropharyngeal Neoplasms* / therapy
Software