Introduction: Artificial intelligence (AI) is increasingly used to create patient education brochures on radiological procedures. This study therefore aimed to evaluate and compare the brochures generated by ChatGPT (San Francisco, CA: OpenAI) and Google Gemini (Mountain View, CA: Google LLC) on abdominal ultrasound, abdominal CT, and abdominal MRI.
Methodology: A cross-sectional study was conducted over one week in June 2024 to evaluate the quality of patient information brochures produced by ChatGPT 3.5 and Google Gemini 1.5 Pro. The assessed variables were word count, sentence count, average words per sentence, average syllables per sentence, grade level, and ease score, all obtained with the Flesch-Kincaid calculator. Similarity percentage was evaluated using Quillbot (Chicago, IL: Quillbot Inc.), and reliability was measured using the modified DISCERN score. Statistical analysis was performed in R version 4.3.2 (Vienna, Austria: R Foundation for Statistical Computing).
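For reference, the grade level and ease score mentioned above correspond to the published Flesch-Kincaid Grade Level and Flesch Reading Ease formulas. The minimal R sketch below (R being the language named for the study's analysis) illustrates those standard formulas only; it is not the specific online calculator the authors used, and the example counts are hypothetical.

```r
# Minimal sketch of the standard Flesch formulas (not the authors' exact
# calculator). Inputs are raw counts taken from a brochure's text.
flesch_scores <- function(words, sentences, syllables) {
  wps <- words / sentences   # average words per sentence
  spw <- syllables / words   # average syllables per word
  list(
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw,  # Flesch Reading Ease
    grade_level  = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level
  )
}

# Hypothetical example: a 450-word brochure with 30 sentences and 630 syllables
flesch_scores(words = 450, sentences = 30, syllables = 630)
```

Higher ease scores indicate easier reading, while the grade level approximates the years of schooling needed to understand the text.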
Results: P-values <0.05 were considered significant. There was no significant difference between the two chatbots in sentence count (p=0.8884), average words per sentence (p=0.1984), average syllables per sentence (p=0.3868), ease score (p=0.1812), similarity percentage (p=0.8110), or reliability score (p=0.6495). However, ChatGPT had a statistically significantly higher word count (p=0.0409) and grade level (p=0.0482) than Google Gemini.
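The abstract does not name the statistical test behind these p-values. As a hedged illustration only, the R sketch below runs a Welch two-sample t-test on placeholder word counts (one made-up value per brochure, not study data), with a non-parametric alternative noted in a comment.

```r
# Illustrative placeholder values only, not the study's data: one word
# count per brochure (ultrasound, CT, MRI) for each chatbot.
chatgpt_words <- c(512, 478, 495)
gemini_words  <- c(401, 389, 420)

# The abstract does not state which test was used; a Welch two-sample
# t-test is assumed here purely for illustration.
t.test(chatgpt_words, gemini_words)

# Non-parametric alternative if normality cannot be assumed:
# wilcox.test(chatgpt_words, gemini_words)
```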
Conclusions: Both ChatGPT and Google Gemini can generate patient education content of comparable readability and reliability. Nevertheless, the significant differences in word count and grade level highlight a key area for improvement: tailoring content to patients' varying literacy levels.
Keywords: abdominal ct; abdominal ultrasound; artificial intelligence; chatgpt; educational tool; google gemini; mri of abdomen; patient education brochure.
Copyright © 2024, Phillips et al.