Evaluation of the Usability of ChatGPT-4 and Google Gemini in Patient Education About Rhinosinusitis

Clin Otolaryngol. 2025 Jan 7. doi: 10.1111/coa.14273. Online ahead of print.

Abstract

Introduction: Artificial intelligence (AI)-based chatbots are increasingly used for patient education about common diseases in healthcare, as in many other fields. This study aims to evaluate and compare patient education materials on rhinosinusitis generated by two widely used chatbots, ChatGPT-4 and Google Gemini.

Method: One hundred and nine questions taken from patient information websites were divided into four categories (general knowledge, diagnosis, treatment, and surgery and complications) and posed to both chatbots. The answers were evaluated independently by two expert otolaryngologists; for questions on which their scores differed, a third, more experienced otolaryngologist made the final decision. Answers were scored from 1 to 4: (1) comprehensive/correct, (2) incomplete/partially correct, (3) a mix of accurate and inaccurate data, potentially misleading, and (4) completely inaccurate/irrelevant.
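The abstract does not name the statistical test used for the category-level comparisons reported below. As a purely illustrative sketch, and not the authors' analysis, ordinal 1-4 rating distributions of this kind are often compared between two systems on the same question set with a chi-square test of independence (or Fisher's exact test when counts are small); all counts in the example are invented placeholders.

```python
# Illustrative only: compare two hypothetical 1-4 score distributions
# (e.g., ChatGPT-4 vs. Google Gemini within one question category)
# with a chi-square test of independence. The counts are placeholders,
# not data from this study.
from scipy.stats import chi2_contingency

# Rows: chatbot; columns: score 1 (comprehensive/correct) through
# score 4 (completely inaccurate/irrelevant). Hypothetical counts
# for a single category such as "treatment".
observed = [
    [20, 5, 2, 1],   # hypothetical ChatGPT-4 counts
    [12, 8, 4, 4],   # hypothetical Google Gemini counts
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```

With expected cell counts this small, an exact test would usually be preferred; the sketch only shows the general shape of such a comparison.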

Results: All ChatGPT-4 answers in the diagnosis category were rated comprehensive/correct. For Google Gemini, the proportion of answers rated completely inaccurate/irrelevant was statistically significantly higher in the treatment category, and the proportion rated incomplete/partially correct was statistically significantly higher in the surgery and complications category. In the direct comparison of the two chatbots, ChatGPT-4 had a statistically significantly higher rate of correct answers than Google Gemini in the treatment category.

Conclusion: Overall, the answers provided by the ChatGPT-4 and Google Gemini chatbots regarding rhinosinusitis were evaluated as sufficient and informative.

Keywords: ChatGPT‐4; Google Gemini; artificial intelligence; rhinosinusitis.