Acceptability and readability of ChatGPT-4 based responses for frequently asked questions about strabismus and amblyopia

S Guven; B Ayyildiz

doi:10.1016/j.jfo.2024.104400

J Fr Ophtalmol. 2024 Dec 20;48(3):104400. doi: 10.1016/j.jfo.2024.104400. Online ahead of print.

Authors

S Guven¹, B Ayyildiz²

Affiliations

¹ Kayseri City Hospital, Department of Ophthalmology, Kayseri, Turkey.
² Kayseri City Hospital, Department of Ophthalmology, Kayseri, Turkey.

PMID: 39708624
DOI: 10.1016/j.jfo.2024.104400

Abstract

Purpose: To evaluate the acceptability and readability of ChatGPT-4 responses to common inquiries about strabismus and amblyopia.

Materials and methods: A series of commonly asked questions was compiled, covering the definition, prevalence, diagnostic approaches, surgical and non-surgical treatment alternatives, postoperative guidelines, surgery-related risks, and visual prognosis of strabismus and amblyopia. Each question was asked three times on the online ChatGPT-4 platform in both English and French, with data collection conducted on February 18, 2024. Two independent pediatric ophthalmologists evaluated the responses generated by ChatGPT-4 and classified each as "acceptable," "unacceptable," or "incomplete." Readability was analyzed with an online readability assessment tool called "readable."

Results: Of the responses to questions regarding strabismus and amblyopia, 97% consistently met the criteria for acceptability. The remaining 3% were classified as incomplete, and no response was classified as unacceptable. The mean Flesch-Kincaid Grade Level and Flesch Reading Ease Score were 14.53±1.8 and 23.63±8.2, respectively. The mean Coleman-Liau, Gunning Fog, and SMOG indices were 15.75±1.4, 16.96±2.4, and 16.05±1.6, respectively.
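For readers unfamiliar with the two Flesch measures reported above, the following is a minimal Python sketch of their standard published formulas. It is not the implementation used by the "readable" tool, whose internals are not described in this abstract. The function names and the vowel-group syllable counter are illustrative assumptions; the counter is a rough heuristic that will miscount many medical terms.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid Grade Level formula (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word):
    """Hypothetical heuristic: count vowel groups, discounting a trailing 'e'."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text):
    """Return (words, sentences, syllables) for a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


if __name__ == "__main__":
    # Illustrative sentence only; it is not taken from the study data.
    sample = "Amblyopia is reduced vision in one eye. It often starts in childhood."
    w, s, syl = text_counts(sample)
    print("Reading ease:", round(flesch_reading_ease(w, s, syl), 2))
    print("Grade level:", round(flesch_kincaid_grade(w, s, syl), 2))
```

Both formulas depend only on average sentence length and average syllables per word, so long sentences and polysyllabic terminology lower the Reading Ease score and raise the Grade Level together.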

Conclusions: ChatGPT-4 consistently produced acceptable responses to the majority of the questions asked (97%). Nevertheless, the responses were difficult for the average layperson to read, requiring a college-level education for comprehension. Further improvements, particularly in readability, are necessary to enhance the advisory capacity of this AI software in providing eye and health-related guidance for patients, physicians, and the general public.

Keywords: Amblyopia; Amblyopie; Artificial intelligence; ChatGPT-4; Intelligence artificielle; Strabisme; Strabismus.