Introduction: The growing popularity of chatbots among the general public, particularly OpenAI's ChatGPT, and their utility in healthcare are topics of current controversy. This study aimed to assess the reliability and accuracy of ChatGPT's responses to questions posed by parents, focusing on a range of pediatric ophthalmology and strabismus conditions.
Methods: Patient queries were collected via thematic analysis, and each was posed to ChatGPT (version 3.5) in 3 separate instances. The questions were divided into 12 domains, totalling 817 unique questions. Two experienced pediatric ophthalmologists scored the quality of every response on a Likert scale. All responses were evaluated for readability using the Flesch-Kincaid Grade Level (FKGL) and for length using character counts.
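For readers unfamiliar with the readability metric, the standard FKGL formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below illustrates that formula together with a character count. It is not the tool used in the study: the function names `count_syllables` and `fkgl` are hypothetical, and the syllable counter is a rough vowel-group heuristic, so its scores can differ from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Illustrative sentence only; not taken from the study data.
    response = "Strabismus is a misalignment of the eyes. It is often treatable."
    print(f"FKGL: {fkgl(response):.2f}, characters: {len(response)}")
```

A higher score corresponds to a higher US school grade level needed to understand the text.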
Results: A total of 638 (78.09%) responses were scored as perfectly correct, 156 (19.09%) as correct but incomplete, and only 23 (2.81%) as partially incorrect. No response was scored as completely incorrect. The average FKGL score was 14.49 [95% CI 14.4004-14.5854] and the average character count was 1825.33 [95% CI 1791.95-1858.7], with p = 0.831 and 0.697 respectively. The minimum and maximum FKGL scores were 10.6 and 18.34 respectively. FKGL predicted character count, R² = .012, F(1,815) = 10.26, p = .001.
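The regression figures can be read together through the usual identity for a linear model, F = (R² / (1 − R²)) × (df_residual / df_model). The sketch below applies it to the reported values; the helper names `f_from_r2` and `r2_from_f` are hypothetical and are not part of the study's analysis. Because R² is reported to only three decimals, the F value recovered from it is approximate.

```python
def f_from_r2(r2: float, df_model: int, df_resid: int) -> float:
    """F statistic implied by R-squared for a linear regression."""
    return (r2 / (1.0 - r2)) * (df_resid / df_model)


def r2_from_f(f_stat: float, df_model: int, df_resid: int) -> float:
    """R-squared implied by an F statistic (inverse of f_from_r2)."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


if __name__ == "__main__":
    # Values as reported in the Results: R² = .012, F(1, 815) = 10.26.
    print(f"F implied by R² = .012: {f_from_r2(0.012, 1, 815):.2f}")
    print(f"R² implied by F = 10.26: {r2_from_f(10.26, 1, 815):.4f}")
```

The two reported values agree once rounding of R² is taken into account, and both indicate that FKGL explains only about 1% of the variance in character count, even though the association is statistically significant.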
Conclusion: ChatGPT provided accurate and reliable information for the majority of the questions. The reading level of the responses was well above the level typically recommended for adult readers, which is concerning. Despite this limitation, it is evident that this technology will play a significant role in the healthcare industry.
Keywords: Pediatric ophthalmology; artificial intelligence; chatbot; health information; strabismus.