Examining the Readability of AtlasGPT, the Premier Resource for Neurosurgical Education

World Neurosurg. 2024 Dec 6;194:123469. doi: 10.1016/j.wneu.2024.11.052. Online ahead of print.

Abstract

Background: AtlasGPT is an innovative generative pretrained transformer trained on the neurosurgical literature. Its ability to tailor responses to the user's training level is unique; however, whether its responses can actually be comprehended at each training level remains unknown. This study aimed to analyze the readability of responses provided by AtlasGPT.

Methods: Ten queries were presented to AtlasGPT under each of its 4 user profiles (i.e., surgeon, resident, medical student, patient). A readability analysis was performed using multiple instruments in Readability Studio. Readability scores of profile-specific responses were compared using one-way analysis of variance and post hoc pairwise t-tests with Bonferroni correction. A P value <0.05 was considered significant.
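
For illustration, the following minimal sketch reproduces the statistical workflow described above on hypothetical readability scores; the study's actual data are not shown here, and the specific values and the use of SciPy are assumptions for this example.

```python
# A minimal sketch of the comparison described in Methods, run on
# hypothetical grade-level scores (illustrative values only).
from itertools import combinations
from scipy import stats

# Hypothetical grade-level scores for the 10 responses generated under
# each AtlasGPT user profile.
scores = {
    "surgeon":         [14.2, 13.8, 15.1, 14.7, 13.9, 14.5, 15.0, 14.1, 13.6, 14.8],
    "resident":        [13.9, 14.1, 13.5, 14.6, 13.8, 14.0, 13.7, 14.3, 13.6, 14.2],
    "medical_student": [12.8, 13.1, 12.5, 13.0, 12.9, 12.4, 13.2, 12.7, 12.6, 13.0],
    "patient":         [10.1,  9.5, 11.2, 10.8,  9.9, 10.4, 11.0,  9.7, 10.2, 10.6],
}

# One-way analysis of variance across the four user profiles.
f_stat, p_anova = stats.f_oneway(*scores.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4g}")

# Post hoc pairwise t-tests with Bonferroni correction: with 4 profiles
# there are 6 comparisons, so the adjusted threshold is 0.05 / 6.
pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)
for a, b in pairs:
    t_stat, p = stats.ttest_ind(scores[a], scores[b])
    flag = "significant" if p < alpha else "not significant"
    print(f"{a} vs {b}: t = {t_stat:.2f}, p = {p:.4g} ({flag})")
```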

Results: Across the readability instruments used, reading ease differed significantly between the patient profile and each of the other user profiles (P < 0.005). Readability scores for the medical student profile trended toward greater reading ease than those for the surgeon and resident profiles, but these differences were not significant. Mean grade levels for patient responses ranged from 8.8 to 11.51 across instruments. Only one output, assessed with the New Dale-Chall instrument, was written at a fifth- to sixth-grade level.
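
For context on how such grade levels are derived, the sketch below shows one widely used instrument, the Flesch-Kincaid Grade Level. The study relied on Readability Studio's implementations; the naive syllable counter here is only a rough stand-in for the dictionary-based counts real instruments use.

```python
# A rough sketch of the Flesch-Kincaid Grade Level formula:
# 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels; a crude heuristic.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_level(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

sample = "The craniotomy exposes the dura. The surgeon then opens it carefully."
print(f"Approximate grade level: {fk_grade_level(sample):.1f}")
```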

Conclusions: AtlasGPT-generated content varies in readability according to the user profile selected; however, the readability of patient-directed content still exceeds the recommendations set by United States government agencies, necessitating a call to action.

Keywords: AtlasGPT; ChatGPT; Education; Health literacy; Neurosurgery; Readability.