Aim: This study evaluated the readability of existing patient education materials and explored the potential of generative AI tools, such as ChatGPT-4 and Google Gemini, to simplify these materials to a sixth-grade reading level, in accordance with guidelines.
Materials and methods: Seven patient education documents were selected from a major radiology group. ChatGPT-4 and Gemini were given the documents and asked to reformulate them to a sixth-grade reading level. Average reading level (ARL) and proportional word count (PWC) change were calculated, and a 1-sample t-test was conducted (α = 0.05). Three radiologists rated the materials on a Likert scale for appropriateness, relevance, clarity, and information retention.
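The abstract does not specify which readability formula was used; a common choice for grade-level scoring is the Flesch-Kincaid Grade Level. The sketch below, a minimal illustration and not the study's actual method, estimates a grade level with a simple vowel-group syllable heuristic (function names `fk_grade` and `count_syllables` are hypothetical, not from the study).

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: count contiguous vowel groups; every word gets at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

# Illustrative comparison: jargon-heavy prose scores a much higher grade level
# than short, plain-language sentences.
plain = "The scan shows your heart. It looks fine."
jargon = "Contrast-enhanced computed tomography demonstrates unremarkable cardiac morphology."
print(fk_grade(plain), fk_grade(jargon))
```

In practice, published readability studies typically use validated tools (e.g., the `textstat` package or licensed readability software) rather than a hand-rolled heuristic, since syllable counting is the main source of error in these formulas.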
Results: The original materials had an ARL of 11.72. ChatGPT ARL was 7.32 ± 0.76 (6/7 significant) and Gemini ARL was 6.55 ± 0.51 (7/7 significant). ChatGPT reduced word count by 15% ± 7%, with 95% of outputs retaining at least 75% of the information. Gemini reduced word count by 33% ± 7%, with 68% retaining at least 75% of the information. ChatGPT outputs were rated appropriate (95% vs. 57%), clear (92% vs. 67%), and relevant (95% vs. 76%) more often than Gemini outputs. Interrater agreement was significantly higher for ChatGPT (0.91) than for Gemini (0.46).
Conclusion: Generative AI significantly improved the readability of patient education materials, none of which originally met the recommended sixth-grade reading level. Radiologist evaluations confirmed the appropriateness and relevance of the AI-simplified texts. This study highlights both the capabilities of generative AI tools and the need for ongoing expert review to ensure content accuracy and suitability.
Copyright © 2024 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.