Leveraging large language models to improve patient education on dry eye disease

Eye (Lond). 2024 Dec 16. doi: 10.1038/s41433-024-03476-5. Online ahead of print.

Abstract

Background/objectives: Dry eye disease (DED) is an exceedingly common diagnosis, yet recent analyses have shown that patient education materials (PEMs) on DED are of low quality and readability. Our study evaluated the utility and performance of three large language models (LLMs) in enhancing existing and generating new PEMs on DED.

Subjects/methods: We evaluated PEMs generated by ChatGPT-3.5, ChatGPT-4, and Gemini Advanced in response to three separate prompts. Prompts A and B asked each model to generate a PEM on DED, with Prompt B additionally specifying a 6th-grade reading level as measured by the SMOG (Simple Measure of Gobbledygook) readability formula. Prompt C asked the models to rewrite existing PEMs at a 6th-grade reading level. Each PEM was assessed for readability (SMOG; Flesch-Kincaid Grade Level, FKGL), quality (Patient Education Materials Assessment Tool, PEMAT; DISCERN), and accuracy (Likert misinformation scale).
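
Both readability indices estimate the U.S. school grade level required to comprehend a text; their standard formulas are FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59, and SMOG grade = 1.0430 × √(polysyllabic words × 30 / sentences) + 3.1291.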

Results: All LLM-generated PEMs produced in response to Prompts A and B were of high quality (median DISCERN = 4), understandable (PEMAT understandability ≥70%), and accurate (Likert score = 1); however, they were not actionable (PEMAT actionability <70%). ChatGPT-4 and Gemini Advanced rewrote existing PEMs (Prompt C) from a baseline readability level (FKGL: 8.0 ± 2.4, SMOG: 7.9 ± 1.7) to the targeted 6th-grade reading level, and the rewrites contained little to no misinformation (median Likert misinformation score = 1, range 1-2). However, only ChatGPT-4 rewrote PEMs while maintaining high quality and reliability (median DISCERN = 4).

Conclusion: LLMs (notably ChatGPT-4) were able to generate and rewrite PEMs on DED that were readable, accurate, and of high quality. Our study underscores the value of leveraging LLMs as supplementary tools for improving PEMs.