Completeness and readability of GPT-4-generated multilingual discharge instructions in the pediatric emergency department

JAMIA Open. 2024 Jul 1;7(3):ooae050. doi: 10.1093/jamiaopen/ooae050. eCollection 2024 Oct.

Abstract

Objectives: The aim of this study was to assess the completeness and readability of discharge instructions generated by generative pre-trained transformer-4 (GPT-4) at prespecified reading levels for common pediatric emergency room complaints.

Materials and methods: Outputs for 6 discharge scenarios, stratified by reading level (fifth or eighth grade) and language (English or Spanish), were each generated 5 times using GPT-4. In total, 120 discharge instructions were produced and analyzed across the 6 scenarios (60 in English and 60 in Spanish; 60 at a fifth-grade reading level and 60 at an eighth-grade reading level). They were compared for completeness and readability between languages, between reading levels, and stratified by group and reading level. Completeness was defined as the proportion of literature-derived key points included in the discharge instructions. Readability was quantified using Flesch-Kincaid (English) and Fernandez-Huerta (Spanish) readability scores.
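The design and the two outcome measures described above can be sketched in Python. This is a minimal illustration, not the authors' code: the scenario names, function names, and key-point labels are hypothetical, and the abstract does not say which Flesch-Kincaid variant was used, so the standard Flesch-Kincaid Grade Level formula is assumed here. The Fernandez-Huerta score for Spanish is left out of the sketch.

```python
from itertools import product

# Factor levels come from the abstract; the scenario names are placeholders.
SCENARIOS = [f"scenario_{i}" for i in range(1, 7)]
LANGUAGES = ["English", "Spanish"]
READING_LEVELS = ["fifth grade", "eighth grade"]
REPLICATES = range(1, 6)  # each condition generated 5 times


def design_grid():
    """Enumerate every scenario x language x reading level x replicate cell."""
    return [
        {"scenario": s, "language": lang, "reading_level": lvl, "replicate": r}
        for s, lang, lvl, r in product(SCENARIOS, LANGUAGES, READING_LEVELS, REPLICATES)
    ]


def completeness(included_key_points, required_key_points):
    """Proportion of required key points that appear in one output."""
    required = set(required_key_points)
    if not required:
        raise ValueError("required_key_points must not be empty")
    return len(required & set(included_key_points)) / len(required)


def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Standard Flesch-Kincaid Grade Level (assumed variant, see lead-in)."""
    return (
        0.39 * (total_words / total_sentences)
        + 11.8 * (total_syllables / total_words)
        - 15.59
    )


grid = design_grid()
print(len(grid))  # 6 * 2 * 2 * 5 = 120
```

Counting the cells of the grid reproduces the totals in the text: 120 outputs overall, 60 per language, and 60 per reading level.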

Results: English-language GPT-4-generated discharge instructions contained a significantly higher proportion of must-include key points than those in Spanish (English: mean (standard error of the mean) = 62% (3%), Spanish: 53% (3%), P = .02). In the fifth-grade and eighth-grade level conditions, there was no significant difference between English and Spanish outputs in completeness. Readability did not differ across languages.
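The "mean (standard error of the mean)" summary reported above can be computed as in the following sketch. It assumes the conventional definition of the standard error (sample standard deviation divided by the square root of the sample size); the function name is hypothetical, and no study data are reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of scores.

    The SEM is the sample standard deviation (n - 1 denominator)
    divided by sqrt(n), so at least two values are required.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to estimate the SEM")
    return mean(values), stdev(values) / sqrt(n)
```

Applied to the per-output completeness proportions within one language, this would yield a pair of numbers in the same form as the reported summaries.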

Discussion: GPT-4 produced readable discharge instructions in English and Spanish while modulating document reading level. Discharge instructions in English tended to have higher completeness than those in Spanish.

Conclusion: Future research in prompt engineering and GPT-4 performance, both generally and in multiple languages, is needed to reduce potential for health disparities by language and reading level.

Keywords: artificial intelligence; computer simulations; diversity; equity; inclusion; literacy; pediatric emergency medicine.