Background: Large language models (LLMs) can assist providers in drafting responses to patient inquiries. We examined a prompt engineering strategy for generating draft responses for providers in the electronic health record, and evaluated the change in usability after prompt engineering.
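For illustration, a drafting pipeline of this kind might wrap GPT-4 in a fixed system prompt that constrains tone and clinical scope. The sketch below is hypothetical: the system prompt wording, the draft_reply helper, and the parameter choices are assumptions rather than the study's actual prompt; only the use of GPT-4 comes from the abstract.

```python
# Hypothetical sketch of a prompt-engineered draft-reply call; the system
# instructions and parameters are illustrative assumptions, not the study's prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are drafting a reply to a patient portal message on behalf of a "
    "clinician. Be empathetic and concise, avoid new diagnoses or medication "
    "changes, and flag anything urgent for the provider to review."
)

def draft_reply(patient_message: str) -> str:
    """Return an LLM-drafted reply for the provider to edit or discard."""
    response = client.chat.completions.create(
        model="gpt-4",        # the study used GPT-4
        temperature=0.2,      # low temperature for a consistent tone
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": patient_message},
        ],
    )
    return response.choices[0].message.content
```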
Materials and methods: A pre-post study was conducted over 8 months across 27 providers. The primary outcome was provider use of draft messages generated by Generative Pre-trained Transformer 4 (GPT-4), analyzed with a mixed-effects model to account for repeated measures within providers; the secondary outcome was provider sentiment, assessed by sentiment analysis.
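As a minimal sketch of the analysis design, a mixed-effects logistic model of message use with a random intercept per provider could be specified as below. It assumes one row per generated draft with a binary used flag; the column names, file name, and the statsmodels Bayesian mixed GLM formulation are assumptions, not the study's actual code.

```python
# Minimal sketch of a mixed-effects logistic model for message use; column
# names (used, period, provider_id) are assumptions about the data layout.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("messages.csv")  # one row per GPT-4-generated draft

# Fixed effect: study period (pre vs post prompt engineering).
# Random intercept per provider captures repeated measures within providers.
model = BinomialBayesMixedGLM.from_formula(
    "used ~ C(period)",
    vc_formulas={"provider": "0 + C(provider_id)"},
    data=df,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```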
Results: Of the 7605 messages generated, 17.5% (n = 1327) were used. Negative sentiment fell after prompt engineering (odds ratio, 0.43; 95% CI, 0.36-0.52), but message use decreased (P < .01). The addition of nurses after the study period increased message use to 35.8% (P < .01).
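As a quick consistency check on the reported effect, a Wald-type 95% CI implies a log-odds coefficient and standard error that can be recovered from the odds ratio and its bounds; the arithmetic below reproduces the reported interval, assuming the interval is symmetric on the log scale.

```python
# Recover the log-odds coefficient and its standard error from the reported
# OR of 0.43 (95% CI, 0.36-0.52), assuming a Wald-type interval.
import math

or_point, ci_low, ci_high = 0.43, 0.36, 0.52
beta = math.log(or_point)                                  # ~ -0.844 on the log-odds scale
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)   # ~ 0.094

lo = math.exp(beta - 1.96 * se)  # ~ 0.36
hi = math.exp(beta + 1.96 * se)  # ~ 0.52
print(f"beta={beta:.3f}, SE={se:.3f}, CI=({lo:.2f}, {hi:.2f})")
```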
Discussion: The improvement in sentiment with prompt engineering suggests better content quality, but the initial decrease in usage highlights the need for integration with human factors design.
Conclusion: Future studies should explore strategies for optimizing the integration of LLMs into the provider workflow to maximize both usability and effectiveness.
Keywords: artificial intelligence; electronic health record; large language models; prompt engineering; sentiment analysis.