Artificial Intelligence-Generated Draft Replies to Patient Inbox Messages

Patricia Garcia; Stephen P Ma; Shreya Shah; Margaret Smith; Yejin Jeong; Anna Devon-Sand; Ming Tai-Seale; Kevin Takazawa; Danyelle Clutter; Kyle Vogt; Carlene Lugtu; Matthew Rojo; Steven Lin; Tait Shanafelt; Michael A Pfeffer; Christopher Sharp

doi:10.1001/jamanetworkopen.2024.3201


JAMA Netw Open. 2024 Mar 4;7(3):e243201. doi: 10.1001/jamanetworkopen.2024.3201.

Authors

Patricia Garcia¹, Stephen P Ma¹, Shreya Shah¹ ², Margaret Smith², Yejin Jeong², Anna Devon-Sand², Ming Tai-Seale³, Kevin Takazawa⁴, Danyelle Clutter⁴, Kyle Vogt⁴, Carlene Lugtu⁵, Matthew Rojo⁴, Steven Lin¹ ², Tait Shanafelt¹ ⁶, Michael A Pfeffer¹ ⁴, Christopher Sharp¹

Affiliations

¹ Department of Medicine, Stanford University School of Medicine, Stanford, California.
² Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Stanford University School of Medicine, Stanford, California.
³ Department of Family Medicine, University of California San Diego School of Medicine, La Jolla.
⁴ Technology and Digital Solutions, Stanford Medicine, Stanford, California.
⁵ Nursing Informatics & Innovation, Stanford Healthcare, Stanford, California.
⁶ WellMD Center, Stanford University School of Medicine, Stanford, California.

Abstract

Importance: The emergence and promise of generative artificial intelligence (AI) represent a turning point for health care. Rigorous evaluation of generative AI deployment in clinical practice is needed to inform strategic decision-making.

Objective: To evaluate the implementation of a large language model used to draft responses to patient messages in the electronic inbox.

Design, setting, and participants: A 5-week, prospective, single-group quality improvement study was conducted from July 10 through August 13, 2023, at a single academic medical center (Stanford Health Care). All attending physicians, advanced practice practitioners (APPs), clinic nurses, and clinical pharmacists from the Divisions of Primary Care and Gastroenterology and Hepatology were enrolled in the pilot.

Intervention: Draft replies to patient portal messages generated by a Health Insurance Portability and Accountability Act-compliant electronic health record-integrated large language model.

Main outcomes and measures: The primary outcome was AI-generated draft reply utilization as a percentage of total patient message replies. Secondary outcomes included changes in time measures and clinician experience as assessed by survey.
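As an illustration of the primary outcome only, the following minimal Python sketch computes a per-clinician utilization percentage and then averages it across clinicians. The clinician labels, the counts, and the function name utilization_rate are hypothetical and are not taken from the study; the study's actual data extraction and handling of edge cases are not described in this abstract.

```python
from statistics import mean


def utilization_rate(drafts_used: int, total_replies: int) -> float:
    """Return AI-draft utilization as a percentage of total replies."""
    if total_replies <= 0:
        raise ValueError("total_replies must be positive")
    return 100.0 * drafts_used / total_replies


# Hypothetical (drafts_used, total_replies) counts per clinician.
# These are made-up numbers for illustration, not study data.
counts = {
    "clinician_a": (12, 40),
    "clinician_b": (3, 30),
    "clinician_c": (0, 20),
}

# Per-clinician rate first, then the mean across clinicians.
per_clinician = {name: utilization_rate(*c) for name, c in counts.items()}
mean_rate = mean(per_clinician.values())

# Pooled rate, shown only for contrast: all drafts over all replies.
pooled_rate = utilization_rate(
    sum(used for used, _ in counts.values()),
    sum(total for _, total in counts.values()),
)

print(f"mean across clinicians: {mean_rate:.1f}%")
print(f"pooled across messages: {pooled_rate:.1f}%")
```

A mean across clinicians weights every clinician equally, whereas a pooled rate weights high-volume clinicians more heavily, so the two figures can differ for the same data.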

Results: A total of 197 clinicians were enrolled in the pilot; 35 clinicians who were prepilot beta users, out of office, or not tied to a specific ambulatory clinic were excluded, leaving 162 clinicians included in the analysis. The survey analysis cohort consisted of 73 participants (45.1%) who completed both the presurvey and postsurvey. In gastroenterology and hepatology, there were 58 physicians and APPs and 10 nurses. In primary care, there were 83 physicians and APPs, 4 nurses, and 8 clinical pharmacists. The mean AI-generated draft response utilization rate across clinicians was 20%. There was no change in reply action time, write time, or read time between the prepilot and pilot periods. There were statistically significant reductions in the 4-item physician task load score derivative (mean [SD], 61.31 [17.23] presurvey vs 47.26 [17.11] postsurvey; paired difference, -13.87; 95% CI, -17.38 to -9.50; P < .001) and work exhaustion scores (mean [SD], 1.95 [0.79] presurvey vs 1.62 [0.68] postsurvey; paired difference, -0.33; 95% CI, -0.50 to -0.17; P < .001).

Conclusions and relevance: In this quality improvement study of an early implementation of generative AI, there was notable adoption, usability, and improvement in assessments of burden and burnout. There was no improvement in time. Further code-to-bedside testing is needed to guide future development and organizational strategy.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Academic Medical Centers*
Ambulatory Care Facilities
Artificial Intelligence*
Burnout, Psychological
Humans
Prospective Studies
United States