Emergency Patient Triage Improvement through a Retrieval-Augmented Generation Enhanced Large-Scale Language Model

Megumi Yazaki; Satoshi Maki; Takeo Furuya; Ken Inoue; Ko Nagai; Yuki Nagashima; Juntaro Maruyama; Yasunori Toki; Kyota Kitagawa; Shuhei Iwata; Takaki Kitamura; Sho Gushiken; Yuji Noguchi; Masahiro Inoue; Yasuhiro Shiga; Kazuhide Inage; Sumihisa Orita; Takaaki Nakada; Seiji Ohtori

doi:10.1080/10903127.2024.2374400

Prehosp Emerg Care. 2024 Jul 11:1-7. doi: 10.1080/10903127.2024.2374400. Online ahead of print.

Authors

Megumi Yazaki¹ ² ³, Satoshi Maki¹ ⁴, Takeo Furuya¹, Ken Inoue², Ko Nagai², Yuki Nagashima¹, Juntaro Maruyama¹, Yasunori Toki¹, Kyota Kitagawa¹, Shuhei Iwata¹, Takaki Kitamura¹, Sho Gushiken¹, Yuji Noguchi¹, Masahiro Inoue¹, Yasuhiro Shiga¹, Kazuhide Inage¹, Sumihisa Orita¹ ⁴, Takaaki Nakada³, Seiji Ohtori¹

Affiliations

¹ Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba, Japan.
² Tertiary Emergency Medical Center, Tokyo Metropolitan Bokutoh Hospital, Tokyo, Japan.
³ Department of Emergency and Critical Care Medicine, Chiba University, Chiba, Japan.
⁴ Center for Frontier Medical Engineering, Chiba University, Chiba, Japan.

PMID: 38950135
DOI: 10.1080/10903127.2024.2374400

Abstract

Objectives: Emergency medical triage is crucial for prioritizing patient care in emergency situations, yet its effectiveness can vary significantly with the experience and training of the personnel involved. This study aims to evaluate the efficacy of integrating Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs), specifically OpenAI's GPT models, to standardize triage procedures and reduce variability in emergency care.
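As a rough illustration of what integrating RAG with an LLM involves, the sketch below retrieves the reference excerpts most relevant to a patient report and places them in the prompt ahead of the question. It is not the authors' pipeline: the excerpt texts, the word-overlap scoring, and the names GUIDELINES, retrieve, and build_prompt are all assumptions made for illustration, and the call to the language model itself is omitted.

```python
import re

# Hypothetical reference excerpts. Placeholders for illustration only:
# this is not the study's knowledge base and not clinical guidance.
GUIDELINES = [
    "Red: immediate. Compromised airway, respiratory distress, or signs of shock.",
    "Yellow: delayed. Stable vital signs with injuries needing treatment within hours.",
    "Green: minor. Ambulatory patient with minor injuries and normal vital signs.",
]


def tokens(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query.

    Word overlap stands in for the embedding similarity that a real
    RAG system would more likely use (assumption).
    """
    query_words = tokens(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokens(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(case, documents, k=2):
    """Assemble a prompt that puts the retrieved excerpts before the case."""
    context = "\n".join(f"- {doc}" for doc in retrieve(case, documents, k))
    return (
        "Use the reference excerpts to assign a triage category.\n"
        f"Reference excerpts:\n{context}\n"
        f"Patient report:\n{case}\n"
        "Answer with one category."
    )


# Made-up patient report, not taken from the study's scenarios.
case = "Patient with respiratory distress and signs of shock after a fall."
prompt = build_prompt(case, GUIDELINES)
print(prompt)
```

The resulting prompt string would then be sent to the model. Grounding the answer in retrieved text, rather than in the model's general knowledge alone, is the mechanism the study sets out to evaluate.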

Methods: We created 100 simulated triage scenarios based on modified cases from the Japanese National Examination for Emergency Medical Technicians. These scenarios were processed by the RAG-enhanced LLMs, and the models were given patient vital signs, symptoms, and observations from emergency medical services (EMS) teams as inputs. The primary outcome was the accuracy of triage classifications, which was used to compare the performance of the RAG-enhanced LLMs with that of emergency medical technicians and emergency physicians. Secondary outcomes included the rates of under-triage and over-triage.
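The primary and secondary outcomes can be computed from paired lists of assigned and reference categories. The sketch below is a minimal illustration under stated assumptions: the three category names and their ordering are hypothetical (the abstract does not name the categories used), the function name triage_rates is invented here, and the example data are made up rather than taken from the study.

```python
from collections import Counter

# Hypothetical acuity levels ordered from lowest to highest urgency
# (assumption: the abstract does not name the categories used).
LEVELS = ["green", "yellow", "red"]
RANK = {name: index for index, name in enumerate(LEVELS)}


def triage_rates(predicted, reference):
    """Return the fractions of correct, under-triaged, and over-triaged cases.

    Under-triage: the assigned level is less urgent than the reference.
    Over-triage: the assigned level is more urgent than the reference.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predicted and reference must be non-empty and equal in length")

    counts = Counter()
    for assigned, truth in zip(predicted, reference):
        if RANK[assigned] == RANK[truth]:
            counts["correct"] += 1
        elif RANK[assigned] < RANK[truth]:
            counts["under"] += 1
        else:
            counts["over"] += 1

    total = len(reference)
    return {key: counts[key] / total for key in ("correct", "under", "over")}


# Made-up example data, not results from the study.
reference = ["red", "yellow", "green", "red"]
predicted = ["red", "green", "yellow", "red"]
rates = triage_rates(predicted, reference)
print(rates)  # {'correct': 0.5, 'under': 0.25, 'over': 0.25}
```

The three fractions always sum to 1, so reporting accuracy together with the under-triage rate also fixes the over-triage rate.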

Results: The RAG-enhanced Generative Pre-trained Transformer 3.5 (GPT-3.5) model achieved a correct triage rate of 70%, significantly outperforming emergency medical technicians (EMTs), who had correct rates of 35% and 38%, and emergency physicians, who had correct rates of 50% and 47% (p < 0.05). This model also had a substantially lower under-triage rate of 8%, compared with 33% for GPT-3.5 without RAG and 39% for GPT-4 without RAG.

Conclusions: The integration of RAG with LLMs shows promise in improving the accuracy and consistency of medical assessments in emergency settings. Further validation in diverse medical settings with broader datasets is necessary to confirm the effectiveness and adaptability of these technologies in live environments.