This study explored the application of generative pre-trained transformer (GPT) agents based on medical guidelines using large language model (LLM) technology for traumatic brain injury (TBI) rehabilitation-related questions. To assess the effectiveness of multiple agents (GPT-agents) created using GPT-4, a comparison was conducted using direct GPT-4 as the control group (GPT-4). The GPT-agents comprised multiple agents with distinct functions, including "Medical Guideline Classification", "Question Retrieval", "Matching Evaluation", "Intelligent Question Answering (QA)", and "Results Evaluation and Source Citation". Brain rehabilitation questions were selected from the doctor-patient Q&A database for assessment. The primary endpoint was a better answer. The secondary endpoints were accuracy, completeness, explainability, and empathy. Thirty questions were answered; overall GPT-agents took substantially longer and more words to respond than GPT-4 (time: 54.05 vs. 9.66 s, words: 371 vs. 57). However, GPT-agents provided superior answers in more cases compared to GPT-4 (66.7 vs. 33.3%). GPT-Agents surpassed GPT-4 in accuracy evaluation (3.8 ± 1.02 vs. 3.2 ± 0.96, p = 0.0234). No difference in incomplete answers was found (2 ± 0.87 vs. 1.7 ± 0.79, p = 0.213). However, in terms of explainability (2.79 ± 0.45 vs. 07 ± 0.52, p < 0.001) and empathy (2.63 ± 0.57 vs. 1.08 ± 0.51, p < 0.001) evaluation, the GPT-agents performed notably better. Based on medical guidelines, GPT-agents enhanced the accuracy and empathy of responses to TBI rehabilitation questions. This study provides guideline references and demonstrates improved clinical explainability. However, further validation through multicenter trials in a clinical setting is necessary. This study offers practical insights and establishes groundwork for the potential theoretical integration of LLM-agents medicine.
Keywords: Generative pre-trained transformer; Large language model; Medical guidelines; Rehabilitation; Traumatic brain injury.
© 2024. The Author(s).