Generating Synthetic Healthcare Dialogues in Emergency Medicine Using Large Language Models

Stud Health Technol Inform. 2024 Nov 22;321:235-239. doi: 10.3233/SHTI241099.

Abstract

Natural Language Processing (NLP) has shown promise in fields like radiology for converting unstructured text into structured data, but acquiring suitable datasets poses several challenges, including privacy concerns. Specifically, we aim to use Large Language Models (LLMs) to extract medical information from dialogues between ambulance staff and patients in order to populate emergency protocol forms. However, we currently lack dialogues with known content that could serve as a gold standard for evaluation. We designed a pipeline that uses the quantized LLM "Zephyr-7b-beta" for initial dialogue generation, followed by refinement and translation with OpenAI's GPT-4 Turbo; the MIMIC-IV database provided the underlying medical data. The evaluation comprised an accuracy assessment via Retrieval-Augmented Generation (RAG) and sentiment analysis with multilingual models. Initial results showed a high accuracy of 94% with "Zephyr-7b-beta," decreasing slightly to 87% after refinement with GPT-4 Turbo, while sentiment analysis indicated a qualitative shift towards more positive sentiment after refinement. These findings highlight both the potential and the challenges of using LLMs to generate synthetic medical dialogues, informing future NLP system development in healthcare.
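To make the accuracy assessment concrete: because each dialogue is generated from a known MIMIC-IV record, accuracy can be framed as the fraction of ground-truth field values recoverable from the generated text. The paper performs this extraction with RAG and an LLM; the minimal sketch below substitutes plain substring matching for that step, and all names, records, and dialogue content are hypothetical illustrations, not the authors' code or data.

```python
# Hypothetical sketch: score a synthetic dialogue by the fraction of
# known source facts (from the structured record) that can be found in it.
# In the paper this recovery step is done via RAG with an LLM; simple
# case-insensitive substring matching stands in for it here.

def dialogue_accuracy(dialogue: str, source_facts: dict) -> float:
    """Fraction of ground-truth field values present in the dialogue text."""
    text = dialogue.lower()
    hits = sum(1 for value in source_facts.values()
               if str(value).lower() in text)
    return hits / len(source_facts)

# Toy example with an invented record and dialogue.
facts = {"complaint": "chest pain", "heart_rate": 112, "allergy": "penicillin"}
dialogue = (
    "Paramedic: What brings us out today? "
    "Patient: Sudden chest pain about an hour ago. "
    "Paramedic: Your pulse is 112. Any allergies? "
    "Patient: Penicillin."
)
print(dialogue_accuracy(dialogue, facts))  # → 1.0
```

Averaging this score over all generated dialogues gives a corpus-level accuracy comparable in spirit to the 94% / 87% figures reported, though the paper's LLM-based extraction is more robust to paraphrase than literal matching.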

Keywords: Emergency medical services; Large Language Model (LLM); Natural Language Processing; Retrieval-Augmented Generation (RAG); Sentiment analysis; Synthetic data generation.

MeSH terms

  • Electronic Health Records
  • Emergency Medicine*
  • Humans
  • Natural Language Processing*