Generating synthetic clinical text with local large language models to identify misdiagnosed limb fractures in radiology reports

Artif Intell Med. 2025 Jan;159:103027. doi: 10.1016/j.artmed.2024.103027. Epub 2024 Nov 20.

Abstract

Large language models (LLMs) demonstrate impressive capabilities in generating human-like content and hold significant potential to improve the performance and efficiency of healthcare. An important application of LLMs is generating synthetic clinical reports, which could alleviate the burden of annotating and collecting real-world data for training AI models. However, using commercial LLMs to handle sensitive clinical data raises privacy concerns and practical limitations. In this study, we examined the use of open-source LLMs as an alternative for generating synthetic radiology reports to supplement real-world annotated data. We found that locally hosted LLMs can match the performance of ChatGPT and GPT-4 in augmenting training data for the downstream report classification task of identifying misdiagnosed fractures. We also examined the predictive value of training downstream models on synthetic reports alone, where our best setting achieved more than 90% of the performance obtained with real-world data. Overall, our findings show that open-source, local LLMs can be a favourable option for creating synthetic clinical reports for downstream tasks.
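The abstract does not specify the implementation, but the two-stage pipeline it describes can be sketched as follows: prompt a locally hosted open-source LLM to generate synthetic reports for each label class, then pool them with real annotated reports to train a downstream classifier. The model name, prompts, toy data, and classifier below are illustrative assumptions, not the authors' actual configuration.

```python
# A minimal sketch, assuming a locally hosted open-weight LLM served via
# the Hugging Face transformers pipeline and a simple bag-of-words
# classifier. Everything named here is an assumption for illustration.
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stage 1: synthetic report generation with a local open-source LLM.
# Any locally hosted instruction-tuned model can be substituted here.
generator = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",  # assumed model, not the paper's
    device_map="auto",
)

# Hypothetical prompts, one per label class (1 = missed fracture).
PROMPTS = {
    1: "Write a brief emergency department limb X-ray report describing "
       "a subtle fracture that was missed on the initial interpretation.",
    0: "Write a brief emergency department limb X-ray report with no "
       "fracture and a correct initial interpretation.",
}

synthetic_reports, synthetic_labels = [], []
for label, prompt in PROMPTS.items():
    for _ in range(50):  # per-class counts are arbitrary in this sketch
        out = generator(
            prompt,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.9,
            return_full_text=False,  # keep only the generated report text
        )
        synthetic_reports.append(out[0]["generated_text"])
        synthetic_labels.append(label)

# Stage 2: augment real annotated reports with the synthetic ones and
# train a downstream classifier. The two reports below are toy
# placeholders for a real annotated corpus.
real_reports = [
    "XR left wrist: no acute bony injury identified.",
    "XR right ankle: subtle distal fibula fracture, not reported initially.",
]
real_labels = [0, 1]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(real_reports + synthetic_reports, real_labels + synthetic_labels)
```

The same augmented-training setup also covers the synthetic-only condition the abstract reports: simply fit the classifier on the synthetic reports and labels alone and compare against a model trained on the real corpus.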

Keywords: Emergency department; Large language models; Local LLMs; Natural language processing; Radiology report; Synthetic data.

MeSH terms

  • Artificial Intelligence
  • Diagnostic Errors
  • Electronic Health Records
  • Fractures, Bone* / diagnostic imaging
  • Humans
  • Natural Language Processing*