ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports

Pac Symp Biocomput. 2025;30:70-81.

Abstract

Accurately interpreting medical images and writing radiology reports is a critical but challenging task in healthcare. Both human-written and AI-generated reports can contain errors, ranging from clinical inaccuracies to linguistic mistakes. To address this, we introduce ReXErr, a methodology that leverages Large Language Models to generate representative errors within chest X-ray reports. Working with board-certified radiologists, we developed error categories that capture common mistakes in both human and AI-generated reports. Our approach uses a novel sampling scheme to inject diverse errors while maintaining clinical plausibility. ReXErr demonstrates consistency across error categories and produces errors that closely mimic those found in real-world scenarios. This method can aid in the development and evaluation of report correction algorithms, potentially enhancing the quality and reliability of radiology reporting.
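To make the described pipeline concrete, the sketch below illustrates one way an error-sampling and injection step could look: categories are drawn without replacement for each report and folded into an LLM prompt. This is a minimal illustration only; the category names, function names (sample_errors, build_injection_prompt), and prompt wording are assumptions, not the paper's actual taxonomy or sampling scheme.

```python
import random

# Hypothetical error taxonomy; the paper's categories were developed with
# board-certified radiologists and are not reproduced here.
ERROR_CATEGORIES = [
    "false finding added",
    "finding omitted",
    "incorrect laterality or location",
    "incorrect severity",
    "linguistic or typographical error",
]

def sample_errors(rng: random.Random, max_errors: int = 2) -> list[str]:
    """Sample a small, diverse set of error categories for one report.

    A stand-in for the paper's sampling scheme: drawing without
    replacement keeps a single report from being saturated with one
    error type.
    """
    k = rng.randint(1, max_errors)
    return rng.sample(ERROR_CATEGORIES, k)

def build_injection_prompt(report: str, errors: list[str]) -> str:
    """Compose an LLM prompt asking for the report to be rewritten with
    the sampled errors injected, leaving the rest clinically plausible."""
    error_list = "\n".join(f"- {e}" for e in errors)
    return (
        "Rewrite the following chest X-ray report, introducing exactly "
        "these errors and changing nothing else:\n"
        f"{error_list}\n\n"
        f"Report:\n{report}\n"
    )

if __name__ == "__main__":
    rng = random.Random(0)
    report = ("Heart size is normal. No focal consolidation, "
              "pleural effusion, or pneumothorax.")
    print(build_injection_prompt(report, sample_errors(rng)))
```

In practice, the returned prompt would be sent to an LLM of choice and the response paired with the original report to form (erroneous, corrected) training pairs for report correction models.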

MeSH terms

  • Algorithms*
  • Artificial Intelligence
  • Computational Biology*
  • Diagnostic Errors* / statistics & numerical data
  • Humans
  • Natural Language Processing
  • Radiography, Thoracic / standards
  • Radiography, Thoracic / statistics & numerical data
  • Radiology Information Systems / statistics & numerical data