Encoder-decoder models for chest X-ray report generation perform no better than unconditioned baselines

PLoS One. 2021 Nov 29;16(11):e0259639. doi: 10.1371/journal.pone.0259639. eCollection 2021.

Abstract

High-quality radiology reporting of chest X-ray images is of core importance for patient diagnosis and care. Automatically generated reports can assist radiologists by reducing their workload and may even prevent errors. Machine Learning (ML) models for this task take an X-ray image as input and output a sequence of words. In this work, we show that ML models based on the popular encoder-decoder approach, such as 'Show, Attend and Tell' (SA&T), perform similarly to or worse than models that do not use the input image at all, which we call unconditioned baselines. An unconditioned model achieved a diagnostic accuracy of 0.91 on the IU chest X-ray dataset, significantly outperforming SA&T (0.877) and other popular ML models (p-value < 0.001). This unconditioned model also outperformed SA&T and similar ML methods on the BLEU-4 and METEOR metrics. Furthermore, an unconditioned version of SA&T, obtained by permuting the reports generated from images of the test set, achieved a diagnostic accuracy of 0.862, comparable to that of SA&T (p-value ≥ 0.05).
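To make the permutation baseline concrete, the following is a minimal sketch (not the authors' code) of how generated reports can be decoupled from their input images: reports produced by an image-conditioned model such as SA&T are randomly permuted across the test images before scoring, so the report paired with each image no longer depends on that image. The function name and the commented scoring step are illustrative assumptions.

```python
import random

def permute_generated_reports(image_ids, generated_reports, seed=0):
    """Randomly reassign generated reports across test images.

    After permutation, the report evaluated against each image's reference
    no longer depends on that image, yielding an 'unconditioned' variant of
    the original image-conditioned model.
    """
    rng = random.Random(seed)
    permuted = list(generated_reports)
    rng.shuffle(permuted)
    return dict(zip(image_ids, permuted))


# Usage sketch: score the permuted pairing with the same metrics used for
# the conditioned model (e.g. BLEU-4, METEOR, or a diagnostic-label scorer).
# permuted = permute_generated_reports(test_image_ids, sat_reports)
# score = bleu4(permuted, reference_reports)  # hypothetical metric helper
```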

MeSH terms

  • Algorithms*
  • Tomography, X-Ray Computed*

Grants and funding

The author(s) received no specific funding for this work.