Diagnosing and treating pediatric respiratory disease requires accurate and interpretable analysis. A chest X-ray is the most cost-effective and rapid method for identifying and monitoring various thoracic diseases in children. Recent developments in self-supervised and transfer learning have shown potential in medical imaging, including chest X-ray analysis. In this article, we propose a three-stage framework that transfers knowledge from adult chest X-rays to aid the diagnosis and interpretation of pediatric thoracic diseases. We conducted comprehensive experiments with different pre-training and fine-tuning strategies to develop transformer and convolutional neural network models, and evaluated them both qualitatively and quantitatively. The ViT-Base/16 model fine-tuned on CheXpert, a large chest X-ray dataset, was the most effective, achieving a mean AUC of 0.761 (95% CI: 0.759-0.763) across six disease categories, with an average sensitivity of 0.639 and an average specificity of 0.683, which together indicate its discriminative ability. By comparison, the baseline models, ViT-Small/16 and ViT-Base/16 trained directly on the Pediatric CXR dataset, achieved mean AUCs of only 0.646 (95% CI: 0.641-0.651) and 0.654 (95% CI: 0.648-0.660), respectively. Qualitatively, our model localizes diseased regions better than models pre-trained on ImageNet and models obtained with other fine-tuning approaches, and thus provides superior explanations. The source code is available online, and the data can be obtained from PhysioNet.
Keywords: chest X-ray; medical image analysis; model interpretability; self-supervised learning; transfer learning.
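
To make the reported quantities concrete, the following is a minimal sketch of how a mean AUC across disease categories, a confidence interval, and per-class sensitivity and specificity could be computed. It is not the authors' evaluation code: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the percentile bootstrap is an assumed way to obtain an interval, since the abstract does not state how its 95% CIs were derived.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity at a fixed threshold (0.5 is an assumption)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def mean_auc(labels_by_class, scores_by_class):
    """Unweighted mean of the per-class AUCs (one list of labels and scores per disease)."""
    return mean(auc(l, s) for l, s in zip(labels_by_class, scores_by_class))


def bootstrap_ci(labels_by_class, scores_by_class, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean AUC, resampling images with replacement."""
    mean_auc(labels_by_class, scores_by_class)  # fails early if any class is degenerate
    rng = random.Random(seed)
    n = len(labels_by_class[0])
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(mean_auc(
                [[l[i] for i in idx] for l in labels_by_class],
                [[s[i] for i in idx] for s in scores_by_class],
            ))
        except ValueError:
            continue  # this resample lost all positives or negatives for a class; redraw
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


if __name__ == "__main__":
    # Toy data for two hypothetical disease classes over the same four images.
    labels = [[0, 0, 1, 1], [0, 0, 1, 1]]
    scores = [[0.1, 0.4, 0.35, 0.8], [0.1, 0.2, 0.8, 0.9]]
    print("mean AUC:", mean_auc(labels, scores))
    print("95% CI:", bootstrap_ci(labels, scores))
    print("sens/spec, class 0:", sensitivity_specificity(labels[0], scores[0]))
```

In practice a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic-time `auc` above; the pure-Python version is used here only to keep the sketch dependency-free.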