Noise-induced modality-specific pretext learning for pediatric chest X-ray image classification

Sivaramakrishnan Rajaraman; Zhaohui Liang; Zhiyun Xue; Sameer Antani

doi:10.3389/frai.2024.1419638

Front Artif Intell. 2024 Sep 5:7:1419638. doi: 10.3389/frai.2024.1419638. eCollection 2024.

Authors

Sivaramakrishnan Rajaraman¹, Zhaohui Liang¹, Zhiyun Xue¹, Sameer Antani¹

Affiliation

¹ Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States.

Abstract

Introduction: Deep learning (DL) has significantly advanced medical image classification. However, it often relies on transfer learning (TL) from models pretrained on large, generic non-medical image datasets such as ImageNet, whereas medical images possess unique visual characteristics that such general-purpose models may not adequately capture.

Methods: This study examines whether modality-specific pretext learning, strengthened by image denoising and deblurring, improves the classification of pediatric chest X-ray (CXR) images into those exhibiting no findings, i.e., normal lungs, and those with cardiopulmonary disease manifestations. Specifically, we use a VGG-16-Sharp-U-Net architecture and leverage its encoder in conjunction with a classification head to distinguish normal from abnormal pediatric CXRs. We benchmark this performance against the traditional TL approach, viz., the VGG-16 model pretrained only on ImageNet. The measures used for performance evaluation are balanced accuracy, sensitivity, specificity, F-score, Matthews correlation coefficient (MCC), Kappa statistic, and Youden's index.
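The denoising and deblurring pretext idea can be illustrated by how a training pair might be built: a degraded image is the input and the clean image is the restoration target. The following is a minimal sketch only. The abstract does not specify the degradation model, so the 3x3 box blur, the additive Gaussian noise, the `sigma` value, and all function names here are assumptions made for illustration, not the authors' pipeline.

```python
import random


def box_blur(image):
    """Apply a 3x3 mean filter with edge replication (assumed blur model)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp neighbor coordinates so border pixels are replicated.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip to [0, 1] (assumed noise model)."""
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in image
    ]


def make_pretext_pair(clean, sigma=0.05, seed=0):
    """Return (degraded_input, clean_target) for a restoration pretext task."""
    rng = random.Random(seed)
    degraded = add_gaussian_noise(box_blur(clean), sigma, rng)
    return degraded, clean


# Hypothetical 4x4 grayscale patch with intensities in [0, 1].
patch = [
    [0.0, 0.2, 0.4, 0.6],
    [0.2, 0.4, 0.6, 0.8],
    [0.4, 0.6, 0.8, 1.0],
    [0.6, 0.8, 1.0, 1.0],
]
degraded_patch, target_patch = make_pretext_pair(patch)
```

In such a setup, an encoder-decoder network would be trained to map `degraded_patch` back to `target_patch`, and the trained encoder would then be reused with a classification head.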

Results: Our findings reveal that models developed from CXR modality-specific pretext encoders substantially outperform the ImageNet-only pretrained model, viz., Baseline, achieving significantly higher sensitivity (p < 0.05) along with marked improvements in balanced accuracy, F-score, MCC, Kappa statistic, and Youden's index. A novel attention-based fuzzy ensemble of the pretext-learned models further improves performance across these metrics (balanced accuracy: 0.6376; sensitivity: 0.4991; F-score: 0.5102; MCC: 0.2783; Kappa: 0.2782; Youden's index: 0.2751) compared to Baseline (balanced accuracy: 0.5654; sensitivity: 0.1983; F-score: 0.2977; MCC: 0.1998; Kappa: 0.1599; Youden's index: 0.1327).
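The measures reported above can all be derived from a binary confusion matrix using their standard definitions. The sketch below shows those definitions; the function name and the example counts are hypothetical and are not taken from the study, and the code assumes every denominator is nonzero.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    Assumes each class has at least one true and one predicted sample,
    so that no denominator is zero.
    """
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # Youden's index J = sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1
    return {
        "balanced_accuracy": balanced_accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
        "kappa": kappa,
        "youden": youden,
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=50, fp=20, tn=80, fn=50)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Because both balanced accuracy and Youden's index depend only on sensitivity and specificity, they are tied by J = 2 × (balanced accuracy) − 1 when computed on the same confusion matrix.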

Discussion: The superior results of the CXR modality-specific pretext-learned models and their ensemble underscore the potential of this approach as a viable alternative to conventional ImageNet pretraining for medical image classification. Results from this study encourage further exploration of medical modality-specific TL techniques in the development of DL models for various medical imaging applications.

Keywords: chest radiography; deep learning; ensemble learning; modality-specific knowledge transfer; pediatric; pretext learning; statistical significance.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) at the National Institutes of Health (NIH).