Data augmentation based semi-supervised method to improve COVID-19 CT classification

Math Biosci Eng. 2023 Feb 6;20(4):6838-6852. doi: 10.3934/mbe.2023294.

Abstract

The coronavirus disease (COVID-19) outbreak of December 2019 has become a serious threat to people around the world, creating a health crisis that has infected millions of people and damaged the global economy. Early detection and diagnosis are essential to prevent further transmission. Detecting COVID-19 in computed tomography (CT) images is one important approach to rapid diagnosis. Many branches of deep learning have played an important role in this area, including transfer learning, contrastive learning and ensemble strategies. However, these methods require large numbers of samples with expensive manual labels, so to reduce cost, researchers have adopted semi-supervised learning, which uses only a few labels to classify COVID-19 CT images. Nevertheless, existing semi-supervised methods focus primarily on class imbalance and pseudo-label filtering rather than on pseudo-label generation. Accordingly, in this paper, we built a semi-supervised classification framework based on data augmentation to classify COVID-19 CT images. We revised the classic teacher-student framework and introduced the popular data augmentation method Mixup, which widened the distribution of high confidence to improve the accuracy of selected pseudo-labels and ultimately obtain a model with better performance. For the COVID-CT dataset, our method achieves precision, F1 score, accuracy and specificity that are 21.04%, 12.95%, 17.13% and 38.29% higher, respectively, than the average values for other methods. For the SARS-COV-2 dataset, these increases were 8.40%, 7.59%, 9.35% and 12.80%, respectively. For the Harvard Dataverse dataset, the increases were 17.64%, 18.89%, 19.81% and 20.20%, respectively. The code is available at https://github.com/YutingBai99/COVID-19-SSL.
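
The two generic building blocks named in the abstract, Mixup and confidence-based pseudo-label selection, can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the linked repository), and the abstract does not say how Mixup is wired into the teacher-student loop. The function names `mixup_pair` and `select_pseudo_labels`, the Beta parameter `alpha=0.4`, and the `threshold=0.95` cutoff are all assumptions chosen for illustration.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two samples and their one-hot labels (generic Mixup).

    alpha=0.4 is an illustrative value, not one taken from the paper.
    """
    # Mixing weight drawn from a symmetric Beta(alpha, alpha) distribution.
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the inputs and of the labels with the same weight.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


def select_pseudo_labels(teacher_probs, threshold=0.95):
    """Keep unlabeled samples whose top teacher probability reaches the threshold.

    Returns a list of (sample_index, predicted_class) pairs.
    threshold=0.95 is a hypothetical cutoff, not one taken from the paper.
    """
    selected = []
    for i, probs in enumerate(teacher_probs):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence)))
    return selected


if __name__ == "__main__":
    # Toy two-pixel "images" with one-hot labels for two classes.
    x_mix, y_mix, lam = mixup_pair([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print("lambda:", lam, "mixed input:", x_mix, "mixed label:", y_mix)

    # Toy teacher outputs for three unlabeled samples.
    teacher_probs = [[0.97, 0.03], [0.60, 0.40], [0.02, 0.98]]
    print("selected:", select_pseudo_labels(teacher_probs))
```

In a teacher-student setup of this general kind, the pairs returned by `select_pseudo_labels` would be added to the student's training set, while low-confidence samples stay unlabeled for a later round.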

Keywords: COVID-19; Mixup; pseudo-labels; semi-supervised; teacher-student framework.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / diagnostic imaging
  • COVID-19* / epidemiology
  • Databases, Factual
  • Disease Outbreaks
  • Humans
  • SARS-CoV-2
  • Tomography, X-Ray Computed