Augmentation strategies for an imbalanced learning problem on a novel COVID-19 severity dataset

Sci Rep. 2023 Oct 25;13(1):18299. doi: 10.1038/s41598-023-45532-2.

Abstract

Since the beginning of the COVID-19 pandemic, many different machine learning models have been developed to detect and verify COVID-19 pneumonia based on chest X-ray images. Although promising, binary models have only limited implications for medical treatment, whereas the prediction of disease severity suggests more suitable and specific treatment options. In this study, we publish severity scores for the 2358 COVID-19-positive images in the COVIDx8B dataset, creating one of the largest collections of publicly available COVID-19 severity data. Furthermore, we train and evaluate deep learning models on the newly created dataset to provide a first benchmark for the severity classification task. One of the main challenges of this dataset is the skewed class distribution, resulting in undesirable model performance for the most severe cases. We therefore propose and examine different augmentation strategies, specifically targeting majority and minority classes. Our augmentation strategies show significant improvements in precision and recall values for the rare and most severe cases. While the models might not yet fulfill medical requirements, they serve as an appropriate starting point for further research with the proposed dataset to optimize clinical resource allocation and treatment.
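
As a rough illustration of the two ideas the abstract combines (class-targeted augmentation for a skewed label distribution, and per-class precision and recall as the evaluation lens), the following minimal Python sketch may help. It is not the authors' method: the abstract does not specify how the augmentation budget is allocated, so the sketch assumes the simplest policy of topping up every class to the size of the largest one. The function names, the class labels, and the toy data are all hypothetical.

```python
from collections import Counter


def augmentation_counts(labels):
    """Hypothetical helper: number of augmented samples each class would
    need to reach the size of the largest class (an assumed policy)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} computed one-vs-rest."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Report 0.0 when a class has no predictions or no true samples.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        scores[c] = (precision, recall)
    return scores


# Toy, made-up labels with a skewed distribution (not data from the study).
toy_labels = ["mild"] * 6 + ["moderate"] * 3 + ["severe"] * 1
extra_needed = augmentation_counts(toy_labels)

# Toy predictions for three severity classes encoded as 0, 1, 2.
toy_scores = per_class_precision_recall([0, 0, 1, 1, 2], [0, 0, 1, 0, 2])
```

Reporting precision and recall per class, rather than overall accuracy, is what makes weak performance on a rare class visible: a model can score well on average while missing most of the severe cases.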

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • COVID-19*
  • Humans
  • Machine Learning
  • Mental Recall
  • Pandemics*