Ensemble method using real images, metadata and synthetic images for control of class imbalance in classification

Artif Life Robot. 2022;27(4):796-803. doi: 10.1007/s10015-022-00781-8. Epub 2022 Sep 2.

Abstract

Binary classification and anomaly detection both face the problem of class imbalance in data sets. The contribution of this paper is an ensemble model that improves binary image classification by reducing the imbalance between the minority and majority classes in a data set. The ensemble model is a classifier of real images, synthetic images, and metadata associated with the real images. First, we apply a generative model to synthesize images of the minority class from the real image data set. Second, we train the ensemble model jointly on the synthesized minority-class images, the real images, and the metadata. Finally, we evaluate model performance using a sensitivity metric to observe how classification changes when the class imbalance is adjusted. When the imbalance is reduced by adding to the minority class a number of images equal to half the size of the majority class, we observe an improvement in the classifier's sensitivity of 12% and 24% for the benchmark pre-trained models ResNet50 and DenseNet121, respectively.
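The evaluation step and the size of the adjustment can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice of label 1 as the positive class, and the example counts are assumptions made only for illustration. It uses the standard definition of sensitivity, TP / (TP + FN), and reads "half the size of the majority class" as the number of synthetic images added to the minority class.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (recall of the positive class): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn)


def synthetic_count(n_majority, fraction=0.5):
    """Number of synthetic minority images to add (assumed: a fraction of the majority size)."""
    return int(n_majority * fraction)


def imbalance_ratio(n_majority, n_minority):
    """Majority-to-minority ratio; 1.0 means the classes are balanced."""
    return n_majority / n_minority


# Hypothetical counts, not taken from the paper.
n_majority, n_minority = 1000, 250
n_added = synthetic_count(n_majority)
ratio_before = imbalance_ratio(n_majority, n_minority)
ratio_after = imbalance_ratio(n_majority, n_minority + n_added)
```

With these hypothetical counts the majority-to-minority ratio falls from 4.0 to about 1.33 once the synthetic images are added; the sensitivity function would then be applied to the predictions of a classifier trained before and after the adjustment.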

Keywords: Chest X-rays; Image classification; Image synthesis; Imbalance data; Patient metadata; Pneumonia detection.