Challenges and limitations of synthetic minority oversampling techniques in machine learning

Ibraheem M Alkhawaldeh; Ibrahem Albalkhi; Abdulqadir Jeprel Nashwan

doi:10.5662/wjm.v13.i5.373

World J Methodol. 2023 Dec 20;13(5):373-378. doi: 10.5662/wjm.v13.i5.373.

Authors

Ibraheem M Alkhawaldeh¹, Ibrahem Albalkhi², Abdulqadir Jeprel Nashwan³

Affiliations

¹ Faculty of Medicine, Mutah University, Karak 61710, Jordan.
² Department of Neuroradiology, Alfaisal University, Great Ormond Street Hospital NHS Foundation Trust, London WC1N 3JH, United Kingdom.
³ Nursing for Education and Practice Development, Hamad Medical Corporation, Doha 3050, Qatar.

Abstract

Oversampling is the most widely used approach for dealing with class-imbalanced datasets, as shown by the plethora of oversampling methods developed in the last two decades. In this editorial, we discuss the problems with oversampling that stem from the possibility of overfitting and from the generation of synthetic cases that might not accurately represent the minority class. These limitations should be considered when using oversampling techniques. We also propose several alternative strategies for dealing with imbalanced data, as well as a perspective on future work.

Keywords: Class imbalance; Machine learning; Misdiagnosis; Overfitting.

Publication types

Editorial