Robust predictive framework for diabetes classification using optimized machine learning on imbalanced datasets

Front Artif Intell. 2025 Jan 7:7:1499530. doi: 10.3389/frai.2024.1499530. eCollection 2024.

Abstract

Introduction: Diabetes prediction using clinical datasets is crucial for medical data analysis. However, class imbalances, where non-diabetic cases dominate, can significantly affect machine learning model performance, leading to biased predictions and reduced generalization.

Methods: A novel predictive framework employing cutting-edge machine learning algorithms and advanced imbalance handling techniques was developed. The framework integrates feature engineering and resampling strategies to enhance predictive accuracy.
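As an illustration of the kind of resampling strategy mentioned above, the following is a minimal sketch of random oversampling, which duplicates minority-class records until the classes are balanced. The abstract does not name the specific techniques used, so the choice of random oversampling, the function name `random_oversample`, and the toy feature values are all assumptions made for illustration only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class. (Hypothetical helper, not the
    authors' implementation.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made-up values): 6 non-diabetic (0) and 2 diabetic (1) records.
X = [[85, 22.1], [90, 24.3], [88, 23.0], [92, 25.5],
     [80, 21.7], [95, 26.0], [160, 33.4], [172, 35.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, resampling of this kind is normally applied only to the training split, so that the held-out evaluation data keeps its original class distribution.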

Results: Rigorous testing was conducted on three datasets (PIMA, Diabetes Dataset 2019, and BIT_2019), demonstrating the robustness and adaptability of the methodology across varying data environments.

Discussion: The experimental results highlight the critical role of model selection and imbalance mitigation in achieving reliable and generalizable diabetes predictions. This study offers significant contributions to medical informatics by proposing a robust data-driven framework that addresses class imbalance challenges, thereby advancing diabetes prediction accuracy.

Keywords: diabetes detection; imbalance handling methods; imbalanced datasets; machine learning; statistical analysis.

Grants and funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.