Machine Learning Algorithm-Based Prediction of Diabetes Among Female Population Using PIMA Dataset

Healthcare (Basel). 2024 Dec 29;13(1):37. doi: 10.3390/healthcare13010037.

Abstract

Background: Diabetes is a metabolic disorder characterized by elevated blood sugar levels. Early detection can help individuals manage the disorder and delay its progression. Machine learning (ML) methods are important for diagnosing medical conditions and forecasting their progression with improved accuracy. Although they cannot substitute for the work of physicians in predicting and diagnosing disease, they can be of great help in identifying hidden patterns in the results and outcomes of disease. Methods: We retrieved the PIMA dataset from the Kaggle repository and processed it for exploratory data analysis (EDA) using PCA, a heatmap, and scatter plots, which show the relationships between the features in the dataset visually. Four ML algorithms, Random Forest (RF), Decision Tree (DT), Naïve Bayes (NB), and Logistic Regression (LR), were implemented on Rattle using Python to predict diabetes among the female population. Results: RF outperformed the other models (DT, NB, and LR), with an accuracy of 80%, a precision of 82%, an error rate of 20%, and a sensitivity of 88%. Conclusions: Diabetes is a common problem worldwide, and ML-based prediction models can help predict diabetes earlier, before the condition worsens.
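The four metrics reported in the Results are all derived from a binary confusion matrix, which also explains why an accuracy of 80% pairs with an error rate of 20%. The following is a minimal sketch using the standard textbook definitions; the names `ConfusionMatrix` and `classification_metrics` and the example counts are hypothetical, chosen for illustration only, and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive class = diabetic)."""
    tp: int  # diabetic, predicted diabetic
    fp: int  # non-diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    tn: int  # non-diabetic, predicted non-diabetic


def classification_metrics(cm: ConfusionMatrix) -> dict:
    """Return accuracy, error rate, precision, and sensitivity as fractions."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    if cm.tp + cm.fp == 0:
        raise ValueError("precision is undefined: no positive predictions")
    if cm.tp + cm.fn == 0:
        raise ValueError("sensitivity is undefined: no positive cases")
    return {
        # Share of all cases classified correctly.
        "accuracy": (cm.tp + cm.tn) / total,
        # Share of all cases classified incorrectly (equals 1 - accuracy).
        "error_rate": (cm.fp + cm.fn) / total,
        # Share of positive predictions that are truly positive.
        "precision": cm.tp / (cm.tp + cm.fp),
        # Share of truly positive cases that are detected (recall).
        "sensitivity": cm.tp / (cm.tp + cm.fn),
    }


# Illustrative counts only; these are NOT the study's results.
example = ConfusionMatrix(tp=30, fp=10, fn=20, tn=40)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because accuracy and error rate partition the same total, they always sum to 1, whereas precision and sensitivity use different denominators and can diverge, as they do in the reported RF results (82% versus 88%).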

Keywords: Naïve Bayes; decision tree; diabetes; logistic regression; machine learning; random forest.