Enhancing stroke disease classification through machine learning models via a novel voting system by feature selection techniques

Mahade Hasan; Farhana Yasmin; Md Mehedi Hassan; Xue Yu; Soniya Yeasmin; Herat Joshi; Sheikh Mohammed Shariful Islam

doi:10.1371/journal.pone.0312914

PLoS One. 2025 Jan 9;20(1):e0312914. doi: 10.1371/journal.pone.0312914. eCollection 2025.

Authors

Mahade Hasan¹, Farhana Yasmin², Md Mehedi Hassan³, Xue Yu¹, Soniya Yeasmin⁴, Herat Joshi⁵, Sheikh Mohammed Shariful Islam⁶

Affiliations

¹ School of Software, Nanjing University of Information Science and Technology, Nanjing, China.
² Department of Computer Science and Technology, Nanjing University of Information Science and Technology, Nanjing, China.
³ Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh.
⁴ Department of Computer Science and Engineering, North Western University, Khulna, Bangladesh.
⁵ Great River Health Systems, Burlington, IA, United States of America.
⁶ Institute for Physical Activity and Nutrition, Deakin University, Melbourne, VIC, Australia.

PMID: 39787105
DOI: 10.1371/journal.pone.0312914

Abstract

Heart disease remains a leading cause of mortality and morbidity worldwide, necessitating accurate and reliable predictive models to facilitate early detection and intervention. Although state-of-the-art work has explored various machine learning approaches for predicting heart disease, these approaches have not achieved high accuracy. In response to this need, we applied nine machine learning algorithms (XGBoost, logistic regression, decision tree, random forest, k-nearest neighbors (KNN), support vector machine (SVM), Gaussian naïve Bayes (NB Gaussian), adaptive boosting, and linear regression) to predict heart disease from a range of physiological indicators. Our approach used feature selection techniques to identify the most relevant predictors, with the aim of refining the models to improve both performance and interpretability. The models were trained with grid search hyperparameter tuning and cross-validation to minimize overfitting. Additionally, we developed a novel voting system with feature selection techniques to advance heart disease classification. We evaluated the models using key performance metrics: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC). Among the models, XGBoost demonstrated exceptional performance, achieving 99% accuracy, precision, and F1-score, 98% recall, and 100% ROC AUC. This study offers a promising approach to early heart disease diagnosis and preventive healthcare.
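
The abstract does not describe how the voting system combines its models, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes plain hard majority voting over binary labels and computes four of the reported metrics (accuracy, precision, recall, F1-score) from their standard definitions; ROC AUC is omitted because it needs predicted scores rather than labels. The names `majority_vote` and `binary_metrics`, and the toy labels, are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    `predictions` is a list of equal-length lists, one list per model.
    Assumption: hard majority voting; ties go to the label that is
    encountered first among the models' votes for that sample.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical, not data from the study): 1 = disease, 0 = no disease.
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 1]
model_b = [1, 0, 0, 1, 0, 0]
model_c = [0, 0, 1, 1, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
single_scores = binary_metrics(y_true, model_a)
ensemble_scores = binary_metrics(y_true, ensemble)

if __name__ == "__main__":
    print("model_a :", single_scores)
    print("ensemble:", ensemble_scores)
```

In this toy case each model makes two errors on different samples, so the vote corrects all of them. Real models tend to make correlated errors, so the gain from voting is usually smaller.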

Copyright: © 2025 Hasan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Algorithms
Bayes Theorem
Decision Trees
Humans
Machine Learning*
ROC Curve
Stroke* / diagnosis
Support Vector Machine*
Voting