Background: Over the last few decades, heart disease (HD) has emerged as one of the deadliest diseases in the world, accounting for roughly 31 % of deaths worldwide each year. Diagnosing HD at an early stage is a cognitively challenging task because of the volume and complexity of the available medical data. Many tests, such as electrocardiography (ECG), are available for diagnosing HD, but accurate diagnosis remains a major challenge.
Methods: Motivated by these challenges and the significance of HD, the authors developed a novel hybrid XGBoost classifier framework for HD prediction that incorporates outlier removal and hyperparameter optimization. In this approach, outliers were handled using the z-score and interquartile range (IQR) methods, and hyperparameters were tuned using the Optuna framework. Additionally, the impact of different train-test ratios (70:30, 80:20, and 90:10) on model performance was evaluated on the Cleveland HD dataset, both with and without outliers.
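As an illustration of the outlier-handling step, the following is a minimal sketch of z-score and IQR filtering on a single numeric column. It is not the authors' implementation: the function names, the z-score threshold of 3, the IQR multiplier of 1.5, the sample `chol` values, and the choice to keep only rows that pass both tests are all assumptions made for this example.

```python
import statistics


def zscore_mask(values, threshold=3.0):
    """Return True for each value whose absolute z-score is within the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no outliers by this criterion.
        return [True] * len(values)
    return [abs((v - mean) / std) <= threshold for v in values]


def iqr_mask(values, k=1.5):
    """Return True for each value inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


def remove_outliers(rows, column, threshold=3.0, k=1.5):
    """Keep rows whose value in `column` passes both tests (an assumed combination)."""
    values = [row[column] for row in rows]
    keep_z = zscore_mask(values, threshold)
    keep_iqr = iqr_mask(values, k)
    return [row for row, z_ok, iqr_ok in zip(rows, keep_z, keep_iqr) if z_ok and iqr_ok]


# Hypothetical cholesterol readings; the last one is an extreme value.
rows = [{"chol": c} for c in (200, 210, 220, 230, 240, 250, 260, 564)]
cleaned = remove_outliers(rows, "chol")
```

In this small sample the extreme reading is caught by the IQR test but not by the z-score test, because a single large value inflates the standard deviation. This is one reason the two criteria are often used together.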
Results: The proposed hybrid model achieved its best performance without outliers at a 90:10 train-test ratio, with an accuracy of 95.45 %, sensitivity of 92.86 %, precision of 100 %, specificity of 100 %, F1-score of 96.3 %, training time of 0.8 × 10^-16 s, and testing time of 0.1 × 10^-17 s. The results were validated using stratified k-fold cross-validation.
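For reference, the reported metrics follow the standard confusion-matrix definitions, sketched below. The counts in the usage line (13 true positives, 0 false positives, 8 true negatives, 1 false negative) are hypothetical: the abstract does not report a confusion matrix, and these values were chosen only because they reproduce the stated percentages.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for a 22-sample test set (not taken from the paper).
metrics = classification_metrics(tp=13, fp=0, tn=8, fn=1)
```

With no false positives, precision and specificity are both 100 %, and the single false negative accounts for the gap between sensitivity and accuracy.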
Conclusions: This study highlights the importance of data preprocessing, appropriate train-test ratios, and hyperparameter optimization in HD prediction. The proposed framework provides a promising solution for accurate and efficient HD diagnosis, offering potential benefits for cardiac patient healthcare and decision-making.
Keywords: Heart disease; Hyperparameter tuning; Machine learning; Optuna; XGBoost classifier.
Copyright © 2024 Elsevier B.V. All rights reserved.