A hybridization of XGBoost machine learning model by Optuna hyperparameter tuning suite for cardiovascular disease classification with significant effect of outliers and heterogeneous training datasets

Int J Cardiol. 2025 Feb 1:420:132757. doi: 10.1016/j.ijcard.2024.132757. Epub 2024 Nov 28.

Abstract

Background: Over the last few decades, heart disease (HD) has emerged as one of the deadliest diseases in the world, accounting for roughly 31 % of all deaths each year. Diagnosing HD at an early stage is difficult because medical datasets are large and complex. Many diagnostic tests, such as the electrocardiogram (ECG), are available, but accurate diagnosis of the disease remains a major challenge.

Methods: Motivated by these challenges and the significance of HD, the authors developed a novel hybrid XGBoost classifier framework for HD prediction that incorporates outlier removal and hyperparameter optimization. Outliers were handled using the z-score and interquartile range (IQR) methods, and hyperparameters were optimized using the Optuna framework. Additionally, the impact of different train-test ratios (70:30, 80:20, and 90:10) on model performance was evaluated on the Cleveland HD dataset, both with and without outliers.
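The abstract does not state the outlier thresholds or how the two rules were combined. The sketch below assumes common defaults (|z| < 3 using the population standard deviation, and Tukey's fences at 1.5 × IQR) and applies each rule per feature, dropping a patient record if any feature is flagged. The function names `zscore_mask`, `iqr_mask`, and `remove_outliers` are hypothetical, and the toy records are invented for illustration; a pandas/SciPy pipeline would be the usual choice in practice, but plain Python keeps the example self-contained.

```python
from statistics import mean, pstdev, quantiles
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def zscore_mask(rows: Matrix, threshold: float = 3.0) -> List[bool]:
    """True for each row whose every feature satisfies |z| < threshold."""
    # Per-feature mean and population standard deviation (assumed convention)
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        all(sd == 0 or abs((x - mu) / sd) < threshold
            for x, (mu, sd) in zip(row, stats))
        for row in rows
    ]


def iqr_mask(rows: Matrix, k: float = 1.5) -> List[bool]:
    """True for each row whose every feature lies within Tukey's fences."""
    fences = []
    for col in zip(*rows):
        # method="inclusive" gives linearly interpolated quartiles
        q1, _, q3 = quantiles(col, n=4, method="inclusive")
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [
        all(lo <= x <= hi for x, (lo, hi) in zip(row, fences))
        for row in rows
    ]


def remove_outliers(rows: Matrix, labels: Sequence[int],
                    method: str = "iqr") -> Tuple[list, list]:
    """Drop every row (and its label) flagged by the chosen rule."""
    if method == "iqr":
        mask = iqr_mask(rows)
    elif method == "zscore":
        mask = zscore_mask(rows)
    else:
        raise ValueError(f"unknown method: {method!r}")
    kept_rows = [list(r) for r, keep in zip(rows, mask) if keep]
    kept_labels = [y for y, keep in zip(labels, mask) if keep]
    return kept_rows, kept_labels


# Hypothetical toy records: (age, serum cholesterol), one extreme cholesterol value
rows = [[54, 240], [61, 250], [48, 230], [57, 245], [63, 260], [59, 564]]
labels = [0, 1, 0, 1, 1, 0]

iqr_rows, iqr_labels = remove_outliers(rows, labels, method="iqr")
z_rows, z_labels = remove_outliers(rows, labels, method="zscore")
print(len(iqr_rows), len(z_rows))  # 5 6
```

Here the IQR rule drops the 564 mg/dL record while the z-score rule keeps all six. With the population standard deviation, no value in a sample of n can have |z| greater than √(n − 1) (about 2.24 for n = 6), so a |z| ≥ 3 cut-off can flag a point only when n ≥ 10.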

Results: The proposed hybrid model performed best on the dataset with outliers removed and a 90:10 train-test ratio, achieving an accuracy of 95.45 %, sensitivity of 92.86 %, precision of 100 %, specificity of 100 %, F1-score of 96.3 %, training time of 0.8 × 10⁻¹⁶ s, and testing time of 0.1 × 10⁻¹⁷ s. These results were validated using stratified k-fold cross-validation.
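The abstract reports percentages but not the underlying confusion matrix or the number of folds. The sketch below states the standard metric definitions and a minimal stratified k-fold splitter in plain Python. The counts TP = 13, FN = 1, FP = 0, TN = 8 (a 22-record test set) are only one illustrative assignment that reproduces the reported accuracy, sensitivity, precision, specificity, and F1-score; they are not data from the study. The function names, the default of five folds, and the round-robin fold assignment are assumptions; scikit-learn's `StratifiedKFold` would be the usual library choice.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, precision, specificity and F1, in percent."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100 * sensitivity,
        "precision": 100 * precision,
        "specificity": 100 * specificity,
        "f1": 100 * f1,
    }


def stratified_kfold(labels: Sequence[int], k: int = 5,
                     seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each test fold keeps class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:  # deal each class round-robin across the folds
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


# Illustrative counts (not reported in the abstract) that reproduce its percentages
m = binary_metrics(tp=13, fp=0, tn=8, fn=1)
print({name: round(v, 2) for name, v in m.items()})
# {'accuracy': 95.45, 'sensitivity': 92.86, 'precision': 100.0, 'specificity': 100.0, 'f1': 96.3}

# Hypothetical labels: 6 negatives, 4 positives, split into 2 stratified folds
labels = [0] * 6 + [1] * 4
for train, test in stratified_kfold(labels, k=2):
    print(len(train), len(test), sum(labels[i] for i in test))  # 5 5 2 per fold
```

Dealing each class round-robin keeps the per-fold count of every class within one of the others, which is the property stratification is meant to guarantee.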

Conclusions: This study highlights the importance of data preprocessing, appropriate train-test ratios, and hyperparameter optimization in HD prediction. The proposed framework offers a promising approach to accurate and efficient HD diagnosis, with potential benefits for the care of cardiac patients and for clinical decision-making.

Keywords: Heart disease; Hyperparameter tuning; Machine learning; Optuna; XGBoost classifier.

MeSH terms

  • Cardiovascular Diseases* / diagnosis
  • Databases, Factual
  • Datasets as Topic
  • Humans
  • Machine Learning*
  • Male