Construction of Clinical Predictive Models for Heart Failure Detection Using Six Different Machine Learning Algorithms: Identification of Key Clinical Prognostic Features

Int J Gen Med. 2024 Dec 28:17:6523-6534. doi: 10.2147/IJGM.S493789. eCollection 2024.

Abstract

Purpose: Heart failure (HF) is a clinical syndrome in which structural or functional abnormalities of the heart impair ventricular filling or ejection. To improve the adaptability of models to different patient populations and data conditions, this study aims to develop predictive models for HF risk using six machine learning algorithms, providing insight into the early assessment and recognition of HF from clinical features.

Patients and methods: The present study focused on clinical characteristics that differed significantly between two groups defined by left ventricular ejection fraction (LVEF): ≤40% and >40%. After features with substantial missing values were eliminated, the remaining features were used to construct predictive models with six machine learning algorithms. The optimal model was selected based on several performance metrics, including the area under the curve (AUC), accuracy, precision, recall, and F1 score. Using the optimal model, the importance of each clinical feature was assessed, and features with importance values exceeding 0.8 were identified as key features. Finally, a correlation analysis was conducted to examine the relationships between these key features and other significant clinical features.
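The selection metrics named above can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the function names `classification_metrics` and `roc_auc`, the label coding (1 for LVEF ≤40%, 0 for LVEF >40%), the 0.5 decision threshold, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = LVEF <= 40%, 0 = LVEF > 40% (assumed coding).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason a model comparison typically reports both kinds of metric.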

Results: The logistic regression (LR) model was determined to be the optimal machine learning algorithm in this study, achieving an accuracy of 0.64, a precision of 0.45, a recall of 0.72, an F1 score of 0.51, and an AUC of 0.81 in the training set and 0.91 in the testing set. In addition, the feature importance analysis identified blood calcium, angiotensin-converting enzyme inhibitor (ACEI) dosage, mean hemoglobin concentration, and survival duration as key features, each with an importance value exceeding 0.8. Furthermore, correlation analysis revealed a strong relationship between blood calcium and ionized calcium (|cor|=0.99), as well as a significant association between ACEI dosage (|cor|=0.68) and left ventricular metrics (|cor|=0.58). In contrast, no correlations were observed between mean hemoglobin concentration and other clinical characteristics.
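The |cor| values reported above are absolute correlation coefficients. The abstract does not state which coefficient was used, so the sketch below assumes Pearson's r. The helper names `pearson` and `strongest_correlations`, the feature names, the values, and the 0.5 cutoff are hypothetical and serve only to show how such a screen could be run.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)


def strongest_correlations(
    target: str, table: Dict[str, Sequence[float]], threshold: float
) -> Dict[str, float]:
    """Return |r| for every other feature whose |r| with `target` meets the threshold."""
    result = {}
    for name, values in table.items():
        if name == target:
            continue
        r = abs(pearson(table[target], values))
        if r >= threshold:
            result[name] = r
    return result


# Hypothetical values, not taken from the study.
features = {
    "blood_calcium": [2.1, 2.3, 2.2, 2.5, 2.4],
    "ionized_calcium": [1.05, 1.15, 1.10, 1.26, 1.20],
    "mean_hemoglobin": [33.0, 31.5, 34.2, 32.1, 33.8],
}
print(strongest_correlations("blood_calcium", features, threshold=0.5))
```

Taking the absolute value treats strong positive and strong negative relationships alike, which matches the |cor| notation used in the results.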

Conclusion: The present study identified LR as the most effective risk prediction model for patients with HF, highlighting blood calcium, ACEI dosage, and mean hemoglobin concentration as significant predictors. These findings provide insights for the clinical prevention and early intervention of HF.

Keywords: area under the curve; blood calcium; correlation analysis; left ventricular ejection fractions; logistic regression.