Explainable machine learning and online calculators to predict heart failure mortality in intensive care units

ESC Heart Fail. 2024 Sep 19. doi: 10.1002/ehf2.15062. Online ahead of print.

Abstract

Aims: This study aims to develop explainable machine learning models and clinical tools for predicting mortality in intensive care unit (ICU) patients with heart failure (HF).

Methods: Patients diagnosed with HF whose first ICU stay lasted between 24 h and 28 days were selected from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. The primary outcome was 28-day all-cause mortality. Data analysis was performed in Python and R, and features were selected by least absolute shrinkage and selection operator (LASSO) regression. Fifteen models were evaluated, and the best-performing model was explained using the Shapley additive explanations (SHAP) approach. A nomogram based on logistic regression was also developed to facilitate interpretation. The eICU database was used for external validation.
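LASSO selects features by shrinking some regression coefficients to exactly zero; only the features with non-zero coefficients are kept. The abstract does not state which LASSO variant was used, so the sketch below assumes a linear (Gaussian) LASSO on standardized features and a centred 0/1 mortality label, with the penalty written as (1/(2n))·||y − Xβ||² + α·||β||₁ (the convention of scikit-learn's `Lasso`). The data are synthetic and every helper name is hypothetical; this is an illustration of the technique, not the authors' pipeline.

```python
import random


def standardize(column):
    """Centre a feature column to mean 0 and scale it to (population) variance 1."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / sd if sd > 0 else 0.0 for v in column]


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values inside [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(columns, y, alpha, n_sweeps=200):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + alpha*||b||_1.

    Assumes every column is standardized and y is centred, so no intercept
    is needed and each coordinate update has a closed form.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # y - X @ beta while beta is all zeros
    for _ in range(n_sweeps):
        for j, xj in enumerate(columns):
            # Correlation of x_j with the residual that excludes feature j's own term
            rho = sum(a * r for a, r in zip(xj, residual)) / n + beta[j]
            new = soft_threshold(rho, alpha)
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, xj)]
                beta[j] = new
    return beta


# Hypothetical synthetic cohort: only features 0 and 1 drive the 0/1 outcome.
rng = random.Random(42)
n, p = 300, 6
raw = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
died = [1.0 if 1.5 * raw[0][i] - 1.0 * raw[1][i] + rng.gauss(0, 1) > 0 else 0.0
        for i in range(n)]

columns = [standardize(col) for col in raw]
y_mean = sum(died) / n
y = [v - y_mean for v in died]

beta = lasso_cd(columns, y, alpha=0.020)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("selected features:", selected)
```

Raising α zeroes out more coefficients; with standardized features and a 0/1 label, any α above 0.5 removes every feature, which is why the retained set (44 of 75 features in the study) depends directly on the chosen α.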

Results: After selection, the study included 2343 records: 1808 survivors and 535 non-survivors. The median age of the study population was 70.00 years, and 60.31% were male. The median ICU length of stay was 6.00 days. Survivors were younger than non-survivors (median age 69.00 vs. 73.00 years), and non-survivors stayed longer in the ICU. Seventy-five candidate features were initially selected, covering basic information, vital signs, laboratory tests, haemodynamics and oxygen status. LASSO regression selected a shrinkage parameter of α = 0.020, which retained 44 features for model construction. The linear discriminant analysis (LDA) model performed best, with an accuracy of 0.8354 in the training cohort and 0.8563 in the testing cohort. It also performed well on area under the curve (AUC), recall, precision, F1 score, Cohen's kappa and Matthews correlation coefficient. The concordance index (c-index) was 0.7972 in the training cohort and 0.8125 in the testing cohort. In external validation, the LDA model achieved approximately 0.9 in accuracy, precision, recall and F1 score, with an AUC of 0.79. Univariable analysis was performed in the training cohort, and features that differed significantly between the survival and non-survival groups were entered into multiple logistic regression. The nomogram built on this multiple logistic regression included 14 features and showed strong performance, with an AUC of 0.852 in the training cohort, 0.855 in the internal validation cohort and 0.770 in the external validation cohort. The calibration curve showed good agreement between predicted and observed risk.
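Apart from AUC, each metric reported above is a function of the 2 × 2 confusion matrix obtained after thresholding a model's predicted risk; AUC instead uses the ranked risk scores. The stdlib-only sketch below spells out those standard definitions on toy labels (hypothetical, not study data); it is not the authors' evaluation code, and the threshold that produces `y_pred` is left to the caller.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for 0/1 labels, with 1 = died."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, Cohen's kappa and MCC from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Cohen's kappa: observed agreement corrected for the agreement expected by chance
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected < 1 else 0.0
    # Matthews correlation coefficient: correlation between true and predicted labels
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


def roc_auc(y_true, scores):
    """AUC as P(score of a random death > score of a random survivor); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
print(threshold_metrics(y_true, y_pred))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise AUC is O(n²) and is meant for clarity; a rank-based computation gives the same value faster on cohorts of thousands of records.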

Conclusions: The study developed an LDA model and a nomogram for predicting mortality in ICU patients with HF. The SHAP approach was used to explain the LDA model, making it more useful to clinicians. Both models were made available as online calculators for clinical use.
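A two-class LDA model scores patients with a linear decision function w·x + b, and for a linear score SHAP values have a closed form when features are treated as independent: feature j contributes w_j·(x_j − E[x_j]), and these contributions sum to the patient's score minus the average score over a background set. The abstract does not say which SHAP explainer or background data the authors used, so the sketch below assumes this independent-feature setting and works on the decision score, not the probability; the weights, intercept and rows are hypothetical.

```python
def linear_decision(weights, intercept, x):
    """Linear score w . x + b, the form of a two-class LDA decision function."""
    return intercept + sum(w * v for w, v in zip(weights, x))


def linear_shap(weights, background, x):
    """SHAP values of a linear score when features are treated as independent.

    phi_j = w_j * (x_j - mean_j), where mean_j is the background mean of feature j.
    """
    n = len(background)
    means = [sum(row[j] for row in background) / n for j in range(len(weights))]
    return [w * (v - m) for w, v, m in zip(weights, x, means)]


# Hypothetical fitted weights and background rows (not values from the study)
weights = [0.8, -0.5, 0.3]
intercept = -1.0
background = [[1.0, 2.0, 0.0], [3.0, 0.0, 2.0], [2.0, 1.0, 1.0]]
patient = [4.0, 0.0, 3.0]

phi = linear_shap(weights, background, patient)
baseline = sum(linear_decision(weights, intercept, row) for row in background) / len(background)
print([round(v, 3) for v in phi])  # [1.6, 0.5, 0.6]
# Additivity: the baseline plus all contributions reproduces the patient's score
print(round(baseline + sum(phi), 3), round(linear_decision(weights, intercept, patient), 3))  # 3.1 3.1
```

A positive contribution pushes this patient's score towards the class the score favours (here, higher predicted mortality), which is the per-patient reading a SHAP plot gives clinicians.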

Keywords: heart failure; machine learning; nomogram; predictive model.