Using Machine Learning and Feature Importance to Identify Risk Factors for Mortality in Pediatric Heart Surgery

Lorenz A Kapsner; Manuel Feißt; Ariawan Purbojo; Hans-Ulrich Prokosch; Thomas Ganslandt; Sven Dittrich; Jonathan M Mang; Wolfgang Wällisch

doi:10.3390/diagnostics14222587

Diagnostics (Basel). 2024 Nov 18;14(22):2587. doi: 10.3390/diagnostics14222587.

Authors

Lorenz A Kapsner¹,², Manuel Feißt³, Ariawan Purbojo⁴, Hans-Ulrich Prokosch¹, Thomas Ganslandt¹, Sven Dittrich⁵, Jonathan M Mang⁶, Wolfgang Wällisch⁵

Affiliations

¹ Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91058 Erlangen, Germany.
² Institute of Radiology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany.
³ Institute of Medical Biometry, University of Heidelberg, 69117 Heidelberg, Germany.
⁴ Department of Paediatric Cardiac Surgery, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany.
⁵ Department of Pediatric Cardiology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany.
⁶ Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, 91054 Erlangen, Germany.

Abstract

Background: The objective of this IRB-approved, retrospective, monocentric study was to identify risk factors for mortality after surgery for congenital heart defects (CHDs) in pediatric patients using machine learning (ML). CHDs are among the most common congenital malformations and remain the leading cause of mortality from birth defects.

Methods: For each patient younger than 18 years who was hospitalized for CHD-related cardiac surgery between 2011 and 2020, the most recent available hospital encounter was included in this study. The cohort consisted of 1302 eligible patients (mean age [SD]: 402.92 [±562.31] days), who were categorized into four disease groups. A random survival forest (RSF) and the 'eXtreme Gradient Boosting' algorithm (XGB) were applied to model mortality (incidence: 5.6% [n = 73 events]). All models were then applied to predict the outcome in an independent holdout test dataset (40% of the cohort).

Results: RSF and XGB achieved average C-indices of 0.85 (±0.01) and 0.79 (±0.03), respectively. Feature importance was assessed with 'SHapley Additive exPlanations' (SHAP) and 'Time-dependent explanations of machine learning survival models' (SurvSHAP(t)). For both ML methods, both tools assigned high importance to the maximum serum creatinine value observed within 72 h after surgery.

Conclusions: ML methods, combined with model explainability tools, can provide insights into mortality risk after surgery for CHD. The proposed analytical workflow can serve as a blueprint for translating the analysis into a federated setting that builds upon the infrastructure of the German Medical Informatics Initiative.
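The C-index reported above measures how well a model ranks patients by risk: among all comparable pairs of patients, it is the fraction in which the patient who died earlier received the higher predicted risk. The following minimal Python sketch illustrates a simplified version of Harrell's C-index. It is not the authors' implementation; the function name `concordance_index` and the toy values are assumptions made for illustration only and are not data from the study.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Simplified Harrell's C-index (illustrative helper, not from the paper).

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter follow-up time than patient j. The pair counts as
    concordant when patient i also has the higher predicted risk; tied
    risk scores count as half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data, invented for illustration (not taken from the study cohort).
follow_up_days = [5, 10, 15, 20]
died = [True, False, True, False]
predicted_risk = [0.9, 0.4, 0.6, 0.1]

print(concordance_index(follow_up_days, died, predicted_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Production analyses would typically rely on an established survival analysis library rather than a hand-written loop like this one.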

Keywords: congenital heart defects (CHDs); eXtreme Gradient Boosting (XGB); feature importance; machine learning (ML); mortality; random survival forest (RSF); risk factors.
