Background: The objective of this IRB-approved, retrospective, monocentric study was to identify risk factors for mortality after surgery for congenital heart defects (CHDs) in pediatric patients using machine learning (ML). CHDs are among the most common congenital malformations and remain the leading cause of mortality from birth defects. Methods: The most recent available hospital encounter of each patient aged <18 years who was hospitalized for CHD-related cardiac surgery between 2011 and 2020 was included in this study. The cohort consisted of 1302 eligible patients (mean age [SD]: 402.92 [±562.31] days), who were categorized into four disease groups. A random survival forest (RSF) and the 'eXtreme Gradient Boosting' algorithm (XGB) were applied to model mortality (incidence: 5.6% [n = 73 events]). All models were then applied to predict the outcome in an independent holdout test dataset (40% of the cohort). Results: RSF and XGB achieved average C-indices of 0.85 (±0.01) and 0.79 (±0.03), respectively. Feature importance was assessed with 'SHapley Additive exPlanations' (SHAP) and 'Time-dependent explanations of machine learning survival models' (SurvSHAP(t)), both of which identified the maximum serum creatinine values observed within 72 h after surgery as highly important for both ML methods. Conclusions: ML methods, combined with model explainability tools, can provide insights into mortality risk after surgery for CHDs. The proposed analytical workflow can serve as a blueprint for translating the analysis into a federated setting that builds upon the infrastructure of the German Medical Informatics Initiative.
Keywords: congenital heart defects (CHDs); eXtreme Gradient Boosting (XGB); feature importance; machine learning (ML); mortality; random survival forest (RSF); risk factors.
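
Illustrative note on the C-index: the C-indices reported in the Results measure how often a model assigns the higher predicted risk to the patient who dies earlier, among all pairs of patients that can be ordered despite censoring. The following is a minimal sketch of Harrell's concordance index in plain Python, not the implementation used in the study. The function name `concordance_index`, the handling of ties, and the toy data are assumptions made for illustration only and are not taken from the study cohort.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was still under observation after that time (times[i] < times[j]).
    The pair is concordant when the model gives patient i the higher risk.
    Tied risk scores count as half concordant; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: C-index is undefined")
    return concordant / comparable


# Toy data (invented): follow-up in days, event flag (True = death), predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [True, False, True, False]
toy_risk = [0.9, 0.4, 0.05, 0.1]

print(concordance_index(toy_times, toy_events, toy_risk))  # prints 0.75
```

In the toy data, four pairs are comparable and three of them are ordered correctly, giving 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.85 (RSF) and 0.79 (XGB) should be read.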