Interpretable prediction of 30-day mortality in patients with acute pancreatitis based on machine learning and SHAP

BMC Med Inform Decis Mak. 2024 Nov 5;24(1):328. doi: 10.1186/s12911-024-02741-7.

Abstract

Background: Severe acute pancreatitis (SAP) can be fatal if left unrecognized and untreated. This study aimed to develop a machine learning (ML) model for predicting the 30-day all-cause mortality risk in SAP patients and to explain the most important predictors.

Methods: This research used six ML methods, namely logistic regression (LR), k-nearest neighbors (KNN), support vector machines (SVM), naive Bayes (NB), random forests (RF), and extreme gradient boosting (XGBoost), to construct six predictive models for SAP. An extensive evaluation was conducted to determine the most effective model, and the Shapley Additive exPlanations (SHAP) method was then applied to visualize key variables. Using the optimized model, stratified predictions were made for patients with SAP. Further, the study employed multivariable Cox regression analysis and Kaplan-Meier survival curves, along with subgroup analysis, to explore the relationship between the machine learning-based score and 30-day mortality.
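As an illustration of the idea behind SHAP, the sketch below computes exact Shapley values for a single patient by enumerating feature coalitions. It is not the authors' pipeline: the function names, the toy linear risk score, its coefficients, and the patient and baseline values are all hypothetical, and in practice the SHAP library uses much faster model-specific algorithms for tree ensembles such as XGBoost.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Build an input that uses x for features in the coalition
        # and the baseline for all other features.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    # Hypothetical risk score, NOT the model from the study.
    vasopressor, charlson, spo2 = z
    return 0.05 + 0.30 * vasopressor + 0.04 * charlson + 0.01 * (100 - spo2)


patient = [1, 5, 90]    # hypothetical: on vasopressor, Charlson 5, SpO2 90%
baseline = [0, 2, 97]   # hypothetical reference patient
phi = shapley_values(toy_risk, patient, baseline)

for name, contribution in zip(["vasopressor", "charlson", "spo2"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets SHAP attribute a prediction to individual variables.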

Results: Through LASSO regression and recursive feature elimination (RFE), 25 optimal feature variables were selected. The XGBoost model performed best, with an area under the curve (AUC) of 0.881, a sensitivity of 0.5714, a specificity of 0.9651 and an F1 score of 0.64. The six most important feature variables were the use of vasopressor, high Charlson comorbidity index, low blood oxygen saturation, history of malignant tumor, hyperglycemia and high APSIII score. Based on the optimal threshold of 0.62, patients were divided into high-risk and low-risk groups, and the 30-day survival rate was significantly lower in the high-risk group. Cox regression analysis further confirmed the positive correlation between high-risk scores and 30-day mortality. In the subgroup analysis, the model showed good risk stratification ability across genders, in patients with or without renal replacement therapy, and in patients with or without a history of malignant tumor, but it was not effective in patients with peripheral vascular disease.
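To make the relationship between the reported sensitivity, specificity and F1 score concrete, the following sketch derives all three from a confusion matrix. The abstract does not report the confusion matrix; the counts below are hypothetical and were chosen only because they happen to reproduce the reported figures. The function name is likewise illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, precision and F1 from raw counts."""
    sensitivity = tp / (tp + fn)   # share of deaths correctly flagged
    specificity = tn / (tn + fp)   # share of survivors correctly cleared
    precision = tp / (tp + fp)     # share of flagged patients who died
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (assumption, not taken from the study).
metrics = classification_metrics(tp=8, fn=6, fp=3, tn=83)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the high specificity and moderate sensitivity sit side by side: few survivors are misclassified as high risk, while a sizable share of deaths is missed.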

Conclusions: The XGBoost model effectively predicts the severity of SAP, serving as a valuable tool for clinicians to identify SAP early.

Keywords: Machine learning; Mortality prediction; SHAP; Severe acute pancreatitis; XGBoost.