Prediction of pulmonary embolism by an explainable machine learning approach in the real world

Sci Rep. 2025 Jan 4;15(1):835. doi: 10.1038/s41598-024-75435-9.

Abstract

In recent years, a large body of research has shown that pulmonary embolism (PE) has become a common disease. PE remains a clinical challenge because of its high mortality, high disability, and high rates of missed diagnosis and misdiagnosis. To address this, we employed artificial intelligence-based machine learning algorithms (MLAs) to construct a robust predictive model for PE. We retrospectively analyzed 1480 patients with suspected PE who were hospitalized in West China Hospital of Sichuan University between May 2015 and April 2020. A total of 126 features were screened, and diverse MLAs were used to build predictive models for PE. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance, and SHapley Additive exPlanation (SHAP) values were used to interpret the prediction model. Among the single models, random forest (RF) performed best, with an AUC of 0.776 (95% CI 0.774-0.778). The SHAP summary plot delineated the positive and negative effects of the features contributing to the RF prediction model, including D-dimer, activated partial thromboplastin time (APTT), fibrin and fibrinogen degradation products (FFDP), platelet count, albumin, cholesterol, and sodium. Furthermore, the SHAP dependence plot illustrated the impact of individual features on the RF prediction model. Finally, the MLA-based PE prediction model was implemented as a web page that can be integrated into a clinical management platform. In this study, a PE prediction model was successfully established and deployed as a web page, facilitating earlier diagnosis and timely treatment to improve outcomes for patients with PE.
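As an illustration of the evaluation metric reported above, the following minimal sketch computes an AUC and a percentile bootstrap confidence interval in plain Python. The abstract does not state how the 95% CI was obtained, so the bootstrap procedure is an assumption, and the function names (`roc_auc`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical, not taken from the study.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (hypothetical): 1 = PE confirmed, 0 = PE excluded;
# scores stand in for a model's predicted probabilities.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.3f}, 95% CI {ci_low:.3f}-{ci_high:.3f}")
```

In practice the scores would come from a fitted classifier such as a random forest evaluated on held-out patients; the pairwise definition is used here only because it keeps the sketch dependency-free.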

Keywords: Machine learning algorithms; Prediction model; Pulmonary embolism; SHAP value.

MeSH terms

  • Adult
  • Aged
  • Algorithms
  • Area Under Curve
  • China / epidemiology
  • Female
  • Fibrin Fibrinogen Degradation Products / analysis
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Pulmonary Embolism* / diagnosis
  • ROC Curve
  • Retrospective Studies

Substances

  • Fibrin Fibrinogen Degradation Products
  • fibrin fragment D