Background: Pulmonary embolism (PE) is a severe condition that causes significant mortality and morbidity. Because of its acute nature, scores have been developed to identify patients at high risk of 30-day mortality. Here we develop a machine-learning-based score to predict 30-day, 90-day, and 365-day mortality in PE patients.
Methods: The Birmingham and Black Country Venous Thromboembolism registry (BBC-VTE) of 2183 venous thromboembolism patients was used. Random forests were trained on a 70% training cohort and tested against a 30% held-out set. The outcomes of interest were 30-day, 90-day, and 365-day mortality. Model predictions were compared with the pulmonary embolism severity index (PESI) and the simplified pulmonary embolism severity index (sPESI). Shapley values were used to determine important predictors. Oral anticoagulation at discharge was also investigated as a predictor of mortality.
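The train-and-test design described in the Methods can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of the registry; the feature count, class balance, number of trees, and random seeds are illustrative assumptions, not the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the registry (assumption): 2183 rows and an
# imbalanced binary outcome playing the role of death within the horizon.
X, y = make_classification(
    n_samples=2183, n_features=12, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# 70% training cohort and 30% held-out set, stratified on the outcome
# so both parts keep a similar event rate (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Random forest classifier; 300 trees is an arbitrary illustrative choice.
model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

# Predicted probability of the positive class serves as the risk score.
risk = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, risk)
print(f"held-out AUC: {auc:.2f}")
```

One model of this kind would be fitted per outcome horizon (30, 90, and 365 days), each evaluated only on the held-out set.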
Results: The machine learning risk score predicted 30-day mortality with AUC 0.71 [95% CI: 0.63 - 0.78] compared to the sPESI AUC of 0.65 [95% CI: 0.57 - 0.73] and PESI AUC of 0.64 [95% CI: 0.56 - 0.72]. 90-day mortality and 365-day mortality were predicted with AUCs of 0.74 and 0.73, respectively. High neutrophil counts, high white blood cell counts, high C-reactive protein levels, and low haemoglobin levels were important for 30-day mortality prediction but progressively lost importance with time. Older age was an important predictor of high risk throughout.
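The abstract does not state how the AUC confidence intervals were obtained. As one common option, the sketch below computes an AUC and a percentile bootstrap 95% interval using only the Python standard library. The function names, the toy data, and the choice of the percentile bootstrap are all assumptions made for illustration.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical helper)."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data (assumption): about 15% events, with higher scores for events.
rng = random.Random(1)
labels = [1 if rng.random() < 0.15 else 0 for _ in range(200)]
scores = [rng.gauss(0.6 if l else 0.4, 0.2) for l in labels]

point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC {point:.2f} [95% CI: {lower:.2f} - {upper:.2f}]")
```

With few events in a held-out set, intervals of this kind tend to be wide, which is worth keeping in mind when comparing scores whose intervals overlap.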
Conclusion: Machine learning algorithms improved on standard clinical risk stratification for PE patients in this cohort. External cohort validation is required before incorporation into clinical workflows.
Keywords: Deep vein thrombosis; Random forests; Risk stratification; Simplified pulmonary embolism severity index; Venous thromboembolism.
Copyright © 2023 The Author(s). Published by Elsevier B.V. All rights reserved.