Background: Anticoagulation is the mainstay of treatment for patients with venous thromboembolism (VTE). However, whether to continue or stop anticoagulants after the first 3 to 6 months is a difficult decision that requires assessing the risks of both bleeding and recurrent VTE. Although several statistical models have been developed to predict bleeding, the benefit of machine learning (ML) models has not been investigated in depth.
Objectives: To assess the benefits of ML algorithms for bleeding risk evaluation in patients with VTE and to gain insight into these patients' baseline characteristics.
Methods: The baseline clinical, demographic, and genotype information was collected for 2542 patients with VTE who were on extended anticoagulation therapy. Six unsupervised dimensionality reduction and clustering ML algorithms were used to visualize and cluster the data for patients with major bleeding (118 patients) and nonbleeders. Eight supervised ML algorithms were trained and compared with the previously derived clinical models using a 5-fold nested cross-validation scheme.
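The nested cross-validation idea mentioned in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the helper names (`k_folds`, `nested_cv`, `fit_constant`, `neg_brier`), the toy constant-risk "model", the choice of 5 inner folds, and the synthetic data are all assumptions made for illustration, and the folds here are not stratified by outcome.

```python
import random

def k_folds(indices, k, seed):
    """Shuffle indices and deal them into k disjoint folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def take(seq, idx):
    return [seq[i] for i in idx]

def nested_cv(X, y, candidates, fit, score, outer_k=5, inner_k=5, seed=0):
    """Return one held-out score per outer fold.

    The inner loop picks a hyperparameter using the outer training data only,
    so the outer test fold never influences model selection.
    """
    all_idx = list(range(len(y)))
    outer_scores = []
    for fold_no, test_idx in enumerate(k_folds(all_idx, outer_k, seed)):
        held_out = set(test_idx)
        train_idx = [i for i in all_idx if i not in held_out]
        inner_folds = k_folds(train_idx, inner_k, seed + 1 + fold_no)

        def inner_score(param):
            total = 0.0
            for val_idx in inner_folds:
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                model = fit(take(X, fit_idx), take(y, fit_idx), param)
                total += score(model, take(X, val_idx), take(y, val_idx))
            return total / inner_k

        best = max(candidates, key=inner_score)
        # Refit on the full outer training set, then score on the untouched fold.
        model = fit(take(X, train_idx), take(y, train_idx), best)
        outer_scores.append(score(model, take(X, test_idx), take(y, test_idx)))
    return outer_scores

# Hypothetical toy model: a constant risk equal to a smoothed event rate.
def fit_constant(X, y, pseudo_count):
    rate = (sum(y) + 0.5 * pseudo_count) / (len(y) + pseudo_count)
    return lambda rows: [rate] * len(rows)

# Higher is better, so the Brier score is negated.
def neg_brier(model, X, y):
    preds = model(X)
    return -sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

# Synthetic data for demonstration only (10% event rate).
X = [[i] for i in range(100)]
y = [1 if i % 10 == 0 else 0 for i in range(100)]
scores = nested_cv(X, y, candidates=[0, 1, 10], fit=fit_constant, score=neg_brier)
```

In practice the `fit` and `score` callables would wrap a real learner and the metric of interest; the structure of the two loops is the part that carries over.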
Results: The baseline data for bleeders and nonbleeders were highly similar. Two novel clusters were discovered among bleeders, defined by the presence of isolated pulmonary embolism or isolated deep vein thrombosis, though the difference in bleeding risk between them was not statistically significant (P = .32). The supervised analysis showed that the ML and clinical models had similar discrimination (c-statistic, ∼62%) and calibration performance (Brier score, ∼0.045).
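For readers less familiar with the two metrics reported above, the following is a minimal sketch of how a c-statistic and a Brier score can be computed from predicted risks. The function names and the small example vectors are illustrative assumptions, not the study's code or data; only the cohort counts (118 bleeders among 2542 patients) come from the abstract.

```python
from itertools import product

def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event received the
    higher predicted risk; tied predictions count as half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a, b in product(events, non_events)
    )
    return concordant / (len(events) * len(non_events))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Made-up outcomes and predicted risks, for illustration only.
y_demo = [0, 0, 1, 0, 1]
p_demo = [0.10, 0.40, 0.35, 0.20, 0.80]
demo_c = c_statistic(y_demo, p_demo)
demo_brier = brier_score(y_demo, p_demo)

# Reference point: predicting the cohort prevalence for every patient.
n_patients, n_bleeders = 2542, 118
prevalence = n_bleeders / n_patients
y_cohort = [1] * n_bleeders + [0] * (n_patients - n_bleeders)
p_constant = [prevalence] * n_patients
baseline_c = c_statistic(y_cohort, p_constant)
baseline_brier = brier_score(y_cohort, p_constant)
```

With an event rate this low, a constant prediction at the prevalence already yields a Brier score of about 0.044, since it equals p(1 - p), together with a c-statistic of 0.5. The Brier score is therefore best read alongside the c-statistic rather than on its own.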
Conclusion: The clinical variables recorded at baseline are not distinctive enough to improve bleeding prediction beyond the performance of the existing models, and other strategies or data modalities should be considered.
Keywords: anticoagulants; calibration; hemorrhage; machine learning; venous thromboembolism.
© 2024 The Authors.