Development of a predictive model of venous thromboembolism recurrence in anticoagulated cancer patients using machine learning

Thromb Res. 2023 Aug:228:181-188. doi: 10.1016/j.thromres.2023.06.015. Epub 2023 Jun 16.

Abstract

Introduction: Patients with cancer and venous thromboembolism (VTE) are at high risk of VTE recurrence during anticoagulant treatment. This study aimed to develop a model that predicts, at the time of primary VTE diagnosis, the risk of VTE recurrence within 6 months in these patients.

Materials and methods: Unstructured data in electronic health records from 9 Spanish hospitals between 2014 and 2018 were extracted using the EHRead® technology, which is based on Natural Language Processing (NLP) and machine learning (ML). Both clinically driven and ML-driven feature selection were performed to identify predictors of VTE recurrence. Logistic regression (LR), decision tree (DT), and random forest (RF) algorithms were used to train different prediction models, which were subsequently validated in a hold-out data set.
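As an illustration of the train-then-validate workflow described above, the following is a minimal sketch using scikit-learn. It is not the study's pipeline: the data are synthetic, and the 80/20 split, the hyperparameters, and the helper name `holdout_auc` are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 10 features, roughly 11 % positives.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.89, 0.11], random_state=0,
)

# Stratified hold-out split; the 80/20 ratio is an assumption.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters below are illustrative, not taken from the study.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(max_depth=4, random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}


def holdout_auc(models, X_train, y_train, X_hold, y_hold):
    """Fit each model on the training split and score AUC-ROC on the hold-out split."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Probability of the positive class is used as the ranking score.
        scores = model.predict_proba(X_hold)[:, 1]
        results[name] = roc_auc_score(y_hold, scores)
    return results


aucs = holdout_auc(models, X_train, y_train, X_hold, y_hold)
for name, auc in aucs.items():
    print(f"{name}: hold-out AUC-ROC = {auc:.3f}")
```

Stratifying the split keeps the proportion of positive cases similar in both partitions, which matters when the outcome is relatively rare.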

Results: A total of 16,407 anticoagulated cancer patients with a diagnosis of VTE were identified (54.4 % male; median age 70 years). Deep vein thrombosis, pulmonary embolism, and metastases were observed in 67.2 %, 26.6 %, and 47.7 % of the patients, respectively. During the study follow-up, 11.4 % of the patients developed recurrent VTE, which was more frequent in patients with lung cancer. Feature selection and model training based on ML identified primary pulmonary embolism, deep vein thrombosis, metastasis, adenocarcinoma, hemoglobin and serum creatinine levels, platelet and leukocyte counts, family history of VTE, and patient age as predictors of VTE recurrence within 6 months of VTE diagnosis. The LR model had an AUC-ROC (95 % CI) of 0.66 (0.61, 0.70), the DT model 0.69 (0.65, 0.72), and the RF model 0.68 (0.63, 0.72).
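For readers unfamiliar with how an AUC-ROC and its 95 % CI can be obtained, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and derives an interval by percentile bootstrap. The abstract does not state how its intervals were computed, so the bootstrap is an assumption; the function names and the toy data are hypothetical.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC-ROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, so AUC is undefined
        stats.append(auc_roc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (assumption): 200 cases, about 20 % positive, weakly separated scores.
rng = random.Random(42)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(200)]
scores = [rng.gauss(0.5 if y == 1 else 0.0, 1.0) for y in labels]

point = auc_roc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC-ROC = {point:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

The pairwise comparison is quadratic in the number of cases and is used here only for clarity; a rank-based formulation would be preferable for large cohorts.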

Conclusions: This is the first ML-based predictive model designed to predict 6-month VTE recurrence in patients with cancer. These results have the potential to help clinicians identify high-risk patients and improve their clinical management.

Keywords: Anticoagulants; Cancer patients; Electronic health records; Machine learning; Natural language processing; Predictive model; Venous thromboembolism recurrence.

MeSH terms

  • Aged
  • Anticoagulants / therapeutic use
  • Humans
  • Infant
  • Machine Learning
  • Neoplasm Recurrence, Local / chemically induced
  • Neoplasm Recurrence, Local / drug therapy
  • Pulmonary Embolism* / drug therapy
  • Recurrence
  • Risk Factors
  • Venous Thromboembolism* / drug therapy
  • Venous Thromboembolism* / etiology
  • Venous Thrombosis* / drug therapy

Substances

  • Anticoagulants