A Machine Learning Model to Successfully Predict Future Diagnosis of Chronic Myelogenous Leukemia With Retrospective Electronic Health Records Data

Am J Clin Pathol. 2021 Nov 8;156(6):1142-1148. doi: 10.1093/ajcp/aqab086.

Abstract

Background: Chronic myelogenous leukemia (CML) is a clonal stem cell disorder accounting for 15% of adult leukemias. We aimed to determine whether machine learning models could predict CML using blood cell counts obtained before diagnosis.

Methods: In the largest integrated health care system in the United States, we identified patients who, between 1999 and 2020, had a diagnostic test for CML (BCR-ABL1) and at least 6 consecutive prior years of differential blood cell counts. Blood cell counts from different time periods before CML diagnostic testing were used to train, validate, and test machine learning models.
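The time-window design described in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the record layout `(patient_id, draw_date, wbc)`, the single white blood cell (WBC) feature, the helper names `window_features` and `split_patients`, and the 60/20/20 patient-level split are all hypothetical assumptions, since the abstract does not specify features, models, or split proportions. The sketch averages each patient's counts drawn within a window measured in years before that patient's BCR-ABL1 test date, then partitions patients into training, validation, and test sets.

```python
import random
from datetime import date
from statistics import mean


def window_features(cbc_rows, test_dates, start_years, end_years):
    """Hypothetical helper: mean WBC per patient over draws taken between
    `end_years` and `start_years` (inclusive) before the BCR-ABL1 test."""
    values = {}
    for pid, draw_date, wbc in cbc_rows:
        test_date = test_dates.get(pid)
        if test_date is None:
            continue  # patient has no BCR-ABL1 test on record
        years_before = (test_date - draw_date).days / 365.25
        if end_years <= years_before <= start_years:
            values.setdefault(pid, []).append(wbc)
    return {pid: mean(vals) for pid, vals in values.items()}


def split_patients(patient_ids, seed=0, frac=(0.6, 0.2, 0.2)):
    """Hypothetical patient-level split so no patient appears in two sets.
    The 60/20/20 proportions are an assumption, not the study's."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(frac[0] * len(ids))
    n_val = int(frac[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# Toy, made-up records: one early draw and one late draw for p1, one for p2.
cbc = [("p1", date(2015, 1, 1), 7.0),
       ("p1", date(2019, 6, 1), 45.0),
       ("p2", date(2019, 3, 1), 6.5)]
tests = {"p1": date(2020, 1, 1), "p2": date(2020, 1, 1)}
print(window_features(cbc, tests, 1.0, 0.5))  # {'p1': 45.0, 'p2': 6.5}
print(window_features(cbc, tests, 5.0, 2.0))  # {'p1': 7.0}
```

Splitting by patient rather than by individual blood draw is a common design choice: it keeps draws from the same person from leaking between training and test sets.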

Results: The sample included 1,623 patients, with a BCR-ABL1 positivity rate of 6.2%. The predictive ability of the machine learning models improved when they were trained with blood cell counts closer to the time of diagnosis: the area under the curve (AUC) was 0.59 to 0.67 for counts from 2 to 5 years before diagnosis, 0.75 to 0.80 for counts from 0.5 to 1 year before diagnosis, and 0.87 to 0.92 for counts at diagnosis.
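To make the reported AUC values concrete, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen BCR-ABL1-positive patient receives a higher model score than a randomly chosen negative patient, with ties counted as one half. An AUC of 0.5 corresponds to chance and 1.0 to perfect separation. The scores and labels are made up for illustration, and the function name `auc` is a hypothetical helper, not the study's code. With 1,623 patients and a 6.2% positivity rate, there were roughly 100 positive tests, which the pairwise loop below handles easily.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

Because AUC depends only on how positives are ranked relative to negatives, it is not directly affected by the low (6.2%) prevalence of positive tests, although it says nothing on its own about positive predictive value at a given threshold.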

Conclusions: Blood cell counts collected up to 5 years before the diagnostic workup of CML successfully predicted the BCR-ABL1 test result. These findings suggest that a machine learning model trained with blood cell counts could lead to diagnosis of CML earlier in the disease course compared with usual medical care.

Keywords: Chronic myelogenous leukemia; Decision support techniques; Decision trees; Logistic regression; Machine learning; Prediction model studies; Predictions and projections; Statistical data analyses.

MeSH terms

  • Diagnostic Tests, Routine*
  • Electronic Health Records
  • Fusion Proteins, bcr-abl / genetics
  • Humans
  • Leukemia, Myelogenous, Chronic, BCR-ABL Positive* / diagnosis
  • Machine Learning
  • Retrospective Studies

Substances

  • Fusion Proteins, bcr-abl