A machine learning approach to predicting risk of myelodysplastic syndrome

Ashwath Radhachandran; Anurag Garikipati; Zohora Iqbal; Anna Siefkas; Gina Barnes; Jana Hoffman; Qingqing Mao; Ritankar Das

doi:10.1016/j.leukres.2021.106639

Leuk Res. 2021 Oct:109:106639. doi: 10.1016/j.leukres.2021.106639. Epub 2021 Jun 8.

Authors

Ashwath Radhachandran¹, Anurag Garikipati¹, Zohora Iqbal², Anna Siefkas¹, Gina Barnes¹, Jana Hoffman¹, Qingqing Mao¹, Ritankar Das¹

Affiliations

¹ Dascena, Inc., Houston, TX, United States.
² Dascena, Inc., Houston, TX, United States.

PMID: 34171604

Abstract

Background: Early myelodysplastic syndrome (MDS) diagnosis can allow physicians to provide early treatment, which may delay advancement of MDS and improve quality of life. However, MDS often goes unrecognized and is difficult to distinguish from other disorders. We developed a machine learning algorithm for the prediction of MDS one year prior to clinical diagnosis of the disease.

Methods: Retrospective analysis was performed on 790,470 patients over the age of 45 seen in the United States between 2007 and 2020. A gradient boosted decision tree model (XGB) was built to predict MDS diagnosis using vital signs, lab results, and demographics from the prior two years of patient data. The XGB model was compared to logistic regression (LR) and artificial neural network (ANN) models. The models did not use blast percentage and cytogenetics information as inputs. Predictions were made one year prior to MDS diagnosis as determined by International Classification of Diseases (ICD) codes, 9th and 10th revisions. Performance was assessed with regard to area under the receiver operating characteristic curve (AUROC).
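
Because AUROC is the performance measure named above, a minimal sketch of how it can be computed from model scores may help. It uses the pairwise definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = later diagnosed, 0 = control); not from the paper.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

The quadratic loop is chosen for readability. On a cohort of the size described here, a rank-based implementation from an established library would be the practical choice.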

Results: On a hold-out test set, the XGB model achieved an AUROC of 0.87 for prediction of MDS one year prior to diagnosis, with a sensitivity of 0.79 and a specificity of 0.80. The LR and ANN comparison models achieved AUROCs of 0.838 and 0.832, respectively.
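
Sensitivity and specificity describe a single operating point on the ROC curve, that is, one choice of score threshold. The sketch below shows how the two quantities follow from confusion counts at a given threshold. The function name, the threshold of 0.5, and the toy data are hypothetical; the abstract does not state which threshold the authors used.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical toy data; not from the paper.
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.7, 0.3, 0.2, 0.1, 0.05]
sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is why a reported pair of values should be read together with the AUROC.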

Conclusions: Machine learning may allow for early MDS diagnosis and more appropriate treatment administration.

Keywords: Early prediction; Electronic health records (EHR); Machine learning; Myelodysplastic syndrome (MDS); Risk assessment.

MeSH terms

Adult
Aged
Aged, 80 and over
Algorithms*
Case-Control Studies
Female
Follow-Up Studies
Humans
Machine Learning*
Male
Middle Aged
Myelodysplastic Syndromes / diagnosis*
Myelodysplastic Syndromes / epidemiology
Neural Networks, Computer*
Prognosis
Quality of Life*
ROC Curve
Retrospective Studies
Risk Assessment / methods*
United States / epidemiology