Predicting Survival in Veterans with Follicular Lymphoma Using Structured Electronic Health Record Information and Machine Learning

Int J Environ Res Public Health. 2021 Mar 7;18(5):2679. doi: 10.3390/ijerph18052679.

Abstract

The most accurate prognostic approach for follicular lymphoma (FL), progression of disease within 24 months (POD24), requires two years of observation after initiation of first-line therapy (L1) to predict outcomes. We applied machine learning to structured electronic health record (EHR) data to predict individual survival at L1 initiation. Using 523 observations and 1933 variables from a nationwide cohort of FL patients diagnosed between 2006 and 2014 in the Veterans Health Administration, we grouped the variables into three sets: traditionally used prognostic variables ("curated"), commonly measured laboratory tests ("labs"), and International Classification of Diseases diagnostic codes ("ICD"). We compared the performance of random survival forests (RSF) with that of the traditional Cox model on four datasets (curated, curated + labs, curated + ICD, and curated + ICD + labs), and also fit a Cox model on curated + POD24. We assessed performance with the area under the receiver operating characteristic curve (AUC) and examined variable importance and partial dependence plots. RSF with curated + labs performed best, with a mean AUC of 0.73 (95% CI: 0.71-0.75). It approximated, but did not surpass, Cox with POD24 (mean AUC 0.74 [95% CI: 0.71-0.77]). RSF using EHR data outperformed models built on traditional prognostic variables alone, laying the foundation for incorporating our algorithm into the EHR. It also points toward future scenarios in which clinicians could be given an EHR-based tool that approximates the predictive ability of the most accurate known indicator, using information available 24 months earlier.
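The abstract reports discrimination as AUC but does not state which survival AUC estimator was used. As an illustration only, the sketch below assumes a time-dependent (cumulative/dynamic) AUC at a single follow-up horizon: cases are patients with an observed death by the horizon, controls are patients still under observation after it, and patients censored before the horizon are dropped. The function name `auc_at_horizon`, the 24-month horizon, and the toy data are hypothetical, and the sketch omits the inverse-probability-of-censoring weights that published estimators typically apply.

```python
from typing import Sequence


def auc_at_horizon(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
    horizon: float,
) -> float:
    """Naive cumulative/dynamic AUC at one time horizon (hypothetical helper).

    Cases: observed deaths (event == 1) at or before `horizon`.
    Controls: anyone whose observed time is beyond `horizon`.
    Patients censored at or before `horizon` are excluded because their
    status at the horizon is unknown. No censoring weights are applied.
    """
    cases = [r for t, e, r in zip(times, events, risk_scores) if e == 1 and t <= horizon]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")

    # Count case-control pairs where the case has the higher predicted risk;
    # tied scores count as half a concordant pair.
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: follow-up in months, 1 = death observed, model risk score.
times = [10, 30, 50, 20, 70, 15]
events = [1, 1, 0, 0, 1, 1]
risk = [0.9, 0.6, 0.2, 0.5, 0.3, 0.4]
print(auc_at_horizon(times, events, risk, horizon=24))  # 5 of 6 pairs concordant
```

The same function can score any model that outputs a risk per patient, so an RSF and a Cox model fit on the same feature set could be compared at a common horizon; repeating the calculation over resamples would give a distribution from which a mean and interval could be summarized.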

Keywords: electronic health records; follicular lymphoma; healthcare; machine learning; medical and health data; predictive analytics; prognosis; random survival forest; survival analysis; veterans health administration.

MeSH terms

  • Electronic Health Records
  • Humans
  • International Classification of Diseases
  • Lymphoma, Follicular* / diagnosis
  • Machine Learning
  • Veterans*