Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data

J Natl Cancer Inst. 2019 Jun 1;111(6):568-574. doi: 10.1093/jnci/djy178.

Abstract

Background: Oncologists use patients' life expectancy to guide decisions and may benefit from a tool that accurately predicts prognosis. Existing prognostic models generally use only a few predictor variables. We used an electronic medical record dataset to train a prognostic model for patients with metastatic cancer.

Methods: The model was trained and tested using 12 588 patients treated for metastatic cancer in the Stanford Health Care system from 2008 to 2017. Data sources included provider note text, labs, vital signs, procedures, medication orders, and diagnosis codes. Patients were divided randomly into a training set used to fit the model coefficients and a test set used to evaluate model performance (80%/20% split). A regularized Cox model with 4126 predictor variables was used. A landmarking approach was used due to the multiple observations per patient, with t0 set to the time of metastatic cancer diagnosis. Performance was also evaluated using 399 palliative radiation courses in test set patients.
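
As an illustration of the data setup described in the Methods, the following is a minimal sketch of a patient-level 80%/20% random split and of landmark row construction. It is not the authors' code: the function names, the seed, the landmark times, and the rule that a patient contributes a row only while still under follow-up are all assumptions made for the example.

```python
import random


def split_patients(patient_ids, test_fraction=0.2, seed=0):
    """Randomly assign whole patients to a training or test set.

    Splitting by patient (not by observation) keeps every observation
    from one patient on the same side of the split.
    """
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])


def landmark_rows(followup_days, died, landmark_days):
    """Build one row per landmark time at which the patient is still at risk.

    followup_days: days from t0 (metastatic cancer diagnosis) to death
    or censoring. Remaining time is measured from the landmark, not t0.
    """
    rows = []
    for s in landmark_days:
        if followup_days > s:  # still under observation at this landmark
            rows.append({"landmark": s, "time": followup_days - s, "event": died})
    return rows


# Hypothetical usage: 100 patient IDs and illustrative landmark times in days.
train, test = split_patients(range(100))
rows = landmark_rows(followup_days=400, died=True, landmark_days=[0, 180, 365, 730])
```

In this sketch a patient followed for 400 days contributes rows at landmarks 0, 180, and 365 but not at 730, which is how a landmarking design turns multiple observations per patient into separate prediction problems.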

Results: The C-index for overall survival was 0.786 in the test set (averaged across landmark times). For palliative radiation courses, the C-index was 0.745 (95% confidence interval [CI] = 0.715 to 0.775) compared with 0.635 (95% CI = 0.601 to 0.669) for a published model using performance status, primary tumor site, and treated site (two-sided P < .001). Our model's predictions were well calibrated.
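
For readers unfamiliar with the C-index reported in the Results, the following is a minimal sketch of Harrell's concordance index for right-censored data. The abstract does not state which C-index estimator or tie-handling rule was used, so the pair definition and the half credit for tied scores here are assumptions, and the toy data are invented for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up ended. The pair is concordant
    when the patient who died earlier has the higher predicted risk.
    Tied risk scores count as half; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): survival times in months, 1 = death observed.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c = concordance_index(times, events, risk)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives context for the 0.745 versus 0.635 comparison above.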

Conclusions: The model showed high predictive performance, which will need to be validated using external data. Because it is fully automated, the model can be used to examine providers' practice patterns and could be deployed in a decision support tool to help improve quality of care.

MeSH terms

  • Aged
  • Databases, Factual
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Models, Statistical*
  • Neoplasm Metastasis
  • Neoplasms / mortality*
  • Neoplasms / pathology*
  • Neoplasms / radiotherapy
  • Palliative Care / statistics & numerical data
  • Prognosis
  • Proportional Hazards Models
  • Radiotherapy / statistics & numerical data
  • Survival Analysis