Using the landmark method for creating prediction models in large datasets derived from electronic health records

Health Care Manag Sci. 2015 Mar;18(1):86-92. doi: 10.1007/s10729-014-9281-3. Epub 2014 Apr 22.

Abstract

With the integration of electronic health records (EHRs), health data have become abundant and easily accessible. The EHR has the potential to provide important healthcare information to researchers by enabling the creation of study cohorts. However, using this information raises three major issues: 1) predictor variables often change over time, 2) patients have varying lengths of follow-up within the EHR, and 3) the size of the EHR data can be computationally challenging. Landmark analyses provide a perfect complement to EHR data and help to alleviate these three issues. We present two examples that use patient birthdays as landmark times for creating dynamic datasets for predicting clinical outcomes. The use of landmark times helps to solve these three issues by incorporating information that changes over time, by creating unbiased reference points that are not related to a patient's exposure within the EHR, and by reducing the size of the dataset compared with a true time-varying analysis. These techniques are illustrated using two example cohort studies from the Cleveland Clinic that included 4.5 million and 17,787 unique patients, respectively.
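
The birthday-landmark idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the record layout (`birth`, `first_visit`, `last_followup`, `event_date`, `measurements`), the function names, the example values, and the 365-day prediction horizon are all hypothetical. Each birthday that falls inside a patient's observed follow-up becomes one row, carrying the most recent measurement available at that date and a flag for whether the outcome occurred within the horizon.

```python
from datetime import date


def birthdays_between(birth, start, end):
    """Yield each birthday b with start <= b < end."""
    year = start.year
    while True:
        try:
            bday = birth.replace(year=year)
        except ValueError:
            # Born on Feb 29: use Feb 28 in non-leap years (an assumption).
            bday = date(year, 2, 28)
        if bday >= end:
            return
        if bday >= start:
            yield bday
        year += 1


def landmark_rows(patient, horizon_days=365):
    """Build one row per birthday landmark for a single patient.

    `patient` is a hypothetical dict; the field names are assumptions.
    """
    # Follow-up ends at the event, or at the last contact if no event occurred.
    end = patient["event_date"] or patient["last_followup"]
    rows = []
    for lm in birthdays_between(patient["birth"], patient["first_visit"], end):
        # Only information known at the landmark may be used as a predictor.
        prior = [(d, v) for d, v in patient["measurements"] if d <= lm]
        if not prior:
            continue  # nothing measured yet, so no usable row
        _, latest_value = max(prior)  # most recent measurement before landmark
        ev = patient["event_date"]
        rows.append({
            "landmark": lm,
            "age": lm.year - patient["birth"].year,
            "value": latest_value,
            # Outcome observed within the horizon after the landmark.
            "event": ev is not None and (ev - lm).days <= horizon_days,
            # Observed time after the landmark, capped at the horizon.
            "followup_days": min((end - lm).days, horizon_days),
        })
    return rows


# Hypothetical patient with three measurements and one outcome event.
example = {
    "birth": date(1950, 6, 15),
    "first_visit": date(2005, 1, 10),
    "last_followup": date(2008, 3, 1),
    "event_date": date(2007, 9, 1),
    "measurements": [
        (date(2005, 1, 10), 6.1),
        (date(2006, 5, 20), 6.8),
        (date(2007, 2, 1), 7.4),
    ],
}
rows = landmark_rows(example)
```

For this hypothetical patient the sketch produces three rows (birthdays in 2005, 2006, and 2007), with the predictor updated at each landmark and the event flagged only in the last row. The dataset therefore grows with the number of birthdays under observation rather than with the number of raw measurements, which is the size reduction the abstract refers to. Rows whose follow-up ends before the horizon without an event are censored and would need survival-type handling in a real analysis.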

MeSH terms

  • Adult
  • Age Factors
  • Aged
  • Aged, 80 and over
  • Blood Chemical Analysis
  • Cohort Studies
  • Comorbidity
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Forecasting / methods*
  • Hematologic Tests
  • Humans
  • Male
  • Middle Aged
  • Models, Statistical*
  • Racial Groups
  • Risk Assessment / methods*
  • Sex Factors
  • Time Factors
  • United States