Harnessing EHR data for health research

Nat Med. 2024 Jul;30(7):1847-1855. doi: 10.1038/s41591-024-03074-8. Epub 2024 Jul 4.

Abstract

With the increasing availability of rich, longitudinal, real-world clinical data recorded in electronic health records (EHRs) for millions of patients, there is a growing interest in leveraging these records to improve the understanding of human health and disease and translate these insights into clinical applications. However, there is also a need to consider the limitations of these data due to various biases and to understand the impact of missing information. Recognizing and addressing these limitations can inform the design and interpretation of EHR-based informatics studies that avoid confusing or incorrect conclusions, particularly when applied to population or precision medicine. Here we discuss key considerations in the design, implementation and interpretation of EHR-based informatics studies, drawing from examples in the literature across hypothesis generation, hypothesis testing and machine learning applications. We outline the growing opportunities for EHR-based informatics studies, including association studies and predictive modeling, enabled by evolving AI capabilities, while addressing limitations and potential pitfalls to avoid.

Publication types

  • Review

MeSH terms

  • Biomedical Research
  • Electronic Health Records*
  • Humans
  • Machine Learning
  • Medical Informatics / methods
  • Precision Medicine / methods