Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview

Ban Al-Sahab; Alan Leviton; Tobias Loddenkemper; Nigel Paneth; Bo Zhang

doi:10.1007/s41666-023-00153-2

J Healthc Inform Res. 2023 Nov 14;8(1):121-139. doi: 10.1007/s41666-023-00153-2. eCollection 2024 Mar.

Authors

Ban Al-Sahab¹, Alan Leviton²,³, Tobias Loddenkemper²,³, Nigel Paneth⁴,⁵, Bo Zhang³,⁶,⁷

Affiliations

¹ Department of Family Medicine, College of Human Medicine, Michigan State University, B100 Clinical Center, 788 Service Road, East Lansing, MI USA.
² Department of Neurology, Harvard Medical School, Boston, MA USA.
³ Department of Neurology, Boston Children's Hospital, Boston, MA USA.
⁴ Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI USA.
⁵ Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, East Lansing, MI USA.
⁶ Biostatistics and Research Design, Institutional Centers of Clinical and Translational Research, Boston Children's Hospital, Boston, MA USA.
⁷ Harvard Medical School, Boston, MA USA.

PMID: 38273982
PMCID: PMC10805748 (available on 2025-03-01)
DOI: 10.1007/s41666-023-00153-2

Abstract

Electronic Health Records (EHRs) are increasingly perceived as a unique source of data for clinical research because they provide unprecedentedly large volumes of real-time data from real-world settings. In this review of the secondary uses of EHRs, we identify the anticipated breadth of opportunities and point out the data deficiencies and potential biases that are likely to limit the search for true causal relationships. This paper provides a comprehensive overview of the types of biases that arise along the pathways that generate real-world evidence and the sources of these biases. We distinguish between two levels in the production of EHR data where biases are likely to arise: (i) at the healthcare system level, where the principal source of bias resides in access to, and provision of, medical care, and in the acquisition and documentation of medical and administrative data; and (ii) at the research level, where biases arise from the processes of extracting, analyzing, and interpreting these data. Because of the plethora of biases, mainly in the form of selection and information bias, we conclude by advising extreme caution about making causal inferences based on secondary uses of EHRs.

Keywords: Bias; Electronic Health Records; Real World Data; Real World Evidence; Study Validity.

© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Grants and funding

UH3 OD023285/OD/NIH HHS/United States