Identifying and mitigating biases in EHR laboratory tests

J Biomed Inform. 2014 Oct:51:24-34. doi: 10.1016/j.jbi.2014.03.016. Epub 2014 Apr 13.

Abstract

Electronic health record (EHR) data show promise for deriving new ways of modeling human disease states. Although EHR researchers often use the numerical values of laboratory tests as features in disease models, a great deal of information is contained in the context within which a laboratory test is taken. For example, the same numerical value of a creatinine test has a different interpretation for a patient with chronic kidney disease than for a patient with acute kidney injury. We study whether EHR research studies are subject to biased results and interpretations if laboratory measurements taken in different contexts are not explicitly separated. We show that the context of a laboratory test measurement can often be captured by the way the test is measured through time. We perform three tasks to study the properties of these temporal measurement patterns. In the first task, we confirm that laboratory test measurement patterns provide information beyond the stand-alone numerical value. In the second task, we identify three measurement pattern motifs across a set of 70 laboratory tests performed for over 14,000 patients. Of these, one motif exhibits properties that can lead to biased research results. In the third task, we demonstrate the potential for biased results on a specific example: an association study between lipase test values and acute pancreatitis. We observe a diluted signal when using only a lipase value threshold, whereas the full association is recovered when lipase measurements taken in different contexts are properly accounted for (using the lipase measurement patterns to separate the contexts). Aggregating EHR data without separating distinct laboratory test measurement patterns can intermix patients with different diseases, leading to the confounding of signals in large-scale EHR analyses. This paper presents a methodology for leveraging measurement frequency to identify and reduce laboratory test biases.
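As a rough illustration of separating measurements by temporal pattern, the sketch below labels each laboratory value by the time elapsed since the same patient's previous measurement and groups the values accordingly. It is not the paper's method: the function name `split_values_by_context`, the labels, and the 3-day cutoff are hypothetical choices made only for this example, and the abstract does not specify how contexts are defined.

```python
from collections import defaultdict
from datetime import datetime


def split_values_by_context(measurements, cutoff_days=3.0):
    """Group lab values by the gap since the patient's previous measurement.

    measurements: iterable of (patient_id, timestamp, value) tuples.
    cutoff_days:  hypothetical threshold separating "short" from "long" gaps.
    Returns a dict mapping a context label to a list of values.
    """
    # Collect each patient's measurements so they can be ordered in time.
    by_patient = defaultdict(list)
    for patient_id, timestamp, value in measurements:
        by_patient[patient_id].append((timestamp, value))

    contexts = defaultdict(list)
    for series in by_patient.values():
        series.sort(key=lambda item: item[0])
        previous_time = None
        for timestamp, value in series:
            if previous_time is None:
                # No earlier measurement exists, so no gap can be computed.
                label = "first"
            else:
                gap_days = (timestamp - previous_time).total_seconds() / 86400.0
                label = "short_gap" if gap_days <= cutoff_days else "long_gap"
            contexts[label].append(value)
            previous_time = timestamp
    return dict(contexts)


if __name__ == "__main__":
    # Made-up values for demonstration only; not data from the study.
    demo = [
        ("a", datetime(2020, 1, 1), 50.0),
        ("a", datetime(2020, 1, 2), 400.0),
        ("a", datetime(2020, 3, 1), 60.0),
        ("b", datetime(2020, 1, 5), 55.0),
    ]
    for label, values in split_values_by_context(demo).items():
        print(label, values)
```

An association analysis could then be run within each group rather than on the pooled values, which is the general idea the abstract describes for the lipase example.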

Keywords: Bias; Confounding; Electronic health record; Information theory; Laboratory testing; Missing data.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Artifacts*
  • Clinical Laboratory Information Systems / classification
  • Clinical Laboratory Information Systems / statistics & numerical data*
  • Confounding Factors, Epidemiologic
  • Data Interpretation, Statistical*
  • Data Mining / methods*
  • Electronic Health Records / classification*
  • Electronic Health Records / statistics & numerical data*
  • New York
  • Pattern Recognition, Automated / methods*