A robust approach for electronic health record-based case-control studies with contaminated case pools

Biometrics. 2023 Sep;79(3):2023-2035. doi: 10.1111/biom.13721. Epub 2022 Jul 22.

Abstract

We consider analyses of case-control studies assembled from electronic health records (EHRs) where the pool of cases is contaminated by patients who are ineligible for the study. These ineligible patients, referred to as "false cases," should be excluded from the analyses if known. However, the true outcome status of a patient in the case pool is unknown except in a subset whose size may be arbitrarily small compared to the entire pool. To effectively remove the influence of the false cases on estimating odds ratio parameters defined by a working association model of the logistic form, we propose a general strategy to adaptively impute the unknown case status without requiring a correct phenotyping model to help discern the true and false case statuses. Our method estimates the target parameters as the solution to a set of unbiased estimating equations constructed using all available data. It outperforms existing methods by achieving robustness to mismodeling the relationship between the outcome status and covariates of interest, as well as improved estimation efficiency. We further show that our estimator is root-n-consistent and asymptotically normal. Through extensive simulation studies and analysis of real EHR data, we demonstrate that our method has desirable robustness to possible misspecification of both the association and phenotyping models, along with statistical efficiency superior to that of competing methods.
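
To make the problem setting concrete, the following is a minimal simulation sketch of how false cases can distort an odds ratio estimate. It is not the estimator proposed in the paper: it only contrasts a naive logistic fit on a contaminated case pool with an "oracle" fit that uses the true cases alone. Every specific here is an assumption made for illustration, including the single standard normal covariate, the true log odds ratio of 1.0, the 30% contamination rate, the choice to draw false cases at random from non-cases, and the helper name `fit_logistic`.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson (hypothetical helper).

    X must already contain an intercept column; returns the coefficient vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                        # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(0)

# Assumed source population: one covariate, logistic outcome model.
N = 200_000
true_log_or = 1.0
x = rng.normal(size=N)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-3.0 + true_log_or * x))))

true_cases = np.flatnonzero(y == 1)
noncases = rng.permutation(np.flatnonzero(y == 0))

# Assumed contamination: 30% of the case pool are false cases, drawn at
# random from non-cases (so they resemble controls in their covariates).
contamination = 0.30
n_false = int(round(contamination / (1.0 - contamination) * true_cases.size))
false_cases = noncases[:n_false]
case_pool = np.concatenate([true_cases, false_cases])

# Controls: as many as the case pool, disjoint from the false cases.
controls = noncases[n_false:n_false + case_pool.size]


def slope(case_idx, control_idx):
    """Log odds ratio for x from a case-control logistic fit."""
    idx = np.concatenate([case_idx, control_idx])
    labels = np.concatenate([np.ones(case_idx.size), np.zeros(control_idx.size)])
    design = np.column_stack([np.ones(idx.size), x[idx]])
    return fit_logistic(design, labels)[1]


naive_slope = slope(case_pool, controls)    # treats every pooled patient as a case
oracle_slope = slope(true_cases, controls)  # uses the (normally unknown) true status

print(f"naive log OR:  {naive_slope:.3f}")
print(f"oracle log OR: {oracle_slope:.3f}")
```

Under these assumptions the naive estimate is pulled toward zero, because the false cases share the covariate distribution of the controls. The oracle fit is not available in practice, since the true status is known only for a subset of the case pool, which is the gap the imputation strategy in the abstract is meant to address.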

Keywords: case-control study; contaminated case pool; electronic health records; imputation; robustness to model misspecification.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Case-Control Studies
  • Computer Simulation
  • Electronic Health Records*
  • Humans
  • Models, Statistical*