On the use and misuse of scalar scores of confounders in design and analysis of observational studies

Stat Med. 2015 Aug 15;34(18):2618-35. doi: 10.1002/sim.6467. Epub 2015 Mar 17.

Abstract

We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies in which cases and controls are matched on a summary score and then analyzed either by conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS), are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models or, for non-linear models, only under the null hypothesis of no effect. Adjustment of cohort data for the DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on the DRS and analyzing them with conditional logistic regression yields unbiased estimates of the exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yields unbiased estimates only under the null, for both conditional logistic regression and unconditional logistic regression adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset.
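The cohort result for PS adjustment (unbiased for linear models, biased for non-linear models away from the null) can be illustrated with a small simulation. This is a minimal sketch and not the authors' simulation design. The following are all assumptions made for illustration: the data-generating model (two independent standard-normal confounders z1 and z2, a true PS whose logit is z1 + z2, and outcomes that depend on z1 only), the sample size, the use of the true rather than an estimated PS, and adjustment via a linear term in logit(PS). The helper names (expit, fit_logistic, ps_adjusted_estimates) are hypothetical. Under these assumptions, the linear model adjusted for the score should recover the conditional effect beta. The logistic model adjusted for the same score should return a coefficient shrunk toward zero, while a logistic model adjusted for z1 and z2 themselves should recover beta.

```python
import numpy as np


def expit(t):
    """Inverse-logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(design, y, n_iter=25):
    """Logistic-regression MLE by Newton-Raphson; `design` must include an intercept column."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ coef)
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        coef += np.linalg.solve(hess, grad)
    return coef


def ps_adjusted_estimates(n=200_000, beta=1.0, seed=0):
    """Exposure coefficients from (a) a PS-adjusted linear model, (b) a PS-adjusted
    logistic model, and (c) a logistic model adjusted for the confounders themselves.
    The data-generating model is hypothetical, chosen only for illustration."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)

    # Hypothetical exposure model: true PS with logit(PS) = z1 + z2 (not estimated).
    logit_ps = z1 + z2
    x = rng.binomial(1, expit(logit_ps)).astype(float)

    # Hypothetical outcome models: conditional exposure effect `beta` given (z1, z2);
    # only z1 affects the outcome, so z1 is a confounder.
    y_lin = beta * x + 2.0 * z1 + rng.standard_normal(n)
    y_bin = rng.binomial(1, expit(-1.0 + beta * x + 2.0 * z1)).astype(float)

    ones = np.ones(n)
    ps_design = np.column_stack([ones, x, logit_ps])   # adjust for the scalar score only
    full_design = np.column_stack([ones, x, z1, z2])   # adjust for the confounders

    b_lin_ps = np.linalg.lstsq(ps_design, y_lin, rcond=None)[0][1]
    b_logit_ps = fit_logistic(ps_design, y_bin)[1]
    b_logit_full = fit_logistic(full_design, y_bin)[1]
    return b_lin_ps, b_logit_ps, b_logit_full


if __name__ == "__main__":
    b_lin_ps, b_logit_ps, b_logit_full = ps_adjusted_estimates()
    print(f"linear, PS-adjusted:          {b_lin_ps:.3f}")
    print(f"logistic, PS-adjusted:        {b_logit_ps:.3f}")
    print(f"logistic, confounder-adjusted: {b_logit_full:.3f}")
```

In this sketch, given the PS, the outcome still varies with z1 - z2, which is independent of both the score and the exposure. In the linear model, averaging over that residual variation leaves the exposure coefficient unchanged. In the logistic model, the same averaging attenuates it (non-collapsibility of the odds ratio). With beta = 0 the gap would shrink, in line with the null result, although a linear term in logit(PS) is itself only an approximation to the correct adjustment in that case.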

Keywords: balancing scores; confounder scores; matched case-control study; misspecified models; summary scores; treatment effect.

MeSH terms

  • Bias
  • Case-Control Studies*
  • Cohort Studies*
  • Computer Simulation
  • Confounding Factors, Epidemiologic*
  • Epidemiologic Methods*
  • Humans
  • Logistic Models
  • Observational Studies as Topic / methods*
  • Regression Analysis
  • Research Design