Automatic variable selection for exposure-driven propensity score matching with unmeasured confounders

Biom J. 2020 May;62(3):868-884. doi: 10.1002/bimj.201800190. Epub 2020 Mar 23.

Abstract

Multivariable model building for propensity score modeling approaches is challenging. A common propensity score approach is exposure-driven propensity score matching, for which the best model selection strategy is still unclear. In particular, the situation may require variable selection, and it is still unclear whether the variables included in the propensity score should be associated with both the exposure and the outcome, with either the exposure or the outcome, with at least the exposure, or with at least the outcome. Unmeasured confounders, complex correlation structures, and non-normal covariate distributions further complicate matters. We consider the performance of different modeling strategies in a simulation design with a complex but realistic structure and effects on a binary outcome. We compare the strategies in terms of bias and variance in estimated marginal exposure effects. Considering the bias in estimated marginal exposure effects, the most reliable results for estimating the propensity score are obtained by selecting variables related to the exposure. On average, this results in the least bias and does not greatly increase variances. Although our results cannot be generalized, they provide a counterexample to existing recommendations in the literature based on simple simulation settings. This highlights that recommendations obtained in simple simulation settings cannot always be generalized to more complex but realistic settings, and that more complex simulation studies are needed.
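To make the compared pipeline concrete, the following is a minimal, stdlib-only Python sketch of one replicate. It has four steps: screen covariates by their association with the exposure or with the outcome, fit an exposure propensity score by logistic regression on the selected covariates, match each exposed unit 1:1 to a control, and estimate a marginal exposure effect in the matched sample. Everything specific here is a hypothetical assumption and not the authors' method. This includes the toy data-generating process with one unmeasured confounder `u`, the correlation-screening rule with `threshold=0.1`, greedy nearest-neighbour matching with `caliper=0.05`, and the matched risk difference as the effect measure. The abstract does not describe the paper's selection algorithms, simulation design, or effect measure.

```python
import math
import random


def simulate(n, seed=1):
    """Hypothetical data-generating process; NOT the paper's simulation design."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unmeasured confounder: never passed to the analysis
        # x[0]: confounder, x[1]: exposure-only, x[2]: outcome-only, x[3]: noise
        x = [rng.gauss(0, 1) for _ in range(4)]
        pz = 1 / (1 + math.exp(-(-0.5 + 0.8 * x[0] + 0.8 * x[1] + 0.5 * u)))
        z = 1 if rng.random() < pz else 0
        py = 1 / (1 + math.exp(-(-1.0 + 0.7 * z + 0.8 * x[0] + 0.8 * x[2] + 0.5 * u)))
        y = 1 if rng.random() < py else 0
        rows.append((x, z, y))
    return rows


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def select(rows, target, threshold=0.1):
    """Hypothetical screening rule: keep covariates with |corr| > threshold
    against the exposure (target="exposure") or the outcome (target="outcome")."""
    t = [r[1] if target == "exposure" else r[2] for r in rows]
    k = len(rows[0][0])
    return [j for j in range(k) if abs(corr([r[0][j] for r in rows], t)) > threshold]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=20):
    """Newton-Raphson logistic regression; each row of X starts with an intercept 1.0."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def match_and_estimate(rows, covs, caliper=0.05):
    """Exposure PS on `covs`, 1:1 greedy nearest-neighbour matching of each exposed
    unit to an unused control within `caliper`; returns (risk difference, n_pairs)."""
    X = [[1.0] + [r[0][j] for j in covs] for r in rows]
    z = [r[1] for r in rows]
    beta = fit_logistic(X, z)
    ps = [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, xi)))) for xi in X]
    controls = {i for i, zi in enumerate(z) if zi == 0}
    pairs = []
    for i in [i for i, zi in enumerate(z) if zi == 1]:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    if not pairs:
        raise ValueError("no matched pairs within caliper")
    return sum(rows[i][2] - rows[j][2] for i, j in pairs) / len(pairs), len(pairs)


if __name__ == "__main__":
    rows = simulate(2000)
    for target in ("exposure", "outcome"):
        covs = select(rows, target)
        rd, n_pairs = match_and_estimate(rows, covs)
        print(target, covs, round(rd, 3), n_pairs)
```

For each screening strategy, the script prints the selected covariate indices, the matched risk difference, and the number of pairs. Because `u` is never available to the analysis, no strategy can remove all confounding bias. A bias and variance comparison like the one described above would need many seeds, more strategies ("both", "either", "at least the outcome"), and the true marginal effect of the chosen data-generating process as a reference.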

Keywords: automated variable selection; exposure-driven matching; propensity score; unmeasured confounders.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Automation
  • Biometry / methods*
  • Models, Statistical
  • Multivariate Analysis
  • Propensity Score*