Causal inference practitioners routinely face the challenge of model selection and, in particular, of reducing the size of the covariate set to improve estimation efficiency. Collaborative targeted minimum loss-based estimation (CTMLE) is a general framework for constructing doubly robust semiparametric causal estimators that data-adaptively limit model complexity in the propensity score to optimize a preferred loss function. This stepwise complexity reduction is guided by a loss function evaluated on a strategically updated model for the outcome variable, with the error assessed by cross-validation. We demonstrate how the existing stepwise variable selection CTMLE can be generalized using regression shrinkage of the propensity score. We present two new algorithms that involve stepwise selection of the penalization parameter(s) in the regression shrinkage. Simulation studies demonstrate that, under a misspecified outcome model, mean squared error and bias can be reduced by a CTMLE procedure that separately penalizes individual covariates in the propensity score. We illustrate these approaches in an example using electronic medical data with sparse indicator covariates to evaluate the relative safety of two similarly indicated asthma therapies for pregnant women with moderate asthma.
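The core idea, choosing the amount of propensity-score shrinkage by the cross-validated loss of the *targeted outcome fit* rather than by the propensity score's own fit, can be illustrated with a minimal sketch. It rests on several assumptions that are not from the source: a continuous outcome with a linear fluctuation and squared-error loss, a single shared ridge (L2) penalty instead of covariate-specific penalties, a deliberately misspecified initial outcome model (arm means only), and no sequential updating of the initial outcome fit as in full CTMLE. All function names and the simulated data are hypothetical; this is not the paper's algorithm.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(W, A, lam, iters=150, lr=0.5):
    """Hypothetical propensity fit: gradient descent on mean log-loss + (lam/2)*||beta||^2.
    The intercept is unpenalized; a larger lam means more shrinkage."""
    n, p = len(W), len(W[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, grad = 0.0, [0.0] * p
        for w, a in zip(W, A):
            r = sigmoid(b0 + sum(bj * wj for bj, wj in zip(beta, w))) - a
            g0 += r
            for j in range(p):
                grad[j] += r * w[j]
        b0 -= lr * g0 / n
        beta = [bj - lr * (gj / n + lam * bj) for bj, gj in zip(beta, grad)]
    return b0, beta

def predict_g(model, W, bound=0.025):
    # Predicted propensity scores, truncated to keep the clever covariate bounded.
    b0, beta = model
    return [min(max(sigmoid(b0 + sum(bj * wj for bj, wj in zip(beta, w))), bound), 1 - bound)
            for w in W]

def arm_means(A, Y):
    # Deliberately misspecified initial outcome model: Qbar(a, W) = mean of Y in arm a.
    return {a: sum(y for ai, y in zip(A, Y) if ai == a) / A.count(a) for a in (0, 1)}

def clever(a, g):
    # Clever covariate H(A, W) for the average treatment effect.
    return 1.0 / g if a == 1 else -1.0 / (1.0 - g)

def fit_epsilon(Q0, gs, A, Y):
    # Linear fluctuation Qbar* = Qbar + eps * H, with eps from least squares (no intercept).
    H = [clever(a, g) for a, g in zip(A, gs)]
    num = sum(h * (y - Q0[a]) for h, a, y in zip(H, A, Y))
    return num / sum(h * h for h in H)

def cv_loss(W, A, Y, lam, folds=5):
    # Cross-validated squared-error loss of the *targeted* outcome fit for penalty lam.
    n, total = len(Y), 0.0
    for k in range(folds):
        val = [i for i in range(n) if i % folds == k]
        tr = [i for i in range(n) if i % folds != k]
        Wt, At, Yt = [W[i] for i in tr], [A[i] for i in tr], [Y[i] for i in tr]
        Q0 = arm_means(At, Yt)
        model = fit_ridge_logistic(Wt, At, lam)
        eps = fit_epsilon(Q0, predict_g(model, Wt), At, Yt)
        for i, g in zip(val, predict_g(model, [W[i] for i in val])):
            total += (Y[i] - (Q0[A[i]] + eps * clever(A[i], g))) ** 2
    return total / n

def select_and_estimate(W, A, Y, grid):
    # Pick the penalty whose targeted outcome fit has the lowest CV loss, then refit.
    cv = [cv_loss(W, A, Y, lam) for lam in grid]
    lam = grid[cv.index(min(cv))]
    Q0 = arm_means(A, Y)
    gs = predict_g(fit_ridge_logistic(W, A, lam), W)
    eps = fit_epsilon(Q0, gs, A, Y)
    # Plug-in ATE: mean of Qbar*(1, W) - Qbar*(0, W).
    psi = sum((Q0[1] + eps / g) - (Q0[0] - eps / (1 - g)) for g in gs) / len(gs)
    return lam, psi, cv

# Hypothetical simulated data: W[0] confounds, W[1] affects treatment only, W[2] is noise.
random.seed(1)
n = 300
W = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
A = [1 if random.random() < sigmoid(0.8 * w[0] + 0.8 * w[1]) else 0 for w in W]
Y = [1 + 2 * a + 1.5 * w[0] + random.gauss(0, 1) for w, a in zip(W, A)]
grid = [1.0, 0.3, 0.1, 0.01]  # decreasing penalty = increasing propensity-score complexity
lam, psi, cv = select_and_estimate(W, A, Y, grid)
print("selected penalty:", lam, "ATE estimate:", round(psi, 2))
```

Walking the grid from heavy to light shrinkage mirrors, in a simplified form, the stepwise reduction in penalization described above. The selection criterion is the outcome-model loss after targeting, which is what makes the choice of propensity-score complexity "collaborative."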
Keywords: TMLE; causal inference; model selection; regression penalization; variable selection.
Copyright © 2017 John Wiley & Sons, Ltd.