Penalized regression procedures for variable selection in the potential outcomes framework

Stat Med. 2015 May 10;34(10):1645-58. doi: 10.1002/sim.6433. Epub 2015 Jan 28.

Abstract

Model selection has recently attracted much interest in causal inference. In this article, we describe a framework for penalized regression approaches to variable selection for causal effects. The framework leads to a simple 'impute, then select' class of procedures that is agnostic to both the imputation algorithm and the penalized regression method used. It also clarifies that model selection in causal inference problems involves a multivariate regression model, and that these methods can be applied to identify subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data, and imputation are drawn. A difference LASSO (least absolute shrinkage and selection operator) algorithm is defined, along with its multiple imputation analogs. The procedures are illustrated using a well-known right-heart catheterization dataset.

Keywords: L1 penalty; average causal effect; counterfactual; imputed data; treatment heterogeneity.
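
The 'impute, then select' idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one plausible reading of the procedure: impute each unit's unobserved potential outcome, form the unit-level difference, and regress that difference on the covariates with an L1 penalty. The imputation step (ordinary least squares fitted separately in each treatment arm), the coordinate-descent solver, the function names, the penalty value, and the simulated data are all assumptions made for illustration; the abstract states only that the framework is agnostic to the imputation algorithm and penalized regression used.

```python
import numpy as np


def impute_potential_outcomes(X, t, y):
    """Fill in each unit's unobserved potential outcome with arm-specific OLS.

    Assumption: OLS is used here only because it is simple; any imputation
    model could be substituted.
    """
    Xd = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    coef1, *_ = np.linalg.lstsq(Xd[t == 1], y[t == 1], rcond=None)
    coef0, *_ = np.linalg.lstsq(Xd[t == 0], y[t == 0], rcond=None)
    y1 = np.where(t == 1, y, Xd @ coef1)  # observed if treated, else imputed
    y0 = np.where(t == 0, y, Xd @ coef0)  # observed if control, else imputed
    return y0, y1


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)||y - b0 - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, resid = X - x_mean, y - y_mean  # centering removes the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (j excluded).
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coefficient j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += Xc[:, j] * (beta[j] - new)
            beta[j] = new
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def difference_lasso(X, t, y, lam):
    """Impute, then select: LASSO on the imputed unit-level differences."""
    y0, y1 = impute_potential_outcomes(X, t, y)
    return lasso_cd(X, y1 - y0, lam)


# Hypothetical simulated data: the treatment effect is 1 + 2 * X[:, 0],
# so only the first covariate modifies the effect.
rng = np.random.default_rng(0)
n, p = 400, 5
X = rng.normal(size=(n, p))
t = rng.binomial(1, 0.5, size=n)
y_control = X[:, 1] + rng.normal(scale=0.5, size=n)
y = y_control + t * (1.0 + 2.0 * X[:, 0])

intercept, beta = difference_lasso(X, t, y, lam=0.1)
print("intercept:", round(float(intercept), 3))
print("coefficients:", np.round(beta, 3))
```

In this sketch, covariates with nonzero coefficients are candidates for treatment effect modification, while a fit in which every coefficient is shrunk to zero would be consistent with a homogeneous effect. The sketch uses a single deterministic imputation; it does not attempt the multiple imputation analogs mentioned in the abstract.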

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cardiac Catheterization / adverse effects
  • Cardiac Catheterization / methods
  • Cardiac Catheterization / statistics & numerical data*
  • Causality*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Models, Statistical
  • Outcome Assessment, Health Care / methods
  • Outcome Assessment, Health Care / statistics & numerical data*
  • Randomized Controlled Trials as Topic / methods
  • Randomized Controlled Trials as Topic / statistics & numerical data*
  • Regression Analysis
  • Research Design / statistics & numerical data
  • Survival Analysis