Handling missing data when estimating causal effects with targeted maximum likelihood estimation

Am J Epidemiol. 2024 Jul 8;193(7):1019-1030. doi: 10.1093/aje/kwae012.

Abstract

Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether the outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small bias when the outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in terms of bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.
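The abstract names two of the simpler building blocks without spelling them out: the missing covariate missing indicator method and the step that combines results across multiply imputed datasets. The sketch below illustrates both with NumPy under stated assumptions: the function names `missing_indicator` and `rubins_rules` are hypothetical, the fill value of 0 is an arbitrary choice, and pooling with Rubin's rules is the standard MI combining rule rather than a detail confirmed by this abstract. In an analysis, the augmented covariate matrix would feed the TMLE exposure and outcome models, and the TMLE estimate and variance from each imputed dataset would be passed to the pooling step.

```python
import numpy as np


def missing_indicator(X):
    """Missing covariate missing indicator method (illustrative sketch).

    For every column of X that contains at least one NaN, a 0/1 indicator
    column is appended, and the missing entries are replaced by 0.
    ASSUMPTION: 0 is an arbitrary fill value chosen for illustration.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)                            # True where a value is missing
    has_missing = miss.any(axis=0)                # columns needing an indicator
    filled = np.where(miss, 0.0, X)               # replace NaN with the fill value
    indicators = miss[:, has_missing].astype(float)
    return np.hstack([filled, indicators])


def rubins_rules(estimates, variances):
    """Pool m point estimates and their variances (squared SEs) with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                 # pooled point estimate
    w = u.mean()                     # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, np.sqrt(t)         # pooled estimate and its standard error
```

For example, `missing_indicator([[1.0, np.nan], [2.0, 3.0]])` returns `[[1, 0, 1], [2, 3, 0]]`, and pooling estimates 0.10, 0.12 and 0.14 with equal variances of 0.0004 gives a pooled estimate of 0.12 with standard error sqrt(0.0004 × 7/3), about 0.031.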

Keywords: causal inference; missing data; multiple imputation; targeted maximum likelihood estimation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Bias
  • Causality*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Likelihood Functions
  • Models, Statistical