Handling missing data when estimating causal effects with targeted maximum likelihood estimation

S Ghazaleh Dashti; Katherine J Lee; Julie A Simpson; Ian R White; John B Carlin; Margarita Moreno-Betancur

doi:10.1093/aje/kwae012

Am J Epidemiol. 2024 Jul 8;193(7):1019-1030. doi: 10.1093/aje/kwae012.

Abstract

Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small biases when outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.
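
As a rough illustration of the MI approaches mentioned above: with multiple imputation, the analysis (here, TMLE) is run on each imputed dataset and the per-dataset results are then combined. The abstract does not say how estimates were pooled; the sketch below assumes Rubin's rules, the standard choice. The function name `pool_rubin` and the input numbers are hypothetical, and the TMLE step that would produce the per-dataset estimates and variances is not shown.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules (assumed pooling method).

    estimates: point estimates, one per imputed dataset
               (e.g. TMLE effect estimates; hypothetical inputs here).
    variances: the corresponding squared standard errors.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 imputations and matching lengths")
    pooled = mean(estimates)          # average of the point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical numbers, for illustration only (not results from the study).
pooled, total = pool_rubin([0.10, 0.12, 0.14], [0.01, 0.01, 0.01])
```

The total variance adds the between-imputation spread, inflated by 1 + 1/m, to the average within-imputation variance, so uncertainty due to the missing data carries through to the pooled standard error.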

Keywords: causal inference; missing data; multiple imputation; targeted maximum likelihood estimation.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adolescent
Bias
Causality*
Computer Simulation
Data Interpretation, Statistical
Humans
Likelihood Functions
Models, Statistical
