Purpose: In distributed research network (DRN) settings, multiple imputation cannot be directly implemented because pooling individual-level data are often not feasible. The performance of multiple imputation in combination with meta-analysis is not well understood within DRNs.
Methods: To evaluate the performance of imputation for missing baseline covariate data in combination with meta-analysis for time-to-event analysis within DRNs, we compared two parametric algorithms including one approximated linear imputation model (Approx), and one nonlinear substantive model compatible imputation model (SMC), as well as two non-parametric machine learning algorithms including random forest (RF), and classification and regression trees (CART), through simulation studies motivated by a real-world data set.
Results: Under the setting with small effect sizes (i.e., log-Hazard ratios [logHR]) and homogeneous missingness mechanisms across sites, all imputation methods produced unbiased and more efficient estimates while the complete-case analysis could be biased and inefficient; and under heterogeneous missingness mechanisms, estimates with RF method could have higher efficiency. Estimates from the distributed imputation combined by meta-analysis were similar to those from the imputation using pooled data. When logHRs were large, the SMC imputation algorithm generally performed better than others.
Conclusions: These findings suggest the validity and feasibility of imputation within DRNs in the presence of missing covariate data in time-to-event analysis under various settings. The performance of the four imputation algorithms varies with the effect sizes and level of missingness.
Keywords: Cox model; distributed research networks; missing covariates; multiple imputation; simulation study.
© 2022 John Wiley & Sons Ltd.