First Use of Multiple Imputation with the National Tuberculosis Surveillance System

Epidemiol Res Int. 2012 Dec 18:2013:875234. doi: 10.1155/2013/875234.

Abstract

Aims: The purpose of this study was to compare methods for handling missing data in analyses of the National Tuberculosis Surveillance System of the Centers for Disease Control and Prevention. Because human immunodeficiency virus (HIV) infection status is missing for a large proportion of records in this dataset, we used multiple imputation to minimize the bias that can result from simpler approaches to missing data.

Methods: We compared analysis based on multiple imputation methods with analysis based on deleting subjects with missing covariate data from regression analysis (case exclusion), and determined whether the use of increasing numbers of imputed datasets would lead to changes in the estimated association between isoniazid resistance and death.
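As an illustration of how estimates from several imputed datasets are typically combined, the following is a minimal Python sketch of Rubin's rules. The abstract does not state which pooling procedure or software was used, so this is an assumption about the general technique rather than a description of the study's analysis. The function names `pool_rubin` and `odds_ratio_ci` are hypothetical, the example inputs are made-up values and not study data, and the confidence interval uses a normal approximation instead of the t-based reference distribution that a full implementation would use.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates (e.g. log odds ratios) with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


def odds_ratio_ci(log_or, total_var, z=1.96):
    """Back-transform a pooled log odds ratio to an odds ratio with a CI.

    Assumption: normal approximation (z = 1.96), not the t reference
    distribution with Rubin's degrees of freedom.
    """
    se = math.sqrt(total_var)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Illustrative values only (not from the study): five imputed datasets.
log_ors = [0.70, 0.75, 0.72, 0.74, 0.71]
ses = [0.23, 0.24, 0.23, 0.24, 0.23]
pooled, total_var = pool_rubin(log_ors, [s ** 2 for s in ses])
print(odds_ratio_ci(pooled, total_var))
```

Repeating the pooling step with 5, 10, or more imputed datasets and comparing the resulting odds ratios is one simple way to check whether the number of imputations affects the estimate.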

Results: Following multiple imputation, the odds ratio for initial isoniazid resistance and death was 2.07 (95% CI 1.30, 3.29); with case exclusion, the odds ratio was lower, at 1.53 (95% CI 0.83, 2.83). The use of more than 5 imputed datasets did not substantively change the results.
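To make the comparison between the two intervals concrete, the short Python sketch below back-calculates the standard error of the log odds ratio implied by each reported interval. It assumes the intervals are Wald-type, that is, symmetric on the log scale with z = 1.96; the abstract does not say how they were computed. The helper name `se_from_ci` is hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log odds ratio implied by a Wald-type CI.

    Assumption: the interval is exp(log OR +/- z * SE).
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Odds ratios and 95% CIs as reported in the Results above.
reported = {
    "multiple imputation": (2.07, 1.30, 3.29),
    "case exclusion": (1.53, 0.83, 2.83),
}

for method, (odds_ratio, lower, upper) in reported.items():
    se = se_from_ci(lower, upper)
    excludes_null = lower > 1.0 or upper < 1.0  # does the CI exclude OR = 1?
    print(f"{method}: log OR = {math.log(odds_ratio):.3f}, "
          f"SE = {se:.3f}, CI excludes 1: {excludes_null}")
```

Under this assumption the implied standard error is roughly 0.24 with multiple imputation and roughly 0.31 with case exclusion, and only the multiple imputation interval excludes an odds ratio of 1.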

Conclusions: Our experience with the National Tuberculosis Surveillance System dataset supports the use of multiple imputation methods in epidemiologic analysis, but also demonstrates that close attention should be paid to the potential impact of missing covariates at each step of the analysis.