Multiple imputation analysis of case-cohort studies

Helena Marti; Michel Chavance

doi:10.1002/sim.4130

Stat Med. 2011 Jun 15;30(13):1595-607. doi: 10.1002/sim.4130. Epub 2011 Feb 24.

Authors

Helena Marti¹, Michel Chavance

Affiliation

¹ Inserm, CESP Centre for Research in Epidemiology and Population Health, U1018, Biostatistics team, F-94807 Villejuif, France.

PMID: 21351290

Abstract

The usual methods for analyzing case-cohort studies rely on weighted estimators that are sometimes not fully efficient. Multiple imputation might be a good alternative because it uses all the data available and approximates the maximum partial likelihood estimator. This method is based on the generation of several plausible complete data sets, taking into account uncertainty about the missing values. When the imputation model is correctly specified, the multiple imputation estimator is asymptotically unbiased and its variance is correctly estimated. We show that a correct imputation model must be estimated from the fully observed data (cases and controls), with case status included among the explanatory variables. To validate the approach, we analyzed case-cohort studies first with completely simulated data and then with case-cohort data sampled from two real cohorts. The analyses of simulated data showed that, when the imputation model was correct, the multiple imputation estimator was unbiased and efficient. The observed gain in precision ranged from 8 to 37 per cent for phase-1 variables and from 5 to 19 per cent for the phase-2 variable. When the imputation model was misspecified, the multiple imputation estimator was still more efficient than the weighted estimators, but it was also slightly biased. The analyses of case-cohort data sampled from complete cohorts showed that, even when no strong predictor of the phase-2 variable was available, the multiple imputation estimator was unbiased, as precise as the weighted estimator for the phase-2 variable, and slightly more precise than the weighted estimators for the phase-1 variables. However, the multiple imputation estimator was found to be biased when, because of interaction terms, some coefficients of the imputation model had to be estimated from small samples. Multiple imputation is an efficient technique for analyzing case-cohort data. In practice, we suggest building the analysis model using only the case-cohort data and weighted estimators. Multiple imputation can then be used to reanalyze the data with the selected model in order to improve the precision of the results.
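
The abstract does not say how the results from the several completed data sets are combined. As an illustration only, the sketch below applies the standard combining rules for multiple imputation (Rubin's rules): the pooled estimate is the mean of the per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the numeric inputs are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log hazard ratios and their variances from m = 5 imputed data sets
# (illustrative numbers, not results from the study).
estimates = [0.42, 0.38, 0.45, 0.40, 0.35]
variances = [0.010, 0.012, 0.011, 0.009, 0.013]

pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, total variance = {total_var:.5f}")
```

The between-imputation term is what carries the uncertainty about the missing values into the final variance, so dropping it would understate the standard error of the pooled coefficient.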

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Case-Control Studies*
Cohort Studies*
Computer Simulation
Data Interpretation, Statistical*
Fibrinogen / analysis
Histocytochemistry
Humans
Male
Models, Statistical*
Myocardial Ischemia / epidemiology
Wilms Tumor / pathology

Substances

Fibrinogen