Valid and efficient subgroup analyses using nested case-control data

Int J Epidemiol. 2018 Jun 1;47(3):841-849. doi: 10.1093/ije/dyx282.

Abstract

Background: Investigators often conduct further subgroup analyses using data collected in a nested case-control design. Because participants are sampled in a way that depends on the outcome of interest, the data at hand are not a representative sample of the population, and the validity and interpretation of subgroup analyses need careful consideration.

Methods: We performed simulation studies, generating cohorts within the proportional hazards model framework and with covariate coefficients chosen to mimic realistic data and more extreme situations. From the cohorts we sampled nested case-control data and analysed the effect of a binary exposure on a time-to-event outcome in subgroups defined by a covariate (an independent risk factor, a confounder or an effect modifier) and compared the estimates with the corresponding subcohort estimates. Cohort analyses were performed with Cox regression, and nested case-control samples or restricted subsamples were analysed with both conditional logistic regression and weighted Cox regression.
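As a rough illustration of the sampling step described in the Methods, the sketch below simulates a small cohort under a proportional hazards model and draws risk-set sampled controls for each case. It is a minimal sketch, not the authors' simulation code: the exponential baseline hazard, the coefficient values, the exposure and covariate prevalences, the administrative censoring time and all function names are assumptions chosen for illustration. The subgroup restriction shown (keep a matched set only if the case and at least one of its controls belong to the subgroup) is likewise one plausible reading of "restricted subsamples".

```python
import math
import random


def simulate_cohort(n, beta_x=0.5, beta_z=0.7, base_rate=0.01, end=10.0, seed=1):
    """Simulate exponential event times under a proportional hazards model.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    cohort = []
    for i in range(n):
        x = int(rng.random() < 0.3)  # binary exposure
        z = int(rng.random() < 0.5)  # binary covariate defining the subgroup
        rate = base_rate * math.exp(beta_x * x + beta_z * z)
        t = rng.expovariate(rate)
        # Administrative censoring at time `end`.
        cohort.append({"id": i, "x": x, "z": z,
                       "time": min(t, end), "event": int(t <= end)})
    return cohort


def sample_ncc(cohort, m=1, seed=2):
    """Risk-set sampling: for each case, draw up to m controls at random
    from the subjects still at risk at the case's event time."""
    rng = random.Random(seed)
    matched_sets = []
    for case in cohort:
        if not case["event"]:
            continue
        risk_set = [s for s in cohort
                    if s["time"] >= case["time"] and s["id"] != case["id"]]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        matched_sets.append({"case": case, "controls": controls})
    return matched_sets


def restrict_to_subgroup(matched_sets, z):
    """Keep matched sets whose case is in subgroup z, together with only
    those controls that are also in subgroup z. Sets left without any
    control are dropped, since they carry no information for conditional
    logistic regression."""
    restricted = []
    for s in matched_sets:
        if s["case"]["z"] != z:
            continue
        controls = [c for c in s["controls"] if c["z"] == z]
        if controls:
            restricted.append({"case": s["case"], "controls": controls})
    return restricted


cohort = simulate_cohort(500)
matched_sets = sample_ncc(cohort, m=2)
subgroup_sets = restrict_to_subgroup(matched_sets, z=1)
print(len(matched_sets), "matched sets;", len(subgroup_sets), "usable in subgroup z=1")
```

The restricted matched sets could then be passed to a conditional logistic regression routine, with each matched set forming one stratum.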

Results: In all scenarios studied, the subgroup analyses provided unbiased estimates of the exposure coefficients, and conditional logistic regression was less efficient than weighted Cox regression.

Conclusions: For the study of a subpopulation, analysis of the corresponding subgroup of individuals sampled in a nested case-control design provides an unbiased estimate of the effect of exposure, regardless of whether the variable used to define the subgroup is a confounder, effect modifier or independent risk factor. Weighted Cox regression provides more efficient estimates than conditional logistic regression.
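The abstract does not state which weights were used in the weighted Cox regression. One commonly used choice, assumed here purely for illustration, is inverse probability weighting in the style of Samuelsen: each case receives weight 1, and each sampled control receives the inverse of its probability of ever being selected as a control, computed from the number of subjects at risk at each case's event time. The sketch below computes such weights for a tiny hand-made cohort; the function names and data layout are hypothetical.

```python
def inclusion_probabilities(cohort, m=1):
    """Probability that each subject is ever sampled as a control when m
    controls are drawn at random from the risk set of every case
    (Samuelsen-type formula, assumed here for illustration)."""
    cases = sorted((s for s in cohort if s["event"]), key=lambda s: s["time"])
    # Number of subjects at risk (case included) at each case's event time.
    at_risk = [sum(1 for s in cohort if s["time"] >= c["time"]) for c in cases]
    probs = {}
    for s in cohort:
        not_sampled = 1.0
        for c, n_at_risk in zip(cases, at_risk):
            if c["time"] > s["time"]:
                break  # subject is no longer at risk at later case times
            if c["id"] == s["id"] or n_at_risk <= 1:
                continue  # a case cannot serve as its own control
            not_sampled *= 1.0 - min(1.0, m / (n_at_risk - 1))
        probs[s["id"]] = 1.0 - not_sampled
    return probs


def ncc_weights(cohort, sampled_control_ids, m=1):
    """Weight 1 for cases, 1/p for sampled controls that are not cases."""
    probs = inclusion_probabilities(cohort, m)
    weights = {}
    for s in cohort:
        if s["event"]:
            weights[s["id"]] = 1.0
        elif s["id"] in sampled_control_ids:
            weights[s["id"]] = 1.0 / probs[s["id"]]
    return weights


# Tiny hypothetical cohort: two cases (times 1 and 2), two censored subjects.
toy = [
    {"id": 0, "time": 1.0, "event": 1},
    {"id": 1, "time": 2.0, "event": 1},
    {"id": 2, "time": 3.0, "event": 0},
    {"id": 3, "time": 4.0, "event": 0},
]
print(ncc_weights(toy, sampled_control_ids={2}, m=1))
```

In this toy cohort, subject 2 is eligible as a control at both case times, with selection probabilities 1/3 and 1/2, so its overall inclusion probability is 1 - (2/3)(1/2) = 2/3 and its weight is about 1.5. Weights of this kind would be supplied to a Cox regression routine that accepts case weights, together with a robust variance estimator.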

Keywords: Conditional logistic regression; risk set sampling; weighted Cox regression; weighted likelihood.