Secondary analysis of case-control data

Stat Med. 2006 Apr 30;25(8):1323-39. doi: 10.1002/sim.2283.

Abstract

We extend the discussion of Lee et al. and others on methods for performing secondary analyses of case-control sampled data and carry out an extensive investigation of efficiency and robustness. We find that, with the exception of the 'analyse-the-controls-only' strategy for populations in which cases are rare, ad hoc methods in common usage often lead to extremely misleading conclusions and that it is not possible to tell in advance when this will happen. Weighted likelihood and semi-parametric maximum likelihood (SPML) methods are justified theoretically. We find that SPML can be as much as twice as efficient as the weighted method, but is subject to bias in estimating parameters of interest when the nuisance models it requires have been mis-specified. The weighted method needs no nuisance models and is therefore robust in this regard, but we cannot tell when it is going to be very inefficient without sophisticated modelling of the kind the SPML method involves. Practitioners should routinely use both methods and will often have to weigh up the practical consequences of severe inefficiency and lack of robustness in the context of their enquiries.
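
To make the weighted idea concrete, the following is a minimal sketch of an inverse-sampling-fraction weighted fit for a secondary outcome. It is not the authors' implementation, and the abstract does not specify the estimator in this detail. The sketch assumes a continuous secondary outcome modelled linearly on a single covariate, and it assumes the case and control sampling fractions are known. The function names `sampling_weights` and `weighted_linear_fit`, the toy data, and the fractions 1.0 and 0.25 are all hypothetical.

```python
def sampling_weights(is_case, case_fraction, control_fraction):
    """Return one weight per subject: the inverse of its sampling fraction.

    Assumption: the fractions of population cases and controls that were
    sampled are known. Over-sampled cases get small weights, under-sampled
    controls get large ones.
    """
    return [
        1.0 / case_fraction if d else 1.0 / control_fraction
        for d in is_case
    ]


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b).

    Each subject contributes in proportion to its weight, so the fit
    targets the population relationship rather than the relationship in
    the case-enriched sample.
    """
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(
        wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)
    )
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("covariate has no weighted variation")
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Hypothetical toy data: exposure x, secondary outcome y, case status.
    x = [0, 1, 1]
    y = [0.0, 0.0, 2.0]
    is_case = [True, False, True]

    # Hypothetical design: every case sampled, one control in four sampled.
    w = sampling_weights(is_case, case_fraction=1.0, control_fraction=0.25)

    naive = weighted_linear_fit(x, y, [1.0] * len(x))  # ignores the design
    weighted = weighted_linear_fit(x, y, w)            # accounts for it
    print("naive (intercept, slope):   ", naive)
    print("weighted (intercept, slope):", weighted)
```

Setting every weight to 1 gives the naive pooled analysis, so the same function can be used to compare the two fits. The sketch needs no model for how disease depends on the secondary outcome, which is the sense in which a weighted approach avoids nuisance models. It gives no indication of how inefficient the weighted estimate may be.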

MeSH terms

  • Bias
  • Biometry / methods
  • Birth Weight
  • Case-Control Studies*
  • Child Development
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Epidemiologic Methods
  • Humans
  • Infant, Newborn
  • Likelihood Functions*
  • Logistic Models
  • Regression Analysis*