A permutation procedure to correct for confounders in case-control studies, including tests of rare variation

Am J Hum Genet. 2012 Aug 10;91(2):215-23. doi: 10.1016/j.ajhg.2012.06.004. Epub 2012 Jul 19.

Abstract

Many case-control tests of rare variation are implemented in statistical frameworks that make correction for confounders like population stratification difficult. Simple permutation of disease status is unacceptable for resolving this issue because the replicate data sets do not have the same confounding as the original data set. These limitations make it difficult to apply rare-variant tests to samples in which confounding most likely exists, e.g., samples collected from admixed populations. To enable the use of such rare-variant methods in structured samples, as well as to facilitate permutation tests for any situation in which case-control tests require adjustment for confounding covariates, we propose to establish the significance of a rare-variant test via a modified permutation procedure. Our procedure uses Fisher's noncentral hypergeometric distribution to generate permuted data sets that preserve the confounding structure of the actual data set, so that inference remains valid in the presence of confounding factors. We use simulated sequence data based on coalescent models to show that our permutation strategy corrects for confounding due to population stratification, which would otherwise inflate the size (type I error rate) of a rare-variant test. We further illustrate the approach by using sequence data on energy metabolism traits from the Dallas Heart Study. Researchers can implement our permutation approach by using the R package BiasedUrn.
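The abstract describes the procedure only at a high level. One plausible reading, assumed here rather than taken from the paper, is as follows. First, fit a logistic model of disease status on the confounding covariates alone (the null of no genetic effect). Next, treat each subject's fitted odds as a weight. Finally, draw permuted case labels from Fisher's multivariate noncentral hypergeometric distribution with one "ball" per subject. Every replicate then keeps the observed number of cases, while subjects whose covariates predict disease stay more likely to be labelled cases. The Python sketch below samples that distribution exactly with a dynamic program over elementary symmetric polynomials. This is a swapped-in technique, not the BiasedUrn code the authors point to. The function names, the Newton-Raphson fit, and the burden statistic are hypothetical illustrations.

```python
# Assumption: one reading of the abstract's procedure, not the authors' code.
import numpy as np


def fit_null_logistic(X, y, n_iter=25):
    """Fit the null model logit P(case) = X @ beta by Newton-Raphson.

    X holds only the confounding covariates plus an intercept column;
    genotypes are excluded so the fit reflects the null hypothesis.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def conditional_bernoulli_table(log_odds, n_cases):
    """le[i, k] = log of the degree-k elementary symmetric polynomial of odds[i:]."""
    log_odds = np.asarray(log_odds, dtype=float)
    n = len(log_odds)
    le = np.full((n + 1, n_cases + 1), -np.inf)
    le[n, 0] = 0.0
    for i in range(n - 1, -1, -1):
        le[i, 0] = 0.0
        # Subject i is either a control (degree k from the rest)
        # or a case (its odds times degree k - 1 from the rest).
        le[i, 1:] = np.logaddexp(le[i + 1, 1:], log_odds[i] + le[i + 1, :-1])
    return le


def sample_case_labels(log_odds, n_cases, rng, table=None):
    """Draw 0/1 labels with exactly n_cases ones, where a case set S has
    probability proportional to prod_{i in S} exp(log_odds[i]), i.e. Fisher's
    multivariate noncentral hypergeometric with one ball per subject."""
    log_odds = np.asarray(log_odds, dtype=float)
    le = conditional_bernoulli_table(log_odds, n_cases) if table is None else table
    labels = np.zeros(len(log_odds), dtype=int)
    k = n_cases  # cases still to assign
    for i in range(len(log_odds)):
        if k == 0:
            break
        # P(subject i is a case | k cases left among subjects i..n-1)
        p_case = np.exp(log_odds[i] + le[i + 1, k - 1] - le[i, k])
        if rng.random() < p_case:
            labels[i] = 1
            k -= 1
    return labels


def burden_statistic(G, y):
    """Hypothetical rare-variant statistic: mean rare-allele burden, cases minus controls."""
    burden = G.sum(axis=1)
    return burden[y == 1].mean() - burden[y == 0].mean()


def permutation_pvalue(stat_fn, X, y, G, n_perm=999, seed=0):
    """Covariate-adjusted permutation p-value (larger stat = stronger evidence)."""
    rng = np.random.default_rng(seed)
    log_odds = X @ fit_null_logistic(X, y)  # fitted null log-odds per subject
    n_cases = int(y.sum())
    table = conditional_bernoulli_table(log_odds, n_cases)  # shared by all replicates
    observed = stat_fn(G, y)
    exceed = sum(
        stat_fn(G, sample_case_labels(log_odds, n_cases, rng, table)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

The fitted odds do not change between replicates, so the table is built once per data set, at a cost of O(N × cases) time and memory. Working in log space keeps the symmetric polynomials from overflowing when N is large. If every subject had the same fitted odds, the sampler would reduce to ordinary permutation of disease status, which is the case the abstract says fails under confounding.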

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Case-Control Studies*
  • Computer Simulation
  • Confounding Factors, Epidemiologic*
  • Data Interpretation, Statistical*
  • Genetic Variation*
  • Humans
  • Models, Genetic
  • Molecular Sequence Data
  • Rare Diseases / genetics*
  • Software*