Assessing the impact of population stratification on association studies of rare variation

Hum Hered. 2013;76(1):28-35. doi: 10.1159/000353270. Epub 2013 Jul 31.

Abstract

Aims: The study of rare variants, which may explain a substantial proportion of heritability, has emerged as an important topic in gene mapping for complex human diseases. Although several statistical methods have been developed to increase the power to detect disease-related rare variants, none of them addresses an important issue that often arises in genetic studies: false positives due to population stratification. Using simulations, we investigated the impact of population stratification on the false-positive rates of rare-variant association tests.

Methods: We simulated a series of case-control studies under various sample sizes and levels of population structure. Using these data, we examined the impact of population stratification on rare-variant collapsing and burden tests. We further evaluated the ability of two existing methods, principal component analysis and genomic control, to correct for stratification in such rare-variant studies.
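The sketch below illustrates, under hypothetical parameters, how stratification alone can produce false positives in a collapsing test. It is not the authors' simulation design: the two subpopulations, their minor allele frequencies (maf_a, maf_b), the number of sites, and all function names are illustrative assumptions. Each individual is collapsed to a single carrier indicator (any minor allele across the region), and carrier counts are compared between cases and controls with a 1-df Pearson chi-square test. No variant affects disease, so every significant result is a false positive.

```python
import math
import random


def carrier(rng, maf, n_sites):
    """Collapse one simulated individual's rare-variant genotypes into a single
    indicator: 1 if any of the 2 * n_sites alleles is a minor allele, else 0."""
    return int(any(rng.random() < maf for _ in range(2 * n_sites)))


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on the table
    [[a, b], [c, d]]; returns the p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def simulate_group(rng, n, frac_a, maf_a, maf_b, n_sites):
    """Count carriers among n individuals, a fraction frac_a of whom come from
    hypothetical subpopulation A and the rest from subpopulation B."""
    n_a = round(frac_a * n)
    carriers = sum(carrier(rng, maf_a, n_sites) for _ in range(n_a))
    carriers += sum(carrier(rng, maf_b, n_sites) for _ in range(n - n_a))
    return carriers


def false_positive_rate(frac_a_cases, frac_a_controls, n_per_group=500,
                        maf_a=0.01, maf_b=0.002, n_sites=5,
                        n_reps=200, alpha=0.05, seed=1):
    """Fraction of null replicates (no variant affects disease) in which the
    collapsing test is significant at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        cases = simulate_group(rng, n_per_group, frac_a_cases, maf_a, maf_b, n_sites)
        ctrls = simulate_group(rng, n_per_group, frac_a_controls, maf_a, maf_b, n_sites)
        p = chi2_2x2_pvalue(cases, n_per_group - cases, ctrls, n_per_group - ctrls)
        hits += p < alpha
    return hits / n_reps


if __name__ == "__main__":
    # Cases and controls share the same ancestry mix.
    print("matched:   ", false_positive_rate(0.5, 0.5))
    # Cases are enriched for subpopulation A, which has the higher MAF.
    print("stratified:", false_positive_rate(0.8, 0.2))
```

When cases and controls share the same ancestry mix, the rate should stay close to alpha; when cases are enriched for the subpopulation with higher minor allele frequencies, it rises well above alpha, and for a fixed frequency difference the inflation grows with sample size.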

Results: We found that population stratification can substantially inflate false-positive rates in studies of rare variants, especially when the sample size is large and the population is severely stratified. Principal component analysis performed well in most situations, whereas genomic control often yielded conservative results.
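Genomic control, one of the two corrections evaluated, can be sketched as follows, assuming many independent 1-df chi-square association statistics (for example, one per gene). The inflation factor λ is estimated as the median observed statistic divided by the median of the χ²₁ distribution (about 0.455), and every statistic is divided by λ. The function name and the choice to truncate λ at 1 are illustrative conventions, not details taken from this study.

```python
import math
import statistics

# Median of the chi-square distribution with 1 degree of freedom (~0.4549).
CHI2_1DF_MEDIAN = 0.454936423119572


def genomic_control(stats):
    """Estimate the inflation factor lambda from many 1-df chi-square statistics
    and return (lambda, adjusted statistics, adjusted p-values)."""
    lam = statistics.median(stats) / CHI2_1DF_MEDIAN
    # Convention assumed here: never scale statistics up when lambda < 1.
    lam = max(lam, 1.0)
    adjusted = [s / lam for s in stats]
    # 1-df chi-square survival function: P(X > x) = erfc(sqrt(x / 2)).
    pvalues = [math.erfc(math.sqrt(s / 2)) for s in adjusted]
    return lam, adjusted, pvalues


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Hypothetical null statistics uniformly inflated by a factor of 1.5.
    inflated = [1.5 * rng.gauss(0, 1) ** 2 for _ in range(20000)]
    lam, _, _ = genomic_control(inflated)
    print(f"estimated lambda: {lam:.3f}")
```

Because a single λ is applied to every test, tests whose inflation is smaller than the median are over-corrected, which is one way a uniform adjustment of this kind can become conservative.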

Conclusions: Our results imply that researchers need to carefully match cases and controls on ancestry to avoid false positives caused by population structure in studies of rare variants, particularly when genome-wide data are not available.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Case-Control Studies
  • Computer Simulation
  • Gene Frequency
  • Genetic Variation*
  • Genetics, Population* / methods
  • Genome-Wide Association Study* / methods
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Polymorphism, Single Nucleotide
  • Population Dynamics*
  • Population Groups / genetics
  • Principal Component Analysis