Sampling GWAS subjects from risk populations

Genet Epidemiol. 2011 Apr;35(3):148-53. doi: 10.1002/gepi.20562. Epub 2011 Feb 16.

Abstract

Statistical power, and hence the required sample size, is a crucial issue in genome-wide association studies (GWAS) of disorders caused by a multitude of weak genetic effects. Here, we examine the influence of sampling cases and/or controls from populations that are exposed to an external risk factor (such as smoking or nutritional factors). We use an additive threshold model and derive the necessary sample size as a function of the external risk factor's strength and of the sampling scheme. If both cases and controls are sampled from the risk population, a loss of power must be expected. The loss of power (i.e. the increase in the necessary sample size) is even larger if only the cases are sampled from the risk population, whereas the inverse scheme (nonrisk cases and risk controls) provides a gain in power, since nonrisk cases are enriched for disease-favouring alleles while risk controls are enriched for protective alleles. For small effect sizes, we derive simple approximations in analytically closed form. From these results, a strategy for collecting GWAS samples from risk populations that minimizes the necessary sample size may be deduced; it applies generally as long as strong gene-environment interactions can be excluded.
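The ordering of the four sampling schemes described above can be illustrated with a minimal numerical sketch. It is not the authors' derivation; it rests on assumptions that do not come from the abstract: a single biallelic SNP in Hardy-Weinberg equilibrium, a standard-normal residual liability, an external risk factor modelled as a fixed shift that lowers the effective liability threshold, and the textbook two-proportion formula for an allelic test. All function names, parameter names and default values (`allele_freq`, `subjects_per_group`, `compare_schemes`, `p`, `a`, `shift`, `baseline_prevalence`) are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal residual liability (assumption)


def allele_freq(p, a, t, affected):
    """Risk-allele frequency among cases (affected=True) or controls
    (affected=False) in a population with effective liability threshold t.

    p: population frequency of the risk allele
    a: additive liability effect per risk allele
    """
    q = 1.0 - p
    genotype_probs = {0: q * q, 1: 2.0 * p * q, 2: p * p}  # Hardy-Weinberg
    num = den = 0.0
    for g, prob in genotype_probs.items():
        # Additive threshold model: affected if a*g + residual > t.
        p_case = 1.0 - STD_NORMAL.cdf(t - a * g)
        weight = prob * (p_case if affected else 1.0 - p_case)
        num += weight * g / 2.0  # fraction of risk alleles in genotype g
        den += weight
    return num / den


def subjects_per_group(p_case, p_ctrl, alpha=5e-8, power=0.8):
    """Approximate subjects per group for a two-sided allelic test,
    using the standard two-proportion sample size formula."""
    z_a = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_b = STD_NORMAL.inv_cdf(power)
    variance = p_case * (1.0 - p_case) + p_ctrl * (1.0 - p_ctrl)
    alleles = (z_a + z_b) ** 2 * variance / (p_case - p_ctrl) ** 2
    return alleles / 2.0  # each subject contributes two alleles


def compare_schemes(p=0.3, a=0.05, baseline_prevalence=0.01, shift=1.0):
    """Required subjects per group for each (case population, control
    population) pair. baseline_prevalence is the disease probability of
    non-carriers in the nonrisk population; shift is the amount by which
    the external risk factor lowers the effective threshold."""
    t_nonrisk = STD_NORMAL.inv_cdf(1.0 - baseline_prevalence)
    thresholds = {"nonrisk": t_nonrisk, "risk": t_nonrisk - shift}
    result = {}
    for case_pop, t_case in thresholds.items():
        for ctrl_pop, t_ctrl in thresholds.items():
            f_case = allele_freq(p, a, t_case, affected=True)
            f_ctrl = allele_freq(p, a, t_ctrl, affected=False)
            result[(case_pop, ctrl_pop)] = subjects_per_group(f_case, f_ctrl)
    return result


if __name__ == "__main__":
    for (case_pop, ctrl_pop), n in sorted(
        compare_schemes().items(), key=lambda item: item[1]
    ):
        print(f"cases: {case_pop:8s} controls: {ctrl_pop:8s} n per group ~ {n:,.0f}")
```

Under these assumed parameters, the sketch gives the smallest required sample size for nonrisk cases with risk controls, followed by both groups nonrisk, both groups risk, and finally risk cases with nonrisk controls, which matches the qualitative ordering stated in the abstract. The sums over genotypes are exact for this toy model and are not the closed-form small-effect approximations derived in the paper.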

MeSH terms

  • Alleles
  • Case-Control Studies
  • Disease / genetics
  • Environment
  • Gene Frequency
  • Genome-Wide Association Study / statistics & numerical data*
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Molecular Epidemiology / statistics & numerical data
  • Odds Ratio
  • Risk Factors
  • Sample Size