A critical evaluation of genomic control methods for genetic association studies

Genet Epidemiol. 2009 May;33(4):290-8. doi: 10.1002/gepi.20379.

Abstract

Population stratification is an important potential confounder of genetic case-control association studies. In replication studies, limited sample availability may lead to imbalanced sampling from heterogeneous populations. Genomic control (GC) can be used to correct χ² test statistics that are presumed to be inflated by a factor λ, which may be estimated from a summary χ² value (λ_median or λ_mean) computed over a set of unlinked markers. Many studies applying GC methods have used fewer than 50 unlinked markers, and an important question is whether so few markers can adequately correct for population stratification. We assess the behavior of GC methods in imbalanced case-control studies by simulation. Single nucleotide polymorphisms (SNPs) are sampled from two subpopulations with intra-continental levels of FST (≤ 0.005), under sampling schemes ranging from balanced to completely imbalanced between subpopulations. The sampling properties of λ_median and λ_mean are explored using 6 to 1,600 unlinked markers, and Type 1 error and power are estimated empirically. GC corrections based on the χ² distribution (GC_median or GC_mean) can be anti-conservative even when more than 100 SNPs are genotyped and realistic levels of population stratification exist. The GC_F procedure, which refers the corrected statistic to an F distribution rather than a χ² distribution, performs well over a wider range of conditions, becoming anti-conservative only at low levels of α and with fewer than 25 SNPs genotyped. A substantial loss of power can arise when population stratification is present, but this loss is largely independent of the number of SNPs used. A literature survey shows that most studies applying GC have used GC_median or GC_mean rather than GC_F, which is the most appropriate GC correction method.
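The three corrections compared in the abstract can be sketched in a few lines. The block below is a minimal, stdlib-only Python illustration, not the authors' code, and every function name in it is hypothetical. It assumes the usual definitions λ_median = median(null χ²) / 0.4549 (0.4549 being the median of a 1-df χ² distribution) and λ_mean = mean(null χ²), floors both at 1 (a common convention, assumed here), and implements GC_F in one common formulation: the ratio of the test statistic to the mean of the L null-marker statistics is referred to an F(1, L) distribution. The F tail probability is computed from the regularized incomplete beta function via a standard continued-fraction evaluation.

```python
import math
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1_MEDIAN = 0.454936423119572


def chi2_sf(x):
    """Upper-tail probability P(X > x) for X ~ chi-square with 1 df."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def _betacf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term, then odd-numbered term, of the continued fraction.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for F ~ F(d1, d2)."""
    if f <= 0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def lambda_median(null_stats):
    """Median null chi-square divided by its expected median, floored at 1."""
    return max(1.0, statistics.median(null_stats) / CHI2_1_MEDIAN)


def lambda_mean(null_stats):
    """Mean null chi-square (E[chi2_1] = 1), floored at 1."""
    return max(1.0, statistics.mean(null_stats))


def gc_chi2_pvalue(test_stat, lam):
    """GC_median or GC_mean: refer test_stat / lambda to chi-square(1)."""
    return chi2_sf(test_stat / lam)


def gc_f_pvalue(test_stat, null_stats):
    """GC_F (assumed formulation): refer test_stat / mean(null) to F(1, L).

    No flooring is applied here; the F reference accounts for uncertainty in
    the mean of the L null statistics.
    """
    return f_sf(test_stat / statistics.mean(null_stats), 1, len(null_stats))


if __name__ == "__main__":
    null = [0.3, 1.8, 0.6, 2.9, 0.1, 1.3]  # six hypothetical null-marker statistics
    test_stat = 8.0                         # hypothetical candidate-SNP statistic
    print("GC_median p =", gc_chi2_pvalue(test_stat, lambda_median(null)))
    print("GC_mean   p =", gc_chi2_pvalue(test_stat, lambda_mean(null)))
    print("GC_F      p =", gc_f_pvalue(test_stat, null))
```

With only six null markers, GC_F returns a noticeably larger p-value than GC_mean for the same λ estimate, because the F(1, L) reference distribution absorbs the sampling variability of the estimated λ. As L grows, F(1, L) approaches χ² with 1 df and the two corrections converge.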
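How imbalanced sampling from two subpopulations inflates null test statistics can be illustrated with a small simulation. The block below is a hypothetical, stdlib-only Python sketch, not the authors' simulation design. It assumes a Balding-Nichols model in which each subpopulation's allele frequency is drawn from Beta(p(1 − F)/F, (1 − p)(1 − F)/F) around an ancestral frequency p (taken here as uniform on 0.1 to 0.9), assumes Hardy-Weinberg equilibrium within each subpopulation and no disease effect at any SNP, and computes the 1-df allelic χ² test on a 2×2 table of allele counts. It returns λ_mean, the mean null χ², for a chosen share of subpopulation 1 among cases and among controls. The function names are illustrative only.

```python
import random
import statistics


def allelic_chi2(case_a, case_n, ctrl_a, ctrl_n):
    """1-df allelic chi-square; *_a = copies of allele A, *_n = total alleles."""
    a, b = case_a, case_n - case_a
    c, d = ctrl_a, ctrl_n - ctrl_a
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # monomorphic SNP: the test is undefined
        return None
    return n * (a * d - b * c) ** 2 / denom


def count_alleles(rng, n_individuals, freq):
    """Copies of allele A among 2 * n_individuals alleles (HWE assumed)."""
    return sum(rng.random() < freq for _ in range(2 * n_individuals))


def simulate_lambda_mean(n_snps, n_cases, n_controls, case_frac_pop1,
                         control_frac_pop1, fst, seed=0):
    """Mean null allelic chi-square under a two-subpopulation model.

    case_frac_pop1 / control_frac_pop1: share of cases / controls drawn
    from subpopulation 1. fst must lie strictly between 0 and 1.
    """
    rng = random.Random(seed)
    scale = (1.0 - fst) / fst  # Balding-Nichols Beta parameter scale
    cases1 = round(n_cases * case_frac_pop1)
    ctrls1 = round(n_controls * control_frac_pop1)
    stats = []
    for _ in range(n_snps):
        p = rng.uniform(0.1, 0.9)  # ancestral allele frequency (assumed range)
        f1 = rng.betavariate(p * scale, (1.0 - p) * scale)
        f2 = rng.betavariate(p * scale, (1.0 - p) * scale)
        # Case and control frequencies differ only through their ancestry mix.
        case_a = count_alleles(rng, cases1, f1) + count_alleles(rng, n_cases - cases1, f2)
        ctrl_a = count_alleles(rng, ctrls1, f1) + count_alleles(rng, n_controls - ctrls1, f2)
        chi2 = allelic_chi2(case_a, 2 * n_cases, ctrl_a, 2 * n_controls)
        if chi2 is not None:
            stats.append(chi2)
    return statistics.mean(stats)


if __name__ == "__main__":
    for frac in (0.5, 0.75, 1.0):
        lam = simulate_lambda_mean(200, 300, 300, frac, 1.0 - frac, fst=0.005, seed=1)
        print(f"cases {frac:.0%} / controls {1 - frac:.0%} from pop 1: lambda_mean = {lam:.2f}")
```

A back-of-envelope calculation under the same assumptions suggests that the expected null χ² grows roughly as 1 + 2·N·F·Δ², where N is the number of cases (equal to the number of controls), F is FST, and Δ is the difference between the cases' and the controls' share of subpopulation 1. Even FST = 0.005 can therefore produce substantial inflation when sampling is strongly imbalanced.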

Publication types

  • Evaluation Study
  • Review

MeSH terms

  • Alleles
  • Epidemiologic Methods
  • Gene Frequency
  • Genetic Predisposition to Disease
  • Genetics, Population / statistics & numerical data
  • Genome-Wide Association Study / statistics & numerical data*
  • Genomics / statistics & numerical data*
  • Humans
  • Polymorphism, Single Nucleotide
  • Risk Factors