A flexible genome-wide bootstrap method that accounts for ranking and threshold-selection bias in GWAS interpretation and replication study design

Stat Med. 2011 Jul 10;30(15):1898-912. doi: 10.1002/sim.4228. Epub 2011 May 3.

Abstract

The phenomenon known as the winner's curse is a form of selection bias that affects estimates of genetic association. In genome-wide association studies (GWAS) the bias is exacerbated by the use of stringent selection thresholds and ranking over hundreds of thousands of single nucleotide polymorphisms (SNPs). We develop an improved multi-locus bootstrap point estimate and confidence interval, which accounts for both ranking- and threshold-selection bias in the presence of genome-wide SNP linkage disequilibrium structure. The bootstrap method easily adapts to various study designs and alternative test statistics as well as complex SNP selection criteria. The latter is demonstrated by our application to the Wellcome Trust Case Control Consortium findings, in which the selection criterion was the minimum of the p-values for the additive and genotypic genetic effect models. In contrast, existing likelihood-based bias-reduced estimators account for the selection criterion applied to an SNP as if it were the only one tested; they are therefore computationally simpler, but do not address ranking across SNPs. Our simulation studies show that the bootstrap bias-reduced estimates are usually closer to the true genetic effect than the likelihood estimates and are less variable, with narrower confidence intervals. Replication study sample size requirements computed from the bootstrap bias-reduced estimates are adequate 75-90 per cent of the time, compared with 53-60 per cent of the time for the likelihood method. The bootstrap methods are implemented in a user-friendly package able to provide point and interval estimation for both binary and quantitative phenotypes in large-scale GWAS.
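
The ranking-bias idea can be illustrated with a simplified sketch; it is not the authors' package. It assumes a quantitative trait, per-SNP least-squares slopes with z-statistics as the test statistic, and top-K ranking by |z| as the only selection rule. The names `marginal_stats` and `bootstrap_ranking_correction` are hypothetical. In each replicate, individuals drawn with replacement form a detection sample and those never drawn form an estimation sample. Because whole individuals are resampled, the genome-wide LD structure is carried along automatically. SNPs are re-ranked within each replicate, and the average overestimation seen at each rank is subtracted from the original estimate at the same rank.

```python
import numpy as np


def marginal_stats(G, y):
    """Per-SNP least-squares slope and z-statistic for a quantitative trait.

    G is an (n, m) genotype matrix coded 0/1/2 and y an (n,) phenotype vector.
    Monomorphic SNPs get NaN.
    """
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    sxx = (Gc ** 2).sum(axis=0)
    sxx = np.where(sxx > 0, sxx, np.nan)
    beta = Gc.T @ yc / sxx
    rss = (yc ** 2).sum() - beta ** 2 * sxx
    se = np.sqrt(rss / (len(y) - 2) / sxx)
    return beta, beta / se


def bootstrap_ranking_correction(G, y, top_k=10, n_boot=200, seed=0):
    """Simplified rank-based bootstrap bias correction (hypothetical helper).

    Assumption: selection is top-k by |z| only (no p-value threshold).
    Returns the original top-k SNP indices, their naive slopes, the
    bias-reduced slopes, and the estimated bias at each rank.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    beta, z = marginal_stats(G, y)
    order = np.argsort(-np.abs(z))[:top_k]  # top-k SNPs in the full data
    biases = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # detection sample: drawn with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # estimation sample: never drawn
        if len(oob) < 3:
            continue
        b_in, z_in = marginal_stats(G[idx], y[idx])
        b_out, _ = marginal_stats(G[oob], y[oob])
        ranked = np.argsort(-np.abs(z_in))[:top_k]  # re-rank within the replicate
        sign = np.sign(b_in[ranked])
        # Overestimation of the effect magnitude at each rank in this replicate.
        biases.append(sign * (b_in[ranked] - b_out[ranked]))
    bias = np.nanmean(np.vstack(biases), axis=0)
    naive = beta[order]
    # Shrink each naive magnitude by the average bias observed at its rank.
    corrected = np.sign(naive) * (np.abs(naive) - bias)
    return order, naive, corrected, bias
```

A threshold rule (for example, averaging only over replicates in which the rank-k SNP passes a genome-wide p-value cutoff) could be added inside the loop. The weighting that the published method applies to the detection and estimation estimates, and its interval construction, may differ from the simple difference used here.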

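The link between bias-reduced estimates and replication sample size can be made concrete with the standard normal-approximation formula for an additive per-allele effect on a quantitative trait, n = (z_{1-α/2} + z_{power})² σ² / (β² · 2p(1-p)), assuming Hardy-Weinberg equilibrium and a small effect. The abstract does not state which formula the authors used, and the helper `replication_n` is hypothetical. Because n scales with 1/β², plugging in an inflated (winner's-curse) estimate yields too small a sample and an underpowered replication.

```python
import math
from statistics import NormalDist


def replication_n(beta, maf, sigma=1.0, alpha=0.05, power=0.8):
    """Hypothetical helper: individuals needed to detect a per-allele effect
    `beta` (trait units) at two-sided level `alpha` with the given power.

    Assumes an additive model, Hardy-Weinberg equilibrium (genotype variance
    2p(1-p)) and residual standard deviation `sigma`.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    var_g = 2 * maf * (1 - maf)
    return math.ceil(((z_alpha + z_power) * sigma / beta) ** 2 / var_g)


# Illustrative values only (not from the paper): a naive estimate of 0.10
# versus a bias-reduced estimate of 0.05 for the same SNP.
naive_n = replication_n(beta=0.10, maf=0.3)
corrected_n = replication_n(beta=0.05, maf=0.3)
print(naive_n, corrected_n)
```

Halving the estimated effect roughly quadruples the required sample, which is why an overestimated discovery effect so often leads to inadequate replication designs.
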
Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Computer Simulation
  • Confidence Intervals
  • Genome-Wide Association Study / methods
  • Genome-Wide Association Study / standards*
  • Humans
  • Likelihood Functions
  • Linkage Disequilibrium / genetics*
  • Models, Genetic
  • Polymorphism, Single Nucleotide / genetics*
  • Selection Bias