Optimizing the power of genome-wide association studies by using publicly available reference samples to expand the control group

Joanna J Zhuang; Krina Zondervan; Fredrik Nyberg; Chris Harbron; Ansar Jawaid; Lon R Cardon; Bryan J Barratt; Andrew P Morris

doi:10.1002/gepi.20482


Genet Epidemiol. 2010 May;34(4):319-26. doi: 10.1002/gepi.20482.

Affiliation

Joanna J Zhuang: Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK.

Abstract

Genome-wide association (GWA) studies have proved extremely successful in identifying novel genetic loci that contribute to complex human diseases. In doing so, they have highlighted the fact that many potential loci of modest effect remain undetected, partly because detecting them requires samples of many thousands of individuals. Large-scale international initiatives, such as the Wellcome Trust Case Control Consortium, the Genetic Association Information Network, and the database of Genotypes and Phenotypes (dbGaP), aim to facilitate the discovery of modest-effect genes by making genome-wide data publicly available, so that information can be combined for pooled analysis. In principle, disease or control samples from these studies could increase the power of any GWA study if used judiciously as "genetically matched controls" for other traits. Here, we present the biological motivation for the problem and the theoretical potential for expanding the control group with publicly available disease or reference samples. We demonstrate that a naïve application of this strategy can greatly inflate the false-positive error rate in the presence of population structure. As a remedy, we use genome-wide data and model selection techniques to identify "axes" of genetic variation that are associated with disease. These axes are then included as covariates in the association analysis to correct for population structure, which can increase power over a standard analysis restricted to the samples in the original GWA study.
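To give a rough sense of the theoretical potential of expanding the control group, the sketch below approximates the power of a simple two-sided allelic test (risk-allele counts in cases vs. controls, normal approximation) as controls are added to a fixed number of cases. This is an illustrative calculation, not the authors' derivation. It assumes Hardy-Weinberg equilibrium, a multiplicative allelic odds ratio, added controls drawn from the same population as the cases, and no population structure. The function names and the default threshold alpha = 5e-8 are hypothetical choices made here for illustration.

```python
from statistics import NormalDist


def case_allele_freq(p0: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio,
    given the risk-allele frequency p0 in controls."""
    odds = odds_ratio * p0 / (1.0 - p0)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases: int, n_controls: int, p0: float,
                       odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (illustrative sketch).

    Assumes Hardy-Weinberg equilibrium (each person contributes two
    independent alleles) and no population structure, and ignores the
    negligible probability of rejecting in the wrong direction.
    """
    p1 = case_allele_freq(p0, odds_ratio)
    a1, a0 = 2 * n_cases, 2 * n_controls  # allele counts in each group
    p_bar = (a1 * p1 + a0 * p0) / (a1 + a0)  # pooled frequency under H0
    se_null = (p_bar * (1 - p_bar) * (1 / a1 + 1 / a0)) ** 0.5
    se_alt = (p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Fixed 2,000 cases; control risk-allele frequency 0.3; allelic OR 1.2.
    for n_controls in (2_000, 5_000, 10_000, 50_000):
        power = allelic_test_power(2_000, n_controls, p0=0.3, odds_ratio=1.2)
        print(f"{n_controls:>6} controls: power ~ {power:.3f}")
```

Under these assumptions, power rises as controls are added but levels off, because the sampling variance of the case allele frequency eventually dominates the standard error. The gain also presumes well-matched controls. As the abstract notes, population structure between the original and reference samples can inflate false positives unless it is corrected, for example with covariates for axes of genetic variation.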

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Alleles
Computer Simulation
Data Interpretation, Statistical
False Positive Reactions
Gene Frequency
Genetic Variation
Genome-Wide Association Study*
Heterozygote
Humans
Models, Genetic
Models, Statistical
Odds Ratio
Reference Values
Research Design
Risk