Effect of genome-wide genotyping and reference panels on rare variants imputation

Hou-Feng Zheng; Martin Ladouceur; Celia M T Greenwood; J Brent Richards

doi:10.1016/j.jgg.2012.07.002

J Genet Genomics. 2012 Oct 20;39(10):545-50. doi: 10.1016/j.jgg.2012.07.002. Epub 2012 Jul 24.

Authors

Hou-Feng Zheng¹, Martin Ladouceur, Celia M T Greenwood, J Brent Richards

Affiliation

¹ Department of Medicine, Human Genetics, McGill University, Montreal, Quebec H3T 1E2, Canada.

PMID: 23089364

Abstract

Common variants explain little of the variance of most common diseases, prompting large-scale sequencing studies to understand the contribution of rare variants to these diseases. Imputation of rare variants from genome-wide genotyping arrays offers a cost-efficient strategy for achieving the sample sizes required for adequate statistical power. To estimate the performance of rare-variant imputation, we used IMPUTE version 2 to impute 153 individuals, each genotyped on three different arrays (317k, 610k and 1 million single nucleotide polymorphisms (SNPs)), against two reference panels: HapMap2 and the 1000 Genomes pilot March 2010 release (1KGpilot). More than 94% and 84% of all SNPs yielded acceptable accuracy (info > 0.4) in HapMap2- and 1KGpilot-based imputation, respectively. For rare variants (minor allele frequency (MAF) ≤ 5%), the proportion of well-imputed SNPs increased as the MAF increased from 0.3% to 5% across all three genome-wide association study (GWAS) datasets. For SNPs with MAF between 0.3% and 5%, the proportion of well-imputed SNPs was 69%, 60% and 49% for the 1M, 610k and 317k arrays, respectively. None of the very rare variants (MAF ≤ 0.3%) were well imputed. We conclude that the imputation accuracy of rare variants increases with the density of the genome-wide genotyping array when the reference panel is small, and that variants with lower MAF are more difficult to impute. These findings have important implications for the design and replication of large-scale sequencing studies.
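The abstract classifies a SNP as well imputed when its info score exceeds 0.4 and reports the proportion of well-imputed SNPs within MAF classes. The Python sketch below illustrates that bookkeeping under stated assumptions. `info_score` uses one commonly cited formulation of the IMPUTE info measure (an assumption: the abstract does not define the metric), and `ImputedSNP`, `maf_bin` and `well_imputed_fraction` are hypothetical names, not part of IMPUTE2's output or of the authors' pipeline. The MAF boundaries (≤ 0.3%, 0.3% to 5%, > 5%) come from the abstract; assigning exactly 0.3% to the very rare class is a choice made here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Cut-off for "acceptable accuracy" quoted in the abstract (info > 0.4).
INFO_THRESHOLD = 0.4


def info_score(probs: List[Tuple[float, float, float]]) -> float:
    """Assumed IMPUTE-style info measure for one SNP (hypothetical helper).

    `probs` holds one (P(AA), P(AB), P(BB)) triple per individual.
    The score is 1 minus the ratio of the summed posterior variance of the
    allele dosage to the variance expected under Hardy-Weinberg equilibrium.
    """
    n = len(probs)
    expected = [p1 + 2 * p2 for _, p1, p2 in probs]       # E[dosage] per person
    second_moment = [p1 + 4 * p2 for _, p1, p2 in probs]  # E[dosage^2] per person
    theta = sum(expected) / (2 * n)                        # estimated allele frequency
    if theta <= 0.0 or theta >= 1.0:
        return 1.0  # monomorphic in the imputed data: no uncertainty to measure
    residual = sum(f - e * e for e, f in zip(expected, second_moment))
    return 1.0 - residual / (2 * n * theta * (1 - theta))


@dataclass
class ImputedSNP:
    snp_id: str
    maf: float   # minor allele frequency as a fraction (0.05 == 5%)
    info: float  # imputation info score


def maf_bin(maf: float) -> str:
    """Frequency classes used in the abstract (boundary handling assumed)."""
    if maf <= 0.003:
        return "very rare"
    if maf <= 0.05:
        return "rare"
    return "common"


def well_imputed_fraction(snps: Iterable[ImputedSNP]) -> Dict[str, float]:
    """Proportion of SNPs with info > INFO_THRESHOLD within each MAF class."""
    totals: Dict[str, int] = {}
    good: Dict[str, int] = {}
    for snp in snps:
        label = maf_bin(snp.maf)
        totals[label] = totals.get(label, 0) + 1
        good[label] = good.get(label, 0) + (snp.info > INFO_THRESHOLD)
    return {label: good[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    snps = [
        ImputedSNP("snp1", 0.002, 0.1),
        ImputedSNP("snp2", 0.01, 0.6),
        ImputedSNP("snp3", 0.04, 0.3),
        ImputedSNP("snp4", 0.2, 0.95),
    ]
    print(well_imputed_fraction(snps))  # {'very rare': 0.0, 'rare': 0.5, 'common': 1.0}
```

Individuals whose genotypes are known with certainty contribute no residual variance, so a fully certain SNP scores 1, while maximally uncertain calls push the score toward 0.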

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Computational Biology / standards
Gene Frequency
Genome-Wide Association Study / methods*
Genotyping Techniques / methods*
HapMap Project
Humans
Oligonucleotide Array Sequence Analysis
Polymorphism, Single Nucleotide / genetics*
Reference Standards
Software*

Grants and funding

Canadian Institutes of Health Research/Canada