Evaluating the coverage and potential of imputing the exome microarray with next-generation imputation using the 1000 Genomes Project

PLoS One. 2014 Sep 9;9(9):e106681. doi: 10.1371/journal.pone.0106681. eCollection 2014.

Abstract

Next-generation genotyping microarrays have been designed with insights from large-scale sequencing of exomes and whole genomes. The exome genotyping arrays promise to query the functional regions of the human genome at a fraction of the sequencing cost, thus allowing a large number of samples to be genotyped. However, two pertinent questions exist: firstly, how representative is the content of the exome chip for populations not involved in the design of the chip; secondly, can the content of the exome chip be imputed with the reference data from the 1000 Genomes Project (1KGP). By deep whole-genome sequencing of two Asian populations that are not part of the 1KGP, comprising 96 Southeast Asian Malays and 36 South Asian Indians, for which the same samples have also been genotyped on both the Illumina 2.5M and exome microarrays, we discovered that the exome chip is a poor representation of exonic content in our two populations. However, up to 94.1% of the variants on the exome chip that are polymorphic in our populations can be confidently imputed from existing non-exome-centric microarrays using the 1KGP panel. The coverage increases further if population-specific reference data from whole-genome sequencing are available. There is thus limited gain in using the exome chip for populations not involved in the microarray design. Instead, for the same cost as genotyping 2,000 samples on the exome chip, performing whole-genome sequencing of at least 35 samples in that population to complement the 1KGP may yield a higher coverage of the exonic content from imputation.
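
As an illustration of the coverage figure quoted above, the following minimal Python sketch computes the fraction of polymorphic chip variants that pass an imputation quality cut-off. The abstract does not state the quality metric or threshold the authors used, so the field names (`maf`, `quality`), the cut-off of 0.8, and the toy records are all assumptions made purely for illustration, not the study's actual procedure or data.

```python
def imputation_coverage(variants, min_quality=0.8):
    """Fraction of polymorphic variants imputed at or above min_quality.

    `variants` is a list of dicts with hypothetical keys:
      "maf"     - minor allele frequency observed in the study population
      "quality" - an imputation quality score between 0 and 1
    The 0.8 threshold is an assumed value, not one taken from the paper.
    """
    # Only variants that are polymorphic in the population count
    # towards the denominator.
    polymorphic = [v for v in variants if v["maf"] > 0.0]
    if not polymorphic:
        return 0.0
    confident = sum(1 for v in polymorphic if v["quality"] >= min_quality)
    return confident / len(polymorphic)


# Toy records, invented for illustration only.
toy_variants = [
    {"id": "var1", "maf": 0.12, "quality": 0.97},
    {"id": "var2", "maf": 0.00, "quality": 0.10},  # monomorphic, excluded
    {"id": "var3", "maf": 0.03, "quality": 0.85},
    {"id": "var4", "maf": 0.01, "quality": 0.40},  # polymorphic, poorly imputed
    {"id": "var5", "maf": 0.25, "quality": 0.99},
]

coverage = imputation_coverage(toy_variants)
print(f"Imputation coverage: {coverage:.1%}")
```

With real data, the same calculation would be run once per reference panel (for example 1KGP alone versus 1KGP plus population-specific sequences) so that the resulting coverage values can be compared.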

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Asian People / genetics
  • Exome / genetics*
  • Exons / genetics
  • Genome, Human / genetics
  • Genome-Wide Association Study
  • Genomics / methods*
  • Genotyping Techniques / methods*
  • Haplotypes / genetics
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide / genetics

Grants and funding

This project acknowledges the support of the Saw Swee Hock School of Public Health, the Yong Loo Lin School of Medicine, the National University Health System, the Life Science Institute and the Office of Deputy President (Research and Technology) from the National University of Singapore. LPW, BL, RTHO and YYT additionally acknowledge support by the National Research Foundation, Prime Minister's Office, Singapore under its Research Fellowship (NRF-RF-2010-05) and administered by the National University of Singapore. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.