Population-specific genetic variation in large sequencing data sets: why more data is still better

Eur J Hum Genet. 2017 Oct;25(10):1173-1175. doi: 10.1038/ejhg.2017.110. Epub 2017 Jul 19.

Abstract

We have generated a next-generation whole-exome sequencing data set of 2628 participants of the population-based Rotterdam Study cohort, comprising 669 737 single-nucleotide variants and 24 019 short insertions and deletions. Because of the broad and deep longitudinal phenotyping of the Rotterdam Study, this data set permits extensive interpretation of genetic variants across a range of clinically relevant outcomes and is accessible as a control data set. We show that next-generation sequencing data sets yield a large number of population-specific variants that are not captured by other available large sequencing efforts, namely ExAC, ESP, 1000G, UK10K, GoNL and deCODE.
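
As a rough illustration of what "population-specific" could mean operationally, the sketch below treats each variant as a (chromosome, position, reference allele, alternate allele) tuple and keeps the cohort variants that appear in none of the reference data sets. This is an assumption made for illustration, not the authors' pipeline: the function name `population_specific`, the tuple representation, and the toy variants are all hypothetical, and a real comparison would also need consistent genome builds and normalized indel representations.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def population_specific(
    cohort: Set[Variant], references: Dict[str, Set[Variant]]
) -> Set[Variant]:
    """Return cohort variants absent from every reference data set."""
    # Pool all reference variants, then take the set difference.
    seen_elsewhere: Set[Variant] = set().union(*references.values())
    return cohort - seen_elsewhere


# Toy data, invented for illustration only.
cohort = {
    ("1", 1000, "A", "G"),
    ("2", 2000, "C", "T"),
    ("3", 3000, "G", "GA"),
}
references = {
    "ExAC": {("1", 1000, "A", "G")},
    "GoNL": {("2", 2000, "C", "T")},
}

specific = population_specific(cohort, references)
print(sorted(specific))  # [('3', 3000, 'G', 'GA')]
```

Exact-match set difference is the simplest possible definition; it counts a variant as population-specific even when a reference data set merely lacks coverage at that site, so coverage would have to be checked separately.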

MeSH terms

  • Datasets as Topic / standards*
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study / methods
  • Genome-Wide Association Study / standards*
  • High-Throughput Nucleotide Sequencing / standards
  • Humans
  • Polymorphism, Genetic*
  • Sequence Analysis, DNA / standards