Comprehensive variation discovery in single human genomes

Nat Genet. 2014 Dec;46(12):1350-5. doi: 10.1038/ng.3121. Epub 2014 Oct 19.

Abstract

Complete knowledge of the genetic variation in individual human genomes is a crucial foundation for understanding the etiology of disease. Genetic variation is typically characterized by sequencing individual genomes and comparing reads to a reference. Existing methods do an excellent job of detecting variants in approximately 90% of the human genome; however, calling variants in the remaining 10% of the genome (largely low-complexity sequence and segmental duplications) is challenging. To improve variant calling, we developed a new algorithm, DISCOVAR, and examined its performance on improved, low-cost sequence data. Using a newly created reference set of variants from the finished sequence of 103 randomly chosen fosmids, we find that some standard variant call sets miss up to 25% of variants. We show that the combination of new methods and improved data increases sensitivity severalfold, with the greatest impact in challenging regions of the human genome.
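
As an illustration of the kind of comparison the abstract describes, the fraction of reference-set variants that a call set recovers (its sensitivity) can be sketched as below. This is a minimal sketch and not the authors' evaluation code: the `Variant` record, the exact-match criterion, and the toy coordinates are all assumptions made for the example, and real comparisons usually need variant normalization before matching.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record, used only for this example."""
    chrom: str
    pos: int
    ref: str
    alt: str


def sensitivity(reference_set: Set[Variant], call_set: Set[Variant]) -> float:
    """Fraction of reference-set variants that also appear in the call set.

    Assumes both sets use the same coordinates and allele representation,
    so that exact tuple equality counts as a match.
    """
    if not reference_set:
        raise ValueError("reference set is empty")
    return len(reference_set & call_set) / len(reference_set)


# Toy data (made up for illustration, not taken from the paper).
reference_set = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 75, "G", "GA"),
    Variant("chr2", 900, "TTA", "T"),
}
call_set = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 75, "G", "GA"),
    Variant("chr3", 10, "A", "C"),  # a call absent from the reference set
}

sens = sensitivity(reference_set, call_set)
missed_fraction = 1.0 - sens
print(f"sensitivity = {sens:.2f}, missed = {missed_fraction:.2f}")
```

In this toy case the call set recovers three of the four reference-set variants, so it misses one quarter of them. The extra call on chr3 does not affect sensitivity; it would matter only for a specificity or precision estimate.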

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Chromosome Mapping
  • Gene Frequency
  • Genetic Variation*
  • Genome
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Molecular Sequence Data
  • Oligonucleotide Array Sequence Analysis
  • Polymerase Chain Reaction
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software

Associated data

  • BioProject/PRJNA196715
  • SRA/NA12878
  • SRA/NA12891
  • SRA/NA12892