Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance

BMC Res Notes. 2014 Oct 22:7:747. doi: 10.1186/1756-0500-7-747.

Abstract

Background: With the diminishing cost of next generation sequencing (NGS), whole genome analysis is becoming a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers generally deliver not only raw genomic reads but also SNP calls to their clients. The user must then decide whether to use the SNP data as is, or to process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms.

Results: Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol with the variants provided by the Illumina CASAVA pipeline. The CASAVA calls were delivered as part of a 40x whole genome sequencing project by Illumina Inc. of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases). GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust in terms of Mendelian consistency but weak in terms of statistical parameters such as the transition/transversion (Ts/Tv) ratio. However, these additional variants do not make a difference in detecting the causative variants in the studied phenotype.
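
The two quality measures named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `ts_tv_ratio` and `is_mendelian_consistent` are hypothetical, SNPs are assumed to be biallelic single-base (ref, alt) pairs, and genotypes are assumed to be unphased diploid allele pairs at an autosomal site.

```python
from itertools import product

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# substitutions; every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snps):
    """Return the transition/transversion ratio for (ref, alt) base pairs.

    Hypothetical helper: assumes each SNP is a single-base substitution.
    """
    ts = tv = 0
    for ref, alt in snps:
        if ref == alt:
            continue  # not a substitution, skip
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def is_mendelian_consistent(child, mother, father):
    """Check whether a child's genotype can arise from the parents' genotypes.

    Hypothetical helper: genotypes are unphased 2-tuples of alleles,
    e.g. ("A", "G"); the child must carry one allele from each parent.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


if __name__ == "__main__":
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
    print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))
```

In a trio-based evaluation, a call set with a high fraction of Mendelian-consistent sites and a Ts/Tv ratio close to the value expected for the sequenced region would generally be considered more reliable; the thresholds used for such a judgment are not stated in this abstract.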

Conclusion: Both pipelines, GATK multi-sample calling and Illumina CASAVA single-sample calling, perform very similarly in SNP calling at the level of putatively causative variants.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Arabs / genetics*
  • Databases, Genetic
  • Diabetes Mellitus / ethnology
  • Diabetes Mellitus / genetics*
  • Genetic Predisposition to Disease
  • Genome, Human*
  • Genome-Wide Association Study / methods*
  • Heredity*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Models, Genetic
  • Obesity / ethnology
  • Obesity / genetics*
  • Pedigree
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • Rare Diseases / ethnology
  • Rare Diseases / genetics*
  • Reproducibility of Results
  • Software