Performance of common analysis methods for detecting low-frequency single nucleotide variants in targeted next-generation sequence data

J Mol Diagn. 2014 Jan;16(1):75-88. doi: 10.1016/j.jmoldx.2013.09.003. Epub 2013 Nov 5.

Abstract

Next-generation sequencing (NGS) is becoming a common approach for clinical testing of oncology specimens for mutations in cancer genes. Unlike inherited variants, cancer mutations may occur at low frequencies because of contamination from normal cells or tumor heterogeneity and can therefore be challenging to detect using common NGS analysis tools, which are often designed for constitutional genomic studies. We generated high-coverage (>1000×) NGS data from synthetic DNA mixtures with variant allele fractions (VAFs) of 25% to 2.5% to assess the performance of four variant callers, SAMtools, Genome Analysis Toolkit, VarScan2, and SPLINTER, in detecting low-frequency variants. SAMtools had the lowest sensitivity and detected only 49% of variants with VAFs of approximately 25%, whereas the Genome Analysis Toolkit, VarScan2, and SPLINTER detected at least 94% of variants with VAFs of approximately 10%. VarScan2 and SPLINTER achieved sensitivities of 97% and 89%, respectively, for variants with observed VAFs of 1% to 8%, with >98% sensitivity and >99% positive predictive value in coding regions. Coverage analysis demonstrated that >500× coverage was required for optimal performance. The specificity of SPLINTER improved with higher coverage, whereas VarScan2 yielded more false-positive results at high coverage levels, although this effect was abrogated by removing low-quality reads before variant identification. Finally, we demonstrate the utility of high-sensitivity variant callers with data from 15 clinical lung cancers.
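The dependence of low-VAF detection on coverage can be illustrated with a simple binomial sampling model. The sketch below is not the method used by any of the four callers evaluated here. It assumes that each read independently carries the variant with probability equal to the VAF, it ignores sequencing error and mapping bias, and it uses a hypothetical threshold of 8 supporting reads (`min_alt_reads`) chosen only for illustration.

```python
from math import comb


def detection_probability(coverage, vaf, min_alt_reads):
    """Probability of seeing at least `min_alt_reads` variant reads.

    Idealized model (an assumption, not taken from the study): the number
    of variant-supporting reads is Binomial(coverage, vaf).
    """
    # Sum the probabilities of observing fewer reads than the threshold,
    # then take the complement.
    p_below = sum(
        comb(coverage, k) * vaf**k * (1 - vaf) ** (coverage - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical threshold of 8 supporting reads at the lowest VAF
    # of the synthetic mixtures (2.5%).
    for coverage in (100, 250, 500, 1000):
        p = detection_probability(coverage, vaf=0.025, min_alt_reads=8)
        print(f"{coverage:>5}x coverage: P(detect) = {p:.4f}")
```

Under these assumptions the probability of reaching the read threshold rises steeply with coverage. This is qualitatively consistent with the coverage requirement reported above, although real callers apply additional quality and strand filters that this model does not capture.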

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma / diagnosis*
  • Adenocarcinoma / genetics*
  • Adenocarcinoma of Lung
  • Alleles
  • DNA / analysis
  • Databases, Genetic
  • ErbB Receptors / genetics
  • GTP Phosphohydrolases / genetics
  • Gene Frequency
  • Genotype
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Lung Neoplasms / diagnosis*
  • Lung Neoplasms / genetics*
  • Membrane Proteins / genetics
  • Molecular Diagnostic Techniques / methods
  • Polymorphism, Single Nucleotide / genetics
  • Proto-Oncogene Proteins / genetics
  • Proto-Oncogene Proteins B-raf / genetics
  • Proto-Oncogene Proteins p21(ras)
  • Sequence Analysis, DNA / methods*
  • Tumor Suppressor Protein p53 / genetics
  • ras Proteins / genetics

Substances

  • KRAS protein, human
  • Membrane Proteins
  • Proto-Oncogene Proteins
  • TP53 protein, human
  • Tumor Suppressor Protein p53
  • DNA
  • EGFR protein, human
  • ErbB Receptors
  • BRAF protein, human
  • Proto-Oncogene Proteins B-raf
  • GTP Phosphohydrolases
  • NRAS protein, human
  • Proto-Oncogene Proteins p21(ras)
  • ras Proteins