Bioinformatics Analysis of Whole Exome Sequencing Data

Methods Mol Biol. 2019:1881:277-318. doi: 10.1007/978-1-4939-8876-1_21.

Abstract

This chapter presents a step-by-step protocol for identifying somatic SNPs and small indels in next-generation sequencing data from tumor samples and matched normal samples. The workflow is largely based on the Broad Institute's "Best Practices" guidelines and uses their Genome Analysis Toolkit (GATK) platform. Variants are annotated with population allele frequencies and curated resources such as gnomAD and ClinVar, and with curated effect predictions from dbNSFP, using VCFtools, SnpEff, and SnpSift.

Keywords: Cancer research; Clinical genomics; Exome sequencing; Genome sequencing; Next-generation sequencing; Somatic variant detection; Variant annotation.

MeSH terms

  • Computational Biology / methods
  • Exome Sequencing / methods*
  • Genetic Variation
  • Genome, Human
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Molecular Sequence Annotation
  • Neoplasms / genetics
  • Sequence Analysis, DNA / methods*
  • Software