Ensemble-Based Somatic Mutation Calling in Cancer Genomes

Methods Mol Biol. 2020:2120:37-46. doi: 10.1007/978-1-0716-0327-7_3.

Abstract

Identification of somatic mutations in tumor tissue is challenged by technical artifacts, diverse somatic mutational processes, and genetic heterogeneity within tumors. Indeed, recent independent benchmark studies have revealed low concordance between different somatic mutation callers. We present the Somatic Mutation calling method using a Random Forest (SMuRF), a portable ensemble method that combines the predictions and auxiliary features from individual mutation callers using supervised machine learning. SMuRF improves prediction accuracy for both somatic point mutations (single nucleotide variants; SNVs) and small insertions/deletions (indels) in cancer genomes and exomes. This chapter describes the method and provides a tutorial on the installation and application of SMuRF.
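
The two ideas in the abstract, low concordance between callers and combining per-caller predictions into features for a supervised classifier, can be illustrated with a minimal Python sketch. It is not SMuRF's implementation: the caller names, variant sites, and helper functions (`jaccard`, `pairwise_concordance`, `vote_features`) are hypothetical, and only binary caller votes are used, whereas the method described also draws on auxiliary features.

```python
from itertools import combinations

# Hypothetical call sets (assumption): each caller reports variants
# as (chrom, pos, ref, alt) tuples. These are toy values, not real data.
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"), ("chr2", 40, "C", "T")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr2", 40, "C", "T"), ("chr3", 7, "T", "G")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr3", 7, "T", "G"), ("chr3", 90, "G", "A")},
}


def jaccard(a, b):
    """Jaccard index of two call sets: shared calls / all calls."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(calls):
    """Jaccard concordance for every pair of callers."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


def vote_features(calls):
    """One row per candidate site, one 0/1 column per caller."""
    callers = sorted(calls)
    sites = sorted(set().union(*calls.values()))
    rows = {site: [int(site in calls[c]) for c in callers] for site in sites}
    return callers, rows


concordance = pairwise_concordance(calls)
callers, features = vote_features(calls)

for pair, score in concordance.items():
    print(pair, round(score, 2))
for site, row in features.items():
    print(site, dict(zip(callers, row)))
```

In an ensemble approach of this kind, rows such as these, extended with per-caller quality metrics and paired with truth labels from a benchmark set, would form the training matrix for a random forest classifier (for example, scikit-learn's `RandomForestClassifier`); that training step is omitted here.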

Keywords: Next-generation sequencing; Somatic mutation calling.

MeSH terms

  • Genome, Human
  • Genomics / methods*
  • Humans
  • INDEL Mutation
  • Mutation*
  • Neoplasms / genetics*
  • Point Mutation
  • Polymorphism, Single Nucleotide
  • Software*
  • Supervised Machine Learning*