A validated lineage-derived somatic truth data set enables benchmarking in cancer genome analysis

Commun Biol. 2020 Dec 8;3(1):744. doi: 10.1038/s42003-020-01460-9.

Abstract

Existing cancer benchmark data sets for human sequencing data rely on germline variants, synthetic methods, or expensive validations, none of which satisfactorily provides a large collection of true somatic variation across a whole genome. Here we propose a data set, Lineage-derived Somatic Truth (LinST), of short somatic mutations in the HT115 colon cancer cell line. The mutations are validated using a known cell lineage, and the data set includes thousands of mutations and a high-confidence region covering 2.7 gigabases per sample.
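
As an illustration of how a truth set with a high-confidence region is typically used for benchmarking, the following minimal sketch scores a caller's somatic calls against truth variants, counting only positions inside the high-confidence intervals. It is not the authors' pipeline: the names `in_regions` and `benchmark`, the tuple representation of a variant, and the BED-style interval layout are all assumptions made for this example, and matching is by exact (chromosome, position, ref, alt) identity with no variant normalization.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple

# Hypothetical representations, chosen for this sketch only.
Variant = Tuple[str, int, str, str]          # (chrom, 1-based position, ref, alt)
Regions = Dict[str, List[Tuple[int, int]]]   # chrom -> sorted, non-overlapping
                                             # 0-based half-open [start, end)


def in_regions(variant: Variant, regions: Regions) -> bool:
    """Return True if the variant position lies inside a high-confidence interval."""
    chrom, pos, _, _ = variant
    intervals = regions.get(chrom, [])
    pos0 = pos - 1  # convert the 1-based position to 0-based coordinates
    # Index of the last interval whose start is <= pos0.
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


def benchmark(calls: Set[Variant], truth: Set[Variant], regions: Regions) -> dict:
    """Compute precision and recall, restricted to the high-confidence region."""
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}
    tp = len(calls_hc & truth_hc)   # called and in the truth set
    fp = len(calls_hc - truth_hc)   # called but absent from the truth set
    fn = len(truth_hc - calls_hc)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from LinST.
    regions = {"chr1": [(100, 200), (500, 600)]}
    truth = {("chr1", 150, "A", "T"), ("chr1", 550, "G", "C"), ("chr1", 300, "C", "G")}
    calls = {("chr1", 150, "A", "T"), ("chr1", 160, "T", "G"), ("chr1", 300, "C", "G")}
    print(benchmark(calls, truth, regions))
```

Restricting both sets to the same intervals before comparing keeps calls outside the high-confidence region from being counted as either true or false positives, which is the usual reason such a region is published alongside a truth set.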

MeSH terms

  • Databases, Genetic
  • Gene Expression Regulation, Neoplastic / physiology*
  • Genetic Predisposition to Disease
  • Genome, Human*
  • Humans
  • Mutation
  • Neoplasm Proteins / genetics
  • Neoplasm Proteins / metabolism*
  • Reproducibility of Results
  • Software

Substances

  • Neoplasm Proteins