Tumor phylogeny inference using tree-constrained importance sampling

Bioinformatics. 2017 Jul 15;33(14):i152-i160. doi: 10.1093/bioinformatics/btx270.

Abstract

Motivation: A tumor arises from an evolutionary process that can be modeled as a phylogenetic tree. However, reconstructing this tree is challenging as most cancer sequencing uses bulk tumor tissue containing heterogeneous mixtures of cells.

Results: We introduce Probabilistic Algorithm for Somatic Tree Inference (PASTRI), a new algorithm for bulk-tumor sequencing data that clusters somatic mutations into clones and infers a phylogenetic tree that describes the evolutionary history of the tumor. PASTRI uses an importance sampling algorithm that combines a probabilistic model of DNA sequencing data with an enumeration algorithm based on the combinatorial constraints defined by the underlying phylogenetic tree. As a result, tree inference is fast, accurate and robust to noise. We demonstrate on simulated data that PASTRI outperforms other cancer phylogeny algorithms in terms of runtime and accuracy. On real data from a chronic lymphocytic leukemia (CLL) patient, we show that a simple linear phylogeny explains the data better than the complex branching phylogeny that was previously reported. PASTRI provides a robust approach for phylogenetic tree inference from mixed samples.
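To make "combinatorial constraints defined by the underlying phylogenetic tree" concrete, the following is a minimal sketch assuming the ancestral sum condition that is widely used in cancer phylogenetics: a clone's cellular frequency must be at least the sum of its children's frequencies. The abstract does not state PASTRI's exact constraints, so treat this as an illustration of the general idea. The helper names (`satisfies_sum_condition`, `is_tree`, `enumerate_trees`) are hypothetical, and the brute-force loop is not PASTRI's enumeration algorithm; it only shows which trees a given set of cluster frequencies permits.

```python
from itertools import product


def satisfies_sum_condition(parent, freq):
    """Assumed constraint: each cluster's frequency must be at least the
    sum of its children's frequencies (small tolerance for float error)."""
    k = len(freq)
    children_sum = [0.0] * k
    for child, par in enumerate(parent):
        if par is not None:
            children_sum[par] += freq[child]
    return all(children_sum[i] <= freq[i] + 1e-9 for i in range(k))


def is_tree(parent, root):
    """True if parent pointers from every node reach the root with no cycle."""
    for start in range(len(parent)):
        seen = set()
        node = start
        while node != root:
            if node in seen or parent[node] is None:
                return False
            seen.add(node)
            node = parent[node]
    return True


def enumerate_trees(freq, root=0):
    """Hypothetical brute-force enumeration of rooted trees on clusters
    0..k-1 (root fixed) that satisfy the sum condition. Runtime is
    exponential in k, so this is only suitable for a handful of clusters."""
    k = len(freq)
    others = [i for i in range(k) if i != root]
    trees = []
    for choice in product(range(k), repeat=len(others)):
        parent = [None] * k
        for node, par in zip(others, choice):
            parent[node] = par
        if any(parent[i] == i for i in others):
            continue  # skip self-loops
        if is_tree(parent, root) and satisfies_sum_condition(parent, freq):
            trees.append(tuple(parent))
    return trees


if __name__ == "__main__":
    # Two subclones at 0.6 and 0.5 cannot both descend from a root at 1.0,
    # so only the linear chain 0 -> 1 -> 2 survives the constraint.
    print(enumerate_trees([1.0, 0.6, 0.5]))  # [(None, 0, 1)]
    # With lower frequencies, both a branching and a linear tree are allowed.
    print(enumerate_trees([1.0, 0.4, 0.3]))  # [(None, 0, 0), (None, 0, 1)]
```

The example shows why the constraints are useful for inference: once cluster frequencies are estimated from read counts, the constraint can sharply reduce the set of candidate trees, sometimes to a single linear phylogeny.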

Availability and implementation: Software is available at compbio.cs.brown.edu/software.

Contact: [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Clonal Evolution*
  • Genomics / methods*
  • Humans
  • Leukemia, Lymphoid / genetics
  • Leukemia, Lymphoid / physiopathology
  • Models, Statistical
  • Neoplasms / genetics*
  • Neoplasms / physiopathology
  • Sequence Analysis, DNA / methods*
  • Software*