BayClone: Bayesian nonparametric inference of tumor subclones using NGS data

Pac Symp Biocomput. 2015:467-78.

Abstract

In this paper, we present a novel feature allocation model to describe tumor heterogeneity (TH) using next-generation sequencing (NGS) data. Taking a Bayesian approach, we extend the Indian buffet process (IBP) to define a class of nonparametric models, the categorical IBP (cIBP). A cIBP takes categorical values to denote homozygous or heterozygous genotypes at each single nucleotide variant (SNV). We define a subclone as a vector of these categorical values, each corresponding to an SNV. Instead of partitioning somatic mutations into non-overlapping clusters with similar cellular prevalences, we take a different approach based on feature allocation. Importantly, we do not assume that somatic mutations with similar cellular prevalence must come from the same subclone, and we allow mutations to be shared across subclones. We argue that this is closer to the underlying theory of phylogenetic clonal expansion, as somatic mutations that occurred in parent subclones should be shared across the parent and child subclones. Bayesian inference yields posterior probabilities of the number, genotypes, and proportions of subclones in a tumor sample, thereby providing point estimates as well as the variability of those estimates for each subclone. We report results on both simulated and real data. BayClone is available at http://health.bsd.uchicago.edu/yji/soft.html.
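To make the feature allocation idea concrete, the following is a minimal sketch, not the BayClone implementation. It assumes a hypothetical genotype encoding (0.0 for homozygous reference, 0.5 for heterozygous variant, 1.0 for homozygous variant), an illustrative SNV-by-subclone matrix `Z`, and illustrative subclone proportions `w`. None of these values come from the paper. Under these assumptions, the expected variant allele fraction at each SNV is the proportion-weighted sum of the subclone genotypes, and a single SNV can be carried by several subclones at once.

```python
# Illustrative sketch only: encoding, matrix, and proportions are assumptions.
# Hypothetical genotype encoding per (SNV, subclone) entry:
#   0.0 = homozygous reference, 0.5 = heterozygous variant, 1.0 = homozygous variant
Z = [
    [0.5, 0.5, 1.0],  # SNV 1: carried by all three subclones (inherited from a parent)
    [0.0, 0.5, 0.5],  # SNV 2: shared by subclones 2 and 3
    [0.0, 0.0, 0.5],  # SNV 3: private to subclone 3
]

# Hypothetical subclone proportions in one tumor sample; they must sum to 1.
w = [0.5, 0.3, 0.2]


def expected_vaf(genotypes, proportions):
    """Return the expected variant allele fraction for each SNV (row)."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("subclone proportions must sum to 1")
    for row in genotypes:
        if len(row) != len(proportions):
            raise ValueError("each SNV row needs one genotype per subclone")
    # Weighted sum over subclones: overlapping mutations contribute from
    # every subclone that carries them.
    return [sum(z * p for z, p in zip(row, proportions)) for row in genotypes]


vaf = expected_vaf(Z, w)
for i, value in enumerate(vaf, start=1):
    print(f"SNV {i}: expected VAF = {value:.2f}")
```

In this toy example all three SNVs are present in subclone 3, yet their expected allele fractions differ. This is the situation that a clustering of mutations by cellular prevalence would split apart and that a feature allocation can represent directly.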

MeSH terms

  • Bayes Theorem
  • Computational Biology
  • Computer Simulation
  • High-Throughput Nucleotide Sequencing / statistics & numerical data*
  • Humans
  • Likelihood Functions
  • Lung Neoplasms / genetics
  • Markov Chains
  • Models, Statistical*
  • Monte Carlo Method
  • Mutation
  • Neoplasms / genetics*
  • Polymorphism, Single Nucleotide
  • Software*
  • Statistics, Nonparametric