Characterization of tumor heterogeneity by latent haplotypes: a sequential Monte Carlo approach

PeerJ. 2018 May 30;6:e4838. doi: 10.7717/peerj.4838. eCollection 2018.

Abstract

Tumor samples obtained from a single cancer patient, separated in space or time, often consist of varying cell populations, each harboring distinct mutations that uniquely characterize its genome. Because humans are diploid, at most two haplotypes, each defined as a scaffold of single nucleotide variants (SNVs) on the same homologous genome, would be observed if all cells in a tumor sample were genetically homogeneous; observing more than two haplotypes in a sample is therefore evidence of heterogeneity. We characterize tumor heterogeneity by latent haplotypes and present a state-space formulation of the feature allocation model for estimating the haplotypes and their proportions in the tumor samples. We develop an efficient sequential Monte Carlo (SMC) algorithm that estimates the states and the parameters of our proposed state-space model, which are equivalently the haplotypes and their proportions in the tumor samples. The sequential algorithm produces more accurate estimates of the model parameters than existing methods. Also, because our algorithm processes the variant allele frequency (VAF) of a locus as the observation at a single time-step, VAFs of candidate SNVs newly sequenced by next-generation sequencing (NGS) can be analyzed to improve existing estimates without re-analyzing the previous datasets, a feature that existing solutions do not possess.
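
The sequential property described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified importance-weighting scheme over a fixed set of candidate proportion vectors ("particles"), with no resampling or proposal step. It assumes, for illustration only, that the expected VAF at a locus is the sum of the proportions of the haplotypes carrying the variant, that read counts are binomial, and that the unknown haplotype indicators at each locus have a uniform prior and are summed out. All function names and the example numbers are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Log of the binomial pmf, with p clipped away from 0 and 1."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def locus_loglik(alt: int, depth: int, w: Sequence[float]) -> float:
    """Log-likelihood of one locus given haplotype proportions w.

    Assumption: expected VAF = sum of w[k] over haplotypes carrying the
    variant. The binary indicator pattern z is unknown, so all 2^K
    patterns are summed out under a uniform prior.
    """
    terms = []
    for z in product((0, 1), repeat=len(w)):
        p = sum(wk * zk for wk, zk in zip(w, z))
        terms.append(binom_logpmf(alt, depth, p))
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms)) - len(w) * math.log(2)


def update(log_weights: List[float], particles: Sequence[Sequence[float]],
           alt: int, depth: int) -> List[float]:
    """One time-step: fold a single locus into every particle's log-weight."""
    return [lw + locus_loglik(alt, depth, w)
            for lw, w in zip(log_weights, particles)]


def normalise(log_weights: Sequence[float]) -> List[float]:
    """Convert log-weights to weights that sum to one."""
    m = max(log_weights)
    ws = [math.exp(lw - m) for lw in log_weights]
    total = sum(ws)
    return [x / total for x in ws]


# Hypothetical candidate proportion vectors for three haplotypes.
particles = [(0.5, 0.5, 0.0), (0.6, 0.3, 0.1), (0.2, 0.2, 0.6)]
# Hypothetical (variant reads, total reads) for three already-analyzed loci.
loci = [(30, 100), (61, 100), (9, 100)]

log_w = [0.0] * len(particles)
for alt, depth in loci:
    log_w = update(log_w, particles, alt, depth)

# A newly sequenced locus only needs one more update step;
# the earlier loci are not revisited.
log_w = update(log_w, particles, 41, 100)
weights = normalise(log_w)
```

Because each locus contributes one additive term to a particle's log-weight, the weights after the extra update are the same as those obtained by re-running all four loci from scratch, which is the sense in which previous datasets need not be re-analyzed in this simplified setting.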

Keywords: Bayesian; Haplotype; Heterogeneity; Monte Carlo; Sequential Monte Carlo; Tumor.

Grants and funding

This work was supported by the Petroleum Technology Development Fund, Nigeria. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.