Integrated structural variation and point mutation signatures in cancer genomes using correlated topic models

Tyler Funnell; Allen W Zhang; Diljot Grewal; Steven McKinney; Ali Bashashati; Yi Kan Wang; Sohrab P Shah

doi:10.1371/journal.pcbi.1006799

PLoS Comput Biol. 2019 Feb 22;15(2):e1006799. doi: 10.1371/journal.pcbi.1006799. eCollection 2019 Feb.

Authors

Tyler Funnell¹, Allen W Zhang², Diljot Grewal¹, Steven McKinney², Ali Bashashati², Yi Kan Wang², Sohrab P Shah¹,²,³

Affiliations

¹ Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America.
² Department of Molecular Oncology, BC Cancer Agency, Vancouver, British Columbia, Canada.
³ Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada.

Abstract

Mutation signatures in cancer genomes reflect endogenous and exogenous mutational processes, offering insights into tumour etiology, features for prognostic and biologic stratification, and vulnerabilities to be exploited therapeutically. We present a novel machine learning formalism for improved signature inference, based on multi-modal correlated topic models (MMCTM), which can simultaneously infer signatures from both single nucleotide and structural variation counts derived from cancer genome sequencing data. We exemplify the utility of our approach on two hormone-driven, DNA repair-deficient cancers: breast and ovary (n = 755 samples total). We show how introducing correlated structure both within and between modes of mutation can increase accuracy of signature discovery, particularly in the context of sparse data. Our study emphasizes the importance of integrating multiple mutation modes for signature discovery and patient stratification, and provides a statistical modeling framework to incorporate additional features of interest for future studies.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Genetic Variation / genetics
Genome
Humans
Machine Learning
Models, Statistical
Mutation
Neoplasms / genetics*
Point Mutation / genetics
Prognosis
Sequence Analysis, DNA / methods*
Transcriptome / genetics
