A Bayesian method to cluster single-cell RNA sequencing data using copy number alterations

Salvatore Milite; Riccardo Bergamin; Lucrezia Patruno; Nicola Calonaci; Giulio Caravagna

doi:10.1093/bioinformatics/btac143

Bioinformatics. 2022 Apr 28;38(9):2512-2518. doi: 10.1093/bioinformatics/btac143.

Authors

Salvatore Milite¹, Riccardo Bergamin¹, Lucrezia Patruno², Nicola Calonaci¹, Giulio Caravagna¹

Affiliations

¹ Department of Mathematics and Geosciences, University of Trieste, Trieste 34127, Italy.
² Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20125, Italy.

PMID: 35298589

Abstract

Motivation: Cancers are composed of several heterogeneous subpopulations, each harbouring different genetic and epigenetic somatic alterations that contribute to disease onset and therapy response. In recent years, copy number alterations (CNAs) leading to tumour aneuploidy have been identified as potential key drivers of such populations, but defining the precise makeup of cancer subclones from sequencing assays remains challenging. As a result, little is known about the mapping between complex CNAs and their effect on cancer phenotypes.

Results: We introduce CONGAS, a Bayesian probabilistic method to phase bulk DNA and single-cell RNA measurements from independent assays. CONGAS jointly identifies clusters of single cells with subclonal CNAs, and differences in RNA expression. The model builds statistical priors leveraging bulk DNA sequencing data, does not require a normal reference, and scales efficiently thanks to a GPU backend and variational inference. We test CONGAS on both simulated and real data, and find that it can determine the tumour subclonal composition at the single-cell level together with clone-specific RNA phenotypes in tumour data generated from both 10× and Smart-Seq assays.
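The idea of clustering cells by copy number from read counts alone can be illustrated with a toy mixture model. The sketch below is not the CONGAS model or its API: it assumes that the candidate copy-number profiles are already known (standing in for priors derived from bulk DNA), that reads per segment follow a multinomial whose proportions scale with copy number, and it uses plain expectation-maximisation rather than the variational inference described above. All function names are hypothetical.

```python
import math
import random


def segment_probs(profile):
    """Expected fraction of reads per segment, proportional to copy number."""
    total = sum(profile)
    return [c / total for c in profile]


def simulate_cell(profile, n_reads, rng):
    """Draw per-segment read counts for one cell from a multinomial."""
    draws = rng.choices(range(len(profile)), weights=segment_probs(profile), k=n_reads)
    counts = [0] * len(profile)
    for s in draws:
        counts[s] += 1
    return counts


def responsibilities(counts, profiles, weights):
    """Posterior probability of each clone for one cell.

    Assumes every copy number is >= 1, so log(p) is always defined.
    """
    log_post = []
    for profile, w in zip(profiles, weights):
        probs = segment_probs(profile)
        log_lik = sum(x * math.log(p) for x, p in zip(counts, probs))
        log_post.append(math.log(w) + log_lik)
    top = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def cluster_cells(cells, profiles, n_iter=25):
    """EM over mixing weights with fixed clone profiles; returns (weights, labels)."""
    k = len(profiles)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = [responsibilities(c, profiles, weights) for c in cells]
        # Add-one smoothing keeps every weight strictly positive.
        weights = [(sum(r[j] for r in resp) + 1.0) / (len(cells) + k) for j in range(k)]
    resp = [responsibilities(c, profiles, weights) for c in cells]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical clones over four genomic segments.
    profiles = [[2, 2, 2, 2], [2, 4, 2, 1]]
    truth = [0] * 30 + [1] * 20
    cells = [simulate_cell(profiles[k], 500, rng) for k in truth]
    weights, labels = cluster_cells(cells, profiles)
    print("estimated clone proportions:", [round(w, 3) for w in weights])
    print("cells assigned to each clone:", [labels.count(j) for j in range(len(profiles))])
```

Fixing the profiles keeps the example short; a fuller model would also infer the copy-number states per cluster and place priors on them, which is the part the abstract attributes to bulk DNA data.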

Availability and implementation: CONGAS is available as two packages: CONGAS (https://github.com/caravagnalab/congas), which implements the model in Python, and RCONGAS (https://caravagnalab.github.io/rcongas/), which provides R functions to process inputs and outputs and to run CONGAS fits. The analysis of real data and the scripts to generate the figures in this paper are available via RCONGAS; code associated with the simulations is available at https://github.com/caravagnalab/rcongas_test.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem
DNA Copy Number Variations*
Humans
Neoplasms* / genetics
RNA
Sequence Analysis, RNA
Single-Cell Analysis
Software

Substances

RNA