Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender

Stephen J Fleming; Mark D Chaffin; Alessandro Arduini; Amer-Denis Akkad; Eric Banks; John C Marioni; Anthony A Philippakis; Patrick T Ellinor; Mehrtash Babadi

doi:10.1038/s41592-023-01943-7

Nat Methods. 2023 Sep;20(9):1323-1335. doi: 10.1038/s41592-023-01943-7. Epub 2023 Aug 7.

Authors

Stephen J Fleming¹ ², Mark D Chaffin³ ⁴, Alessandro Arduini³ ⁵, Amer-Denis Akkad⁶, Eric Banks⁷, John C Marioni⁸ ⁹, Anthony A Philippakis⁷, Patrick T Ellinor³ ⁴ ¹⁰, Mehrtash Babadi¹¹ ¹²

Affiliations

¹ Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA. [email protected].
² Precision Cardiology Laboratory (PCL), Broad Institute of MIT and Harvard, Cambridge, MA, USA. [email protected].
³ Precision Cardiology Laboratory (PCL), Broad Institute of MIT and Harvard, Cambridge, MA, USA.
⁴ Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
⁵ Bayer US, LLC, Cambridge, MA, USA.
⁶ Precision Cardiology Laboratory (PCL), Bayer US, LLC, Cambridge, MA, USA.
⁷ Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
⁸ Wellcome Sanger Institute, Hinxton, Cambridge, UK.
⁹ European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
¹⁰ Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.
¹¹ Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA. [email protected].
¹² Precision Cardiology Laboratory (PCL), Broad Institute of MIT and Harvard, Cambridge, MA, USA. [email protected].

PMID: 37550580
DOI: 10.1038/s41592-023-01943-7

Abstract

Droplet-based single-cell assays, including single-cell RNA sequencing (scRNA-seq), single-nucleus RNA sequencing (snRNA-seq) and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), generate considerable background noise counts, the hallmark of which is nonzero counts in cell-free droplets and off-target gene expression in unexpected cell types. Such systematic background noise can lead to batch effects and spurious differential gene expression results. Here we develop a deep generative model based on the phenomenology of noise generation in droplet-based assays. The proposed model accurately distinguishes cell-containing droplets from cell-free droplets, learns the background noise profile and provides noise-free quantification in an end-to-end fashion. We implement this approach in the scalable and robust open-source software package CellBender. Analysis of simulated data demonstrates that CellBender operates near the theoretically optimal denoising limit. Extensive evaluations using real datasets and experimental benchmarks highlight enhanced concordance between droplet-based single-cell data and established gene expression patterns, while the learned background noise profile provides evidence of degraded or uncaptured cell types.
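
The hallmark described in the abstract, nonzero counts in cell-free droplets, can be illustrated with a toy calculation. The sketch below is not CellBender's deep generative model. It substitutes a much simpler heuristic: pool the low-count droplets to estimate an ambient profile, then subtract the expected ambient counts from every droplet. The function names, the `empty_umi_max` threshold and the three-gene count matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_ambient_profile(counts, empty_umi_max):
    """Estimate per-gene ambient fractions from presumed cell-free droplets.

    A droplet is treated as cell-free when its total UMI count is at most
    empty_umi_max (a hypothetical, hand-picked threshold).
    """
    totals = counts.sum(axis=1)
    empty = totals <= empty_umi_max
    pooled = counts[empty].sum(axis=0).astype(float)
    return pooled / pooled.sum(), empty


def subtract_expected_background(counts, profile, empty):
    """Subtract the expected ambient counts per gene and clip at zero.

    This is a naive subtraction heuristic, not a probabilistic model.
    """
    mean_empty_total = counts[empty].sum(axis=1).mean()
    expected = mean_empty_total * profile
    return np.clip(np.rint(counts - expected), 0, None).astype(int)


# Toy matrix: rows are droplets, columns are genes (values are made up).
counts = np.array([
    [3, 0, 1],     # low-count droplet, presumed cell-free
    [1, 0, 3],     # low-count droplet, presumed cell-free
    [50, 40, 3],   # high-count droplet, presumed to contain a cell
])

profile, empty = estimate_ambient_profile(counts, empty_umi_max=10)
cleaned = subtract_expected_background(counts, profile, empty)
print("ambient profile:", profile)
print("cleaned counts:\n", cleaned)
```

In this toy case the third gene is seen in the cell-containing droplet at about the level expected from ambient contamination alone, so most of its counts are removed, while the second gene, absent from the cell-free droplets, is left untouched. A hard threshold and a uniform subtraction are crude assumptions. The abstract's approach instead learns which droplets contain cells and what the background profile is, jointly and end to end.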

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Gene Expression Profiling / methods
RNA, Small Nuclear*
Sequence Analysis, RNA / methods
Single-Cell Analysis / methods
Software*

Substances

RNA, Small Nuclear