Quantile normalization of single-cell RNA-seq read counts without unique molecular identifiers

F William Townes; Rafael A Irizarry

doi:10.1186/s13059-020-02078-0

Genome Biol. 2020 Jul 3;21(1):160. doi: 10.1186/s13059-020-02078-0.

Authors

F William Townes¹, Rafael A Irizarry² ³

Affiliations

¹ Department of Computer Science, Princeton University, Princeton, NJ, USA.
² Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
³ Department of Biostatistics, Harvard University, Cambridge, MA, USA.

Abstract

Single-cell RNA-seq (scRNA-seq) profiles gene expression of individual cells. Unique molecular identifiers (UMIs) remove duplicates in read counts resulting from polymerase chain reaction, a major source of noise. For scRNA-seq data lacking UMIs, we propose quasi-UMIs: quantile normalization of read counts to a compound Poisson distribution empirically derived from UMI datasets. When applied to ground-truth datasets having both reads and UMIs, quasi-UMI normalization has higher accuracy than competing methods. Using quasi-UMIs enables methods designed specifically for UMI data to be applied to non-UMI scRNA-seq datasets.

Keywords: Gene expression; Normalization; Quasi-UMI; RNA-seq; Single cell.
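
The quantile-matching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that the abstract does not state: normalization is done one cell at a time, zeros are left as zeros, only the nonzero read counts are mapped onto a zero-truncated target, and a negative binomial (a gamma-Poisson mixture) stands in for the empirically derived compound Poisson distribution. The function names and the `mean` and `size` parameters are hypothetical.

```python
import bisect
import math


def nb_pmf(k, mean, size):
    """Negative binomial (gamma-Poisson) pmf, parameterized by mean and dispersion."""
    p = size / (size + mean)
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def truncated_quantile_fn(mean, size, max_k=10000):
    """Return the quantile function of the zero-truncated target distribution."""
    p0 = nb_pmf(0, mean, size)
    cdf = []  # cdf[j] is P(K <= j + 1 | K >= 1)
    total = 0.0
    for k in range(1, max_k + 1):
        total += nb_pmf(k, mean, size) / (1.0 - p0)
        cdf.append(total)
        if total > 1.0 - 1e-12:
            break

    def quantile(u):
        # Smallest k >= 1 whose cumulative probability reaches u.
        idx = bisect.bisect_left(cdf, u)
        return min(idx, len(cdf) - 1) + 1

    return quantile


def quasi_umi_sketch(read_counts, mean=1.0, size=0.5):
    """Map one cell's read counts onto target quantiles (illustrative only)."""
    out = [0] * len(read_counts)  # assumption: zeros stay zero
    nonzero = sorted(c for c in read_counts if c > 0)
    if not nonzero:
        return out
    quantile = truncated_quantile_fn(mean, size)
    n = len(nonzero)
    for i, c in enumerate(read_counts):
        if c > 0:
            # Empirical CDF position of c among the nonzero counts;
            # the 0.5 offset keeps u strictly below 1, and ties share a rank.
            u = (bisect.bisect_right(nonzero, c) - 0.5) / n
            out[i] = quantile(u)
    return out


reads = [0, 5, 120, 0, 5, 3000, 42]
print(quasi_umi_sketch(reads))
```

Because the mapping depends only on the rank of each nonzero count, it preserves the ordering of genes within the cell while discarding the raw read-count scale, which is where duplication from polymerase chain reaction would show up.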

Publication types

Evaluation Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
Humans
Normal Distribution
Poisson Distribution
Sequence Analysis, RNA*
Single-Cell Analysis*
