Single-cell RNA-seq (scRNA-seq) profiles gene expression of individual cells. Unique molecular identifiers (UMIs) remove duplicates in read counts resulting from polymerase chain reaction, a major source of noise. For scRNA-seq data lacking UMIs, we propose quasi-UMIs: quantile normalization of read counts to a compound Poisson distribution empirically derived from UMI datasets. When applied to ground-truth datasets having both reads and UMIs, quasi-UMI normalization has higher accuracy than competing methods. Using quasi-UMIs enables methods designed specifically for UMI data to be applied to non-UMI scRNA-seq datasets.
Keywords: Gene expression; Normalization; Quasi-UMI; RNA-seq; Single cell.