Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

Avi Srivastava; Laraib Malik; Tom Smith; Ian Sudbery; Rob Patro

doi:10.1186/s13059-019-1670-y

Genome Biol. 2019 Mar 27;20(1):65. doi: 10.1186/s13059-019-1670-y.

Authors

Avi Srivastava¹, Laraib Malik¹, Tom Smith², Ian Sudbery³, Rob Patro⁴

Affiliations

¹ Department of Computer Science, Stony Brook University, Stony Brook, USA.
² Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK.
³ Sheffield Institute for Nucleic Acids, Department of Molecular Biology and Biotechnology, The University of Sheffield, Sheffield, S10 2TN, UK.
⁴ Department of Computer Science, Stony Brook University, Stony Brook, USA.

Abstract

We introduce alevin, a fast end-to-end pipeline for processing droplet-based single-cell RNA sequencing (dscRNA-seq) data that performs cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin's approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias of existing tools, which discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates. Alevin is considerably faster than existing gene quantification approaches, typically by a factor of eight, while also using less memory.
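The abstract contrasts discarding gene-ambiguous reads with accounting for them. The sketch below is a toy illustration of that contrast only, not alevin's actual algorithm. It assumes UMI deduplication has already produced a list of molecules, each represented as the set of genes it is compatible with (a hypothetical input format), and compares a count that keeps only gene-unique molecules against a simple expectation-maximization (EM) split that assigns ambiguous molecules in proportion to current gene abundances. The function names are hypothetical.

```python
from collections import defaultdict


def count_unique_only(molecules):
    """Baseline: keep only molecules compatible with exactly one gene."""
    counts = defaultdict(float)
    for genes in molecules:
        if len(genes) == 1:
            counts[next(iter(genes))] += 1.0
    return dict(counts)


def count_with_em(molecules, n_iter=100):
    """Toy EM (hypothetical, not alevin's method): split each ambiguous
    molecule across its compatible genes by their current abundances."""
    genes = sorted(set().union(*molecules))
    abundance = {g: 1.0 for g in genes}  # uniform starting point
    for _ in range(n_iter):
        new = defaultdict(float)
        for compat in molecules:
            total = sum(abundance[g] for g in compat)
            if total == 0.0:
                continue  # guard against underflow to zero
            for g in compat:
                # E-step: fractional responsibility of gene g for this molecule
                new[g] += abundance[g] / total
        # M-step: expected molecule count per gene becomes the new abundance
        abundance = {g: new[g] for g in genes}
    return abundance


# Hypothetical deduplicated molecules: 6 unique to GeneA, 2 unique to GeneB,
# and 4 compatible with both genes.
molecules = (
    [frozenset({"GeneA"})] * 6
    + [frozenset({"GeneB"})] * 2
    + [frozenset({"GeneA", "GeneB"})] * 4
)

print(count_unique_only(molecules))
print(count_with_em(molecules))
```

On this toy input the unique-only count drops the four ambiguous molecules (GeneA 6, GeneB 2), whereas the EM split converges to about 9 and 3, so all 12 molecules are retained and the ambiguous ones follow the 3:1 ratio seen in the unique evidence.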

Keywords: Cellular barcode; Quantification; Single-cell RNA-seq; UMI deduplication.

Publication types

Evaluation Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Animals
DNA Barcoding, Taxonomic
Humans
Mice
Sequence Analysis, RNA*
Single-Cell Analysis*
Software*