Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

Genome Biol. 2019 Mar 27;20(1):65. doi: 10.1186/s13059-019-1670-y.

Abstract

We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin's approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools, which discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates. Alevin is considerably faster than existing gene quantification approaches, typically by a factor of eight, while also using less memory.
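
To make the bias from discarding gene-ambiguous reads concrete, the following toy sketch compares two counting policies on a handful of made-up reads. It does not reproduce alevin's algorithm or its transcript-level constraints: the function name `dedup_and_count`, the input layout, and the proportional-split rule for multimapping molecules are all assumptions chosen only for illustration.

```python
from collections import defaultdict


def dedup_and_count(reads, keep_ambiguous=True):
    """Toy UMI deduplication and gene counting (hypothetical helper).

    reads: iterable of (cell_barcode, umi, genes), where genes lists the
    gene ids a read is compatible with.
    """
    # Reads sharing barcode, UMI, and gene set are treated as one molecule.
    molecules = {(cb, umi, frozenset(genes)) for cb, umi, genes in reads}

    counts = defaultdict(float)
    ambiguous = []
    for cb, umi, genes in molecules:
        if len(genes) == 1:
            (gene,) = genes
            counts[(cb, gene)] += 1.0
        else:
            ambiguous.append((cb, genes))

    if keep_ambiguous:
        # Assumed rule: split each ambiguous molecule in proportion to the
        # gene-unique counts in the same cell (uniform if there are none).
        unique = dict(counts)
        for cb, genes in ambiguous:
            weights = {g: unique.get((cb, g), 0.0) for g in genes}
            total = sum(weights.values())
            for g in genes:
                share = weights[g] / total if total > 0 else 1.0 / len(genes)
                counts[(cb, g)] += share
    return dict(counts)


# Made-up example data: one cell, four molecules, one PCR duplicate.
reads = [
    ("AAAC", "UMI1", ["geneA"]),
    ("AAAC", "UMI1", ["geneA"]),  # duplicate of the read above
    ("AAAC", "UMI2", ["geneA"]),
    ("AAAC", "UMI3", ["geneB"]),
    ("AAAC", "UMI4", ["geneA", "geneB"]),  # multimaps between genes
]
discarded = dedup_and_count(reads, keep_ambiguous=False)
kept = dedup_and_count(reads, keep_ambiguous=True)
```

In this sketch the discard policy reports three molecules for the cell, whereas the policy that keeps the multimapping molecule accounts for all four, which is the kind of systematic undercounting the abstract refers to.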

Keywords: Cellular barcode; Quantification; Single-cell RNA-seq; UMI deduplication.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • DNA Barcoding, Taxonomic
  • Humans
  • Mice
  • Sequence Analysis, RNA*
  • Single-Cell Analysis*
  • Software*