Bartender: a fast and accurate clustering algorithm to count barcode reads

Lu Zhao; Zhimin Liu; Sasha F Levy; Song Wu

doi:10.1093/bioinformatics/btx655


Bioinformatics. 2018 Mar 1;34(5):739-747. doi: 10.1093/bioinformatics/btx655.

Authors

Lu Zhao¹, Zhimin Liu²,³, Sasha F Levy²,³, Song Wu¹

Affiliations

¹ Department of Applied Mathematics and Statistics.
² Laufer Center for Physical and Quantitative Biology.
³ Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY 11794, USA.

Abstract

Motivation: Barcode sequencing (bar-seq) is a high-throughput and cost-effective method to assay large numbers of cell lineages or genotypes in complex cell pools. Because of its advantages, applications for bar-seq are quickly growing, from using neutral random barcodes to study the evolution of microbes or cancer to using pseudo-barcodes, such as shRNAs or sgRNAs, to simultaneously screen large numbers of cell perturbations. However, the computational pipelines for bar-seq clustering are not well developed. Available methods often yield a high frequency of under-clustering artifacts that result in spurious barcodes, or over-clustering artifacts that group distinct barcodes together. Here, we developed Bartender, an accurate clustering algorithm to detect barcodes and their abundances from raw next-generation sequencing data.

Results: In contrast with existing methods that cluster based on sequence similarity alone, Bartender uses a modified two-sample proportion test that also considers cluster size. This modification results in higher accuracy and lower rates of under- and over-clustering artifacts. Additionally, Bartender includes unique molecular identifier (UMI) handling and a 'multiple time point' mode that matches barcode clusters between different clustering runs for seamless handling of time course data. Bartender is a set of simple-to-use command-line tools that can be run on a laptop with run times comparable to those of existing methods.
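To illustrate why a proportion test is sensitive to cluster size, the sketch below implements the textbook pooled two-sample proportion z-test. It is not Bartender's modified statistic, which the abstract does not specify. The function names, the 1.96 threshold, and the interpretation of the counts (reads in each cluster that carry a given base at one barcode position) are assumptions made for illustration only.

```python
import math


def two_sample_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Textbook pooled two-sample proportion z statistic.

    x1, x2: hypothetical counts of reads carrying a given base in each cluster.
    n1, n2: total reads in each cluster (the cluster sizes).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("cluster sizes must be positive")
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # The standard error shrinks as cluster sizes grow.
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Both clusters are entirely one base (or entirely another): no evidence of a difference.
        return 0.0
    return (p1 - p2) / se


def proportions_differ(x1: int, n1: int, x2: int, n2: int,
                       z_threshold: float = 1.96) -> bool:
    """True when the two proportions differ at the (assumed) z threshold."""
    return abs(two_sample_proportion_z(x1, n1, x2, n2)) > z_threshold


# Same observed proportions (0.9 vs 0.8), very different cluster sizes.
z_small = two_sample_proportion_z(9, 10, 8, 10)
z_large = two_sample_proportion_z(900, 1000, 800, 1000)
```

With identical observed proportions, the two small clusters give a z statistic well under the threshold, while the two large clusters give one well above it. This is the general sense in which a proportion test "considers cluster size", something a pure sequence-distance cutoff cannot do.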

Availability and implementation: Bartender is available at no charge for non-commercial use at https://github.com/LaoZZZZZ/bartender-1.1.

Contact: [email protected] or [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Artifacts
Bacteria
Cluster Analysis*
Data Accuracy
High-Throughput Nucleotide Sequencing / methods*
Humans
Sequence Analysis, RNA
Software*
