CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data

Peijie Lin; Michael Troup; Joshua W K Ho

doi:10.1186/s13059-017-1188-0

Genome Biol. 2017 Mar 28;18(1):59. doi: 10.1186/s13059-017-1188-0.

Authors

Peijie Lin¹˒², Michael Troup¹, Joshua W K Ho³˒⁴

Affiliations

¹ Victor Chang Cardiac Research Institute, Darlinghurst, 2010, NSW, Australia.
² St Vincent's Clinical School, University of New South Wales, Darlinghurst, 2010, NSW, Australia.
³ Victor Chang Cardiac Research Institute, Darlinghurst, 2010, NSW, Australia.
⁴ St Vincent's Clinical School, University of New South Wales, Darlinghurst, 2010, NSW, Australia.

Abstract

Most existing dimensionality reduction and clustering packages for single-cell RNA-seq (scRNA-seq) data rely on heavy modeling and computational machinery to deal with dropouts. Here, we introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple implicit imputation approach to alleviate the impact of dropouts in scRNA-seq data in a principled manner. Using a range of simulated and real data, we show that CIDR improves on standard principal component analysis and outperforms the state-of-the-art methods t-SNE, ZIFA, and RaceID in terms of clustering accuracy. CIDR typically completes within seconds for a data set of hundreds of cells and within minutes for a data set of thousands of cells. CIDR can be downloaded at https://github.com/VCCRI/CIDR.
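The abstract does not spell out how "implicit imputation" works. The following is a minimal sketch of one plausible reading, under stated assumptions; it is not the CIDR package's implementation. It assumes that (a) a log-expression value below a hypothetical `threshold` is treated as a dropout candidate, (b) a hypothetical reverse-logistic function `dropout_prob` (with illustrative `midpoint` and `steepness` parameters) gives the probability that a value is a dropout, falling as expression rises, and (c) imputation happens only pairwise, while computing the distance between two cells, rather than by filling in the whole matrix up front. All of these names and parameters are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def dropout_prob(x: float, midpoint: float = 1.0, steepness: float = 4.0) -> float:
    """Hypothetical reverse-logistic dropout curve: 1 / (1 + exp(steepness * (x - midpoint))).

    High for low expression, near 0 for high expression. The parameters are
    illustrative assumptions, not values estimated from data.
    """
    z = steepness * (x - midpoint)
    if z >= 0:
        # Numerically stable branch: avoids overflow in exp() for large z.
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def imputed_distance(
    cell_a: Sequence[float],
    cell_b: Sequence[float],
    threshold: float = 0.5,
    prob: Callable[[float], float] = dropout_prob,
) -> float:
    """Euclidean distance with pairwise implicit imputation (assumed scheme).

    If one cell's value is a dropout candidate (< threshold), it is replaced by a
    weighted mean with the partner cell's value. The weight is the dropout
    probability of the partner's value, so a zero opposite a lowly expressed gene
    is imputed strongly, and a zero opposite a highly expressed gene barely at all.
    Genes where both cells are dropout candidates are skipped.
    """
    total = 0.0
    for a, b in zip(cell_a, cell_b):
        a_drop, b_drop = a < threshold, b < threshold
        if a_drop and b_drop:
            continue  # no information about this gene for this pair
        if a_drop:
            w = prob(b)
            a = w * b + (1.0 - w) * a
        elif b_drop:
            w = prob(a)
            b = w * a + (1.0 - w) * b
        total += (a - b) ** 2
    return math.sqrt(total)


def dissimilarity_matrix(cells: List[Sequence[float]], **kwargs) -> List[List[float]]:
    """Symmetric cell-by-cell dissimilarity matrix built from imputed_distance."""
    n = len(cells)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = imputed_distance(cells[i], cells[j], **kwargs)
    return d


if __name__ == "__main__":
    # Toy log-expression profiles (rows = cells, columns = genes); made-up numbers.
    cells = [[0.0, 5.0, 3.0], [4.0, 5.2, 0.0], [0.0, 0.0, 3.1]]
    for row in dissimilarity_matrix(cells):
        print([round(v, 3) for v in row])
```

With these assumed defaults, a zero paired with an expression of 1.0 is imputed halfway toward it (distance 0.5 rather than 1.0). A zero paired with an expression of 5.0 is left almost untouched (distance close to 5.0). Consistent with the method's name, a dissimilarity matrix like this would then feed a dimensionality reduction step, with clustering performed on the reduced coordinates.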

Keywords: Cell type; Clustering; Dimensionality reduction; Dropout; Imputation; Single-cell; scRNA-seq.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Brain / metabolism
Cluster Analysis*
Computational Biology / methods*
Computer Simulation
Datasets as Topic
Gene Expression Profiling
Genomics / methods*
Humans
Organ Specificity / genetics
Sequence Analysis, RNA* / methods
Single-Cell Analysis* / methods
Software*