A Gene Rank Based Approach for Single Cell Similarity Assessment and Clustering

IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):431-442. doi: 10.1109/TCBB.2019.2931582. Epub 2021 Apr 6.

Abstract

Single-cell RNA sequencing (scRNA-seq) technology provides quantitative gene expression profiles at single-cell resolution. As a result, researchers have established new ways to explore cell population heterogeneity and genetic variability of cells. One current research direction for scRNA-seq data is to identify different cell types accurately through unsupervised clustering methods. However, scRNA-seq data analysis is challenging because of the data's high noise level, high dimensionality, and sparsity. Moreover, the impact of multiple latent factors on gene expression heterogeneity and on the ability to accurately identify cell types remains unclear. Overcoming these challenges to reveal the biological differences between cell types has become key to analyzing scRNA-seq data. For these reasons, unsupervised learning for cell population discovery based on scRNA-seq data has become an important research area. The cell similarity assessment method plays a significant role in cell clustering. Here, we present BioRank, a new cell similarity assessment method based on annotated gene sets and gene ranks. To evaluate its performance, we cluster cells with two classical clustering algorithms, using the cell-to-cell similarities obtained by BioRank. In addition, BioRank can be used with any clustering algorithm that requires a similarity matrix. Applying BioRank to 12 public scRNA-seq datasets, we show that it performs better than, or at least as well as, several popular similarity assessment methods for single-cell clustering.
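The general idea of feeding a rank-based cell similarity matrix into a clustering algorithm can be illustrated with a minimal sketch. This is not BioRank itself: the abstract does not give the scoring formula, so the sketch substitutes plain Spearman rank correlation between cells and leaves out the annotated gene sets entirely. The toy expression values, the function names, and the choice of average-linkage clustering are all assumptions made for illustration.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values ascending (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_similarity_matrix(cells):
    """Spearman similarity: Pearson correlation of per-cell gene ranks."""
    ranked = [average_ranks(cell) for cell in cells]
    n = len(cells)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = pearson(ranked[i], ranked[j])
    return sim


def average_linkage(sim, n_clusters):
    """Greedy agglomerative clustering driven only by a similarity matrix."""
    clusters = [[i] for i in range(len(sim))]

    def mean_sim(pair):
        a, b = pair
        total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the highest mean pairwise similarity.
        a, b = max(combinations(range(len(clusters)), 2), key=mean_sim)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    labels = [0] * len(sim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy data: 4 cells (rows) x 5 genes (columns), raw counts.
# Cells 0 and 1 share one expression pattern at different sequencing depths;
# cells 2 and 3 share another.
cells = [
    [9, 7, 0, 1, 0],
    [18, 15, 1, 2, 0],
    [0, 1, 8, 6, 9],
    [1, 0, 20, 11, 25],
]

sim = rank_similarity_matrix(cells)
labels = average_linkage(sim, n_clusters=2)
print(labels)  # → [0, 0, 1, 1]
```

Because only gene ranks within each cell enter the similarity, cells 0 and 1 group together despite the twofold difference in total counts. The clustering step reads nothing but the similarity matrix, so any other similarity measure could be swapped in without changing it.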

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Genetic
  • Gene Ontology
  • Humans
  • Mice
  • Sequence Analysis, RNA / methods*
  • Single-Cell Analysis / methods*
  • Transcriptome / genetics