Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model

Genes (Basel). 2021 Feb 22;12(2):311. doi: 10.3390/genes12020311.

Abstract

Single-cell RNA-seq (scRNA-seq) is a powerful tool for measuring the expression patterns of individual cells and for discovering heterogeneity and functional diversity among cell populations. Because such data are highly variable, analyzing them efficiently is challenging. Many of the clustering methods developed so far rely on at least one free parameter. Different choices for these parameters may lead to substantially different visualizations and clusters, and tuning them is time-consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of values of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p), corresponding to AIC (Akaike information criterion) or BIC (Bayesian information criterion), respectively, without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. RGGC therefore has no free parameter to be tuned. When evaluated on simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and among the top performers. When applied to glioblastoma scRNA-seq data, it can detect inter-sample cell heterogeneity.
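
As a rough illustration of two ingredients named in the abstract, the sketch below shows the fixed choice of λ (2 for AIC, log(p) for BIC) and the standard conversion of a precision (inverse covariance) matrix into partial correlations. It is not the RGGC implementation: the function names and the small example matrix are hypothetical, p is taken simply as the quantity denoted p in the abstract, and the estimation of the regularized model and the Louvain step are not shown.

```python
import math


def regularization_parameter(criterion, p):
    """Return lambda for the named criterion: 2 for AIC, log(p) for BIC.

    `p` is the quantity written as p in the abstract; its exact
    definition is an assumption left to the caller.
    """
    if criterion == "AIC":
        return 2.0
    if criterion == "BIC":
        return math.log(p)
    raise ValueError(f"unknown criterion: {criterion!r}")


def partial_correlations(precision):
    """Convert a precision matrix into a partial correlation matrix.

    Uses the standard identity
        rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)   for i != j,
    with ones on the diagonal.
    """
    n = len(precision)
    rho = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho


# Hypothetical 3x3 precision matrix (illustrative values, not from the paper).
# Variables 0 and 1 are conditionally dependent; variable 2 is isolated.
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

rho = partial_correlations(omega)
lam_aic = regularization_parameter("AIC", p=100)
lam_bic = regularization_parameter("BIC", p=100)
```

In a graph built from such a matrix, a zero partial correlation means no edge between the two variables, which is the kind of sparse structure a community detection algorithm such as Louvain could then partition.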

Keywords: cell subpopulation; parameter-free clustering; regularized Gaussian graphical model; scRNA-seq; subspace learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Cluster Analysis
  • Exome Sequencing
  • Gene Expression Profiling / statistics & numerical data*
  • Humans
  • Normal Distribution
  • RNA-Seq / statistics & numerical data*
  • Single-Cell Analysis / statistics & numerical data*