COMSE: analysis of single-cell RNA-seq data using community detection-based feature selection

BMC Biol. 2024 Aug 7;22(1):167. doi: 10.1186/s12915-024-01963-5.

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) enables the study of individual cells, yet high gene dimensionality and low cell numbers complicate analysis. Moreover, only a subset of the detected genes is involved in the biological processes underlying cell-type-specific functions.

Results: In this study, we present COMSE, an unsupervised feature selection framework that uses community detection to capture informative genes from scRNA-seq data. COMSE identified homogeneous cell substates with high resolution, as demonstrated by distinguishing different cell cycle stages. Evaluations on real and simulated scRNA-seq datasets showed that COMSE outperformed other methods in cell clustering assignment, even at high dropout rates. We also demonstrate that, by identifying communities of genes associated with batch effects, COMSE separates signals reflecting biological differences from noise arising from differences in sequencing protocols, thereby enabling integrated analysis of scRNA-seq datasets from different sources.
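The general idea of community detection-based feature selection can be illustrated with a minimal sketch. This is not the COMSE implementation: the abstract does not specify how the gene graph is built, which community detection algorithm is used, or how genes are ranked. The sketch assumes a Pearson-correlation gene graph, a simple deterministic label propagation step, and a rule that keeps genes belonging to communities of at least two members. All function names, the threshold, and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gene_graph(expr, threshold=0.6):
    """Connect two genes when their correlation across cells reaches the threshold.

    expr maps gene name -> list of expression values (one per cell).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    genes = sorted(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if r >= threshold:
                adj[g][h] = r
                adj[h][g] = r
    return adj


def label_propagation(adj, max_iter=100):
    """Weighted label propagation with a fixed visiting order (deterministic).

    Each gene adopts the label carrying the largest total edge weight among
    its neighbours; ties go to the alphabetically smallest label.
    """
    labels = {g: g for g in adj}
    for _ in range(max_iter):
        changed = False
        for g in sorted(adj):
            if not adj[g]:
                continue  # isolated gene keeps its own label
            weight = {}
            for h, w in adj[g].items():
                weight[labels[h]] = weight.get(labels[h], 0.0) + w
            best = max(sorted(weight), key=weight.get)
            if best != labels[g]:
                labels[g] = best
                changed = True
        if not changed:
            break
    communities = {}
    for g, lab in labels.items():
        communities.setdefault(lab, []).append(g)
    return sorted(sorted(c) for c in communities.values())


def select_genes(expr, threshold=0.6, min_size=2):
    """Keep genes that fall into a community with at least min_size members."""
    selected = []
    for comm in label_propagation(gene_graph(expr, threshold)):
        if len(comm) >= min_size:
            selected.extend(comm)
    return sorted(selected)


# Toy data: two co-expressed gene modules (A*, B*) and two uninformative genes (N*).
toy = {
    "A1": [1, 2, 3, 4, 5, 6],
    "A2": [2, 4, 6, 8, 10, 12],
    "A3": [1, 2, 3, 4, 5, 7],
    "B1": [5, 0, 5, 0, 5, 0],
    "B2": [4, 1, 4, 1, 4, 1],
    "B3": [6, 0, 5, 1, 6, 0],
    "N1": [3, 3, 3, 3, 3, 3],
    "N2": [2, 0, 1, 1, 0, 2],
}
print(select_genes(toy))
```

In this toy setting the two co-expressed modules form separate communities and are retained, while the constant and uncorrelated genes remain isolated and are dropped. A community that tracks a technical factor such as sequencing protocol could, under the same assumptions, be excluded rather than kept.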

Conclusions: COMSE provides an efficient unsupervised framework that selects highly informative genes in scRNA-seq data, improving cell substate identification and cell clustering. It identifies gene subsets that reveal biological and technical heterogeneity, supporting applications such as batch effect correction and pathway analysis. It also provides robust results for bulk RNA-seq data analysis.

Keywords: Community detection; Feature selection; Single-cell RNA-sequencing.

MeSH terms

  • Animals
  • Humans
  • Mice
  • RNA-Seq* / methods
  • Single-Cell Gene Expression Analysis*