Unsupervised cell functional annotation for single-cell RNA-seq

Genome Res. 2022 Sep 27;32(9):1765-1775. doi: 10.1101/gr.276609.122.

Abstract

One of the first steps in the analysis of single-cell RNA sequencing (scRNA-seq) data is the assignment of cell types. Although a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in low-dimensional space and then assigning cell types to the resulting clusters. To overcome noise and to improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-seq data sets from several different organs. We show that, by using knowledge about gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-seq data. The gene sets assigned by UNIFAN to each cluster provide strong evidence for the cell type that the cluster represents, making annotation easier.
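
The idea of scoring gene set activity per cell and then reading off the dominant gene set per cluster can be illustrated with a minimal sketch. This is not UNIFAN's implementation: UNIFAN learns the scores and the clustering jointly in a neural network, whereas the sketch below assumes cluster labels are already given and uses a plain mean of expression as the activity score. The expression values, the gene set names, the cluster labels, and the helper functions `activity_scores` and `annotate_clusters` are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Toy normalized expression values per cell (made-up numbers).
expression = {
    "cell_1": {"CD3E": 5.0, "CD2": 4.0, "MS4A1": 0.0, "CD79A": 1.0},
    "cell_2": {"CD3E": 4.0, "CD2": 5.0, "MS4A1": 1.0, "CD79A": 0.0},
    "cell_3": {"CD3E": 0.0, "CD2": 1.0, "MS4A1": 5.0, "CD79A": 4.0},
    "cell_4": {"CD3E": 1.0, "CD2": 0.0, "MS4A1": 4.0, "CD79A": 5.0},
}

# Hypothetical gene sets built from well-known T cell and B cell marker genes.
gene_sets = {
    "T_cell_markers": ["CD3E", "CD2"],
    "B_cell_markers": ["MS4A1", "CD79A"],
}

# Assumed cluster labels, as if produced by an earlier clustering step.
clusters = {"cell_1": 0, "cell_2": 0, "cell_3": 1, "cell_4": 1}


def activity_scores(cell_expr, gene_sets):
    """Score each gene set in one cell as the mean expression of its genes."""
    return {
        name: mean([cell_expr.get(gene, 0.0) for gene in genes])
        for name, genes in gene_sets.items()
    }


def annotate_clusters(expression, gene_sets, clusters):
    """Assign to each cluster the gene set with the highest average score."""
    scores_by_cluster = {}
    for cell, label in clusters.items():
        scores = activity_scores(expression[cell], gene_sets)
        scores_by_cluster.setdefault(label, []).append(scores)

    annotation = {}
    for label, score_list in scores_by_cluster.items():
        average = {
            name: mean([scores[name] for scores in score_list])
            for name in gene_sets
        }
        annotation[label] = max(average, key=average.get)
    return annotation


annotation = annotate_clusters(expression, gene_sets, clusters)
print(annotation)
```

In this toy setup the gene set with the highest average score in a cluster serves as the evidence for its cell type, which mirrors, in a much simplified form, how gene sets assigned to clusters can guide annotation.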

MeSH terms

  • Animals
  • Cluster Analysis
  • Gene Expression Profiling / methods
  • Humans
  • Mice
  • Molecular Sequence Annotation
  • Neural Networks, Computer
  • RNA-Seq* / methods
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods
  • Single-Cell Gene Expression Analysis