Inferring cell diversity in single cell data using consortium-scale epigenetic data as a biological anchor for cell identity

Nucleic Acids Res. 2023 Jun 23;51(11):e62. doi: 10.1093/nar/gkad307.

Abstract

Methods for cell clustering and gene expression from single-cell RNA sequencing (scRNA-seq) data are essential for biological interpretation of cell processes. Here, we present TRIAGE-Cluster which uses genome-wide epigenetic data from diverse bio-samples to identify genes demarcating cell diversity in scRNA-seq data. By integrating patterns of repressive chromatin deposited across diverse cell types with weighted density estimation, TRIAGE-Cluster determines cell type clusters in a 2D UMAP space. We then present TRIAGE-ParseR, a machine learning method which evaluates gene expression rank lists to define gene groups governing the identity and function of cell types. We demonstrate the utility of this two-step approach using atlases of in vivo and in vitro cell diversification and organogenesis. We also provide a web accessible dashboard for analysis and download of data and software. Collectively, genome-wide epigenetic repression provides a versatile strategy to define cell diversity and study gene regulation of scRNA-seq data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Epigenesis, Genetic
  • Gene Expression Profiling* / methods
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods
  • Software