Accurate Annotation for Differentiating and Imbalanced Cell Types in Single-Cell Chromatin Accessibility Data

IEEE/ACM Trans Comput Biol Bioinform. 2024 May-Jun;21(3):461-471. doi: 10.1109/TCBB.2024.3372970. Epub 2024 Jun 5.

Abstract

Rapid advances in single-cell chromatin accessibility sequencing (scCAS) technologies have enabled the characterization of epigenomic heterogeneity and increased the demand for automatic cell type annotation. However, few computational methods are tailored to cell type annotation in scCAS data, and existing methods perform poorly on differentiating and imbalanced cell types. Here, we propose CASCADE, a novel annotation method built on simulation- and denoising-based strategies. In comprehensive experiments on multiple scCAS datasets, we show that CASCADE effectively distinguishes the patterns of different cell types and mitigates the effect of high noise levels, thereby achieving significantly better annotation performance for differentiating and imbalanced cell types. In addition, model ablation experiments show the contribution of the modules in CASCADE, and extensive experiments demonstrate its robustness to batch effects, degree of imbalance, data sparsity, and number of cell types. Moreover, CASCADE significantly outperforms baseline methods in annotating cell types in newly sequenced data. We anticipate that CASCADE will greatly assist in characterizing cell heterogeneity in scCAS data analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromatin* / chemistry
  • Chromatin* / genetics
  • Chromatin* / metabolism
  • Computational Biology* / methods
  • Humans
  • Molecular Sequence Annotation / methods
  • Sequence Analysis, DNA / methods
  • Single-Cell Analysis* / methods

Substances

  • Chromatin