The emergence of single-cell RNA sequencing (RNA-seq) has radically changed the observation of cellular diversity. Although annotations of RNA-seq data require preserved properties among cells of an identity, annotations using conventional methods have not been able to capture universal characters of a cell type. Analysis of expression levels cannot be accurately annotated for cells because differences in transcription do not necessarily explain biological characteristics in terms of cellular functions and because the data themselves do not inform about the correct mapping between cell types and genes. Hence, in this study, we developed a new representation of cellular identities that can be compared over different datasets while preserving nontrivial biological semantics. To generalize the notion of cell types, we developed a new framework to manage cellular identities in terms of set theory. We provided further insights into cells by installing mathematical descriptions of cell biology. We also performed experiments that could correspond to practical applications in annotations of RNA-seq data.
Keywords: annotation; cell type; cellular state; mathematical model; scRNA-seq; set theory; transcriptome.
Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.