Motivation: Single-cell RNA sequencing (scRNA-seq) analysis reveals heterogeneity and dynamic cell transitions. However, conventional gene-based analyses require intensive manual curation to interpret biological implications of computational results. Hence, a theory for efficiently annotating individual cells remains warranted.
Results: We present ASURAT, a computational tool for simultaneously performing unsupervised clustering and functional annotation of disease, cell type, biological process and signaling pathway activity for single-cell transcriptomic data, using a correlation graph decomposition for genes in database-derived functional terms. We validated the usability and clustering performance of ASURAT using scRNA-seq datasets for human peripheral blood mononuclear cells, which required fewer manual curations than existing methods. Moreover, we applied ASURAT to scRNA-seq and spatial transcriptome datasets for human small cell lung cancer and pancreatic ductal adenocarcinoma, respectively, identifying previously overlooked subpopulations and differentially expressed genes. ASURAT is a powerful tool for dissecting cell subpopulations and improving biological interpretability of complex and noisy transcriptomic data.
Availability and implementation: ASURAT is published on Bioconductor (https://doi.org/10.18129/B9.bioc.ASURAT). The codes for analyzing data in this article are available at Github (https://github.com/keita-iida/ASURATBI) and figshare (https://doi.org/10.6084/m9.figshare.19200254.v4).
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press.