scCancer2: data-driven in-depth annotations of the tumor microenvironment at single-level resolution

Bioinformatics. 2024 Feb 1;40(2):btae028. doi: 10.1093/bioinformatics/btae028.

Abstract

Summary: Single-cell RNA-seq (scRNA-seq) is a powerful technique for decoding the complex cellular compositions in the tumor microenvironment (TME). As previous studies have defined many meaningful cell subtypes in several tumor types, there is a great need to computationally transfer these labels to new datasets. Also, different studies used different approaches or criteria to define the cell subtypes for the same major cell lineages. The relationships between the cell subtypes defined in different studies should be carefully evaluated. In this updated package scCancer2, designed for integrative tumor scRNA-seq data analysis, we developed a supervised machine learning framework to annotate TME cells with annotated cell subtypes from 15 scRNA-seq datasets with 594 samples in total. Based on the trained classifiers, we quantitatively constructed the similarity maps between the cell subtypes defined in different references by testing on all the 15 datasets. Secondly, to improve the identification of malignant cells, we designed a classifier by integrating large-scale pan-cancer TCGA bulk gene expression datasets and scRNA-seq datasets (10 cancer types, 175 samples, 663 857 cells). This classifier shows robust performances when no internal confidential reference cells are available. Thirdly, scCancer2 integrated a module to process the spatial transcriptomic data and analyze the spatial features of TME.

Availability and implementation: The package and user documentation are available at http://lifeome.net/software/sccancer2/ and https://doi.org/10.5281/zenodo.10477296.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling / methods
  • Humans
  • Neoplasms* / genetics
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods
  • Software*
  • Tumor Microenvironment