A joint analysis of single cell transcriptomics and proteomics using transformer

Yuanyuan Chen; Xiaodan Fan; Chaowen Shi; Zhiyan Shi; Chaojie Wang

doi:10.1038/s41540-024-00484-9

NPJ Syst Biol Appl. 2025 Jan 2;11(1):1. doi: 10.1038/s41540-024-00484-9.

Authors

Yuanyuan Chen¹, Xiaodan Fan², Chaowen Shi³, Zhiyan Shi¹, Chaojie Wang⁴,⁵

Affiliations

¹ School of Mathematical Science, Jiangsu University, Zhenjiang, 212013, Jiangsu, China.
² Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, SAR, China.
³ School of Life Sciences, Jiangsu University, Zhenjiang, 212013, Jiangsu, China.
⁴ School of Mathematical Science, Jiangsu University, Zhenjiang, 212013, Jiangsu, China.
⁵ The Fourth Affiliated Hospital of Jiangsu University, Jiangsu University, Zhenjiang, 212013, Jiangsu, China.

Abstract

CITE-seq provides a powerful method for simultaneously measuring RNA and protein expression at the single-cell level. Integrated analysis of RNA and protein expression in the same cells is crucial for revealing cellular heterogeneity. However, the high experimental cost of CITE-seq limits its widespread application. In this paper, we propose scTEL, a deep learning framework based on Transformer encoder layers, which learns a mapping from sequenced RNA expression to unobserved protein expression in the same cells. This computational approach substantially reduces the experimental cost of measuring protein expression: protein expression can be predicted from single-cell RNA sequencing (scRNA-seq) data, which is well established and cheaper to obtain. Moreover, scTEL offers a unified framework for integrating multiple CITE-seq datasets, addressing the challenge posed by the partial overlap of protein panels across datasets. Empirical validation on public CITE-seq datasets demonstrates that scTEL significantly outperforms existing methods.
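The abstract does not say how scTEL handles protein panels that only partly overlap across datasets. As a hedged illustration of one common way to train a single predictor on such data, the sketch below aligns each dataset to the union of all panels and computes the squared error only over the proteins that dataset actually measured. This is an assumption for illustration, not the authors' implementation; the function names, the protein names (CD3, CD4, CD8, CD19), and the numeric values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_union_panel(panels: Sequence[Sequence[str]]) -> List[str]:
    """Return the union of protein names, in first-seen order."""
    seen: Dict[str, None] = {}
    for panel in panels:
        for name in panel:
            seen.setdefault(name, None)
    return list(seen)


def align_to_union(
    values: Sequence[float], panel: Sequence[str], union: Sequence[str]
) -> Tuple[List[float], List[float]]:
    """Place one cell's measured proteins into union order.

    Proteins missing from this dataset's panel get a placeholder 0.0
    and a mask entry of 0.0, so they can be ignored in the loss.
    """
    lookup = dict(zip(panel, values))
    target = [lookup.get(name, 0.0) for name in union]
    mask = [1.0 if name in lookup else 0.0 for name in union]
    return target, mask


def masked_mse(
    pred: Sequence[float], target: Sequence[float], mask: Sequence[float]
) -> float:
    """Mean squared error over observed (mask == 1) proteins only."""
    observed = sum(mask)
    if observed == 0:
        return 0.0
    squared = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    return squared / observed


# Hypothetical panels from two CITE-seq datasets that share only CD4.
panel_a = ["CD3", "CD4", "CD8"]
panel_b = ["CD4", "CD19"]
union = build_union_panel([panel_a, panel_b])

# One cell from dataset B, measured on its own two-protein panel.
target_b, mask_b = align_to_union([1.0, 2.0], panel_b, union)

# A made-up prediction over the full union panel.
pred = [5.0, 1.5, 5.0, 2.5]
loss_b = masked_mse(pred, target_b, mask_b)
```

With this masking, the predictions for CD3 and CD8 do not contribute to the loss for a dataset B cell, so a model with one output per union protein can be trained on both datasets without imputing the unmeasured values.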

MeSH terms

Algorithms
Computational Biology* / methods
Deep Learning
Gene Expression Profiling / methods
Humans
Proteomics* / methods
Sequence Analysis, RNA* / methods
Single-Cell Analysis* / methods
Transcriptome* / genetics