MVCLST: A spatial transcriptome data analysis pipeline for cell type classification based on multi-view comparative learning

Methods. 2024 Nov 12:S1046-2023(24)00238-X. doi: 10.1016/j.ymeth.2024.11.001. Online ahead of print.

Abstract

Recent advancements in spatial transcriptomics sequencing technologies can not only provide gene expression within individual cells or cell clusters (spots) in a tissue but also pinpoint the exact location of this expression and generate detailed images of stained tissue sections, which offers invaluable insights into cell type identification and cell function exploration. However, effectively integratingthegene expression data, spatial location information, and tissue images from spatial transcriptomics data presents a significant challenge for computational methodsin cell classification. In this work, we propose MVCLST, a multi-view comparative learningmethod to analyze spatial transcriptomicsdata for accurate cell type classification. MVCLSTconstructs two views based on gene expression profiles, cell coordinates and image features. The multi-view method we proposed can significantly enhance the effectiveness of feature extraction while avoiding the impact of erroneous information in organizing image or gene expression data. The model employs four separate encoders to capture shared and unique features within each view. To ensure consistency and facilitate information exchange between the two views, MVCLST incorporates a contrastive learning loss function. The extracted shared and private features from both views are fused using corresponding decoders. Finally, the model utilizes the Leiden algorithm to clusterthe learned featuresfor cell type identification. Additionally, we establish a framework called MVCLST-CCFS for spatial transcriptomicsdata analysis based on MVCLST and consistent clustering. Our method achieves excellent results in clustering on human dorsolateral prefrontal cortex data and the mouse brain tissue data. Italso outperforms state-of-the-art techniques in the subsequent search for highly variable genes across cell types on the mouse olfactory bulbdata.

Keywords: Cell type identification; Consensus clustering; Contrastive learning; Multi-view; Spatial transcriptome data clustering.