spatiAlign: an unsupervised contrastive learning model for data integration of spatially resolved transcriptomics

Gigascience. 2024 Jan 2:13:giae042. doi: 10.1093/gigascience/giae042.

Abstract

Background: Integrative analysis of spatially resolved transcriptomics datasets enables a deeper understanding of complex biological systems. However, integrating multiple tissue sections makes batch effect removal challenging, particularly when the sections are measured by different technologies or collected at different times.

Findings: We propose spatiAlign, an unsupervised contrastive learning model that uses the expression of all measured genes together with the spatial locations of cells to integrate multiple tissue sections. It enables joint downstream analysis of multiple datasets not only in low-dimensional embeddings but also in the reconstructed full expression space.
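The abstract does not spell out spatiAlign's training objective. As a generic illustration of what an unsupervised contrastive objective over expression and spatial location can look like, the sketch below computes a standard InfoNCE loss between two views of the same cells. The second view (averaging each cell with its k nearest spatial neighbours), the function names `spatial_neighbor_view` and `info_nce`, and the temperature `tau` are hypothetical choices for illustration; they are not spatiAlign's published method or API.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def spatial_neighbor_view(expr: np.ndarray, coords: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical second view: average each cell with its k nearest spatial neighbours.

    expr:   (n_cells, n_genes) expression matrix
    coords: (n_cells, 2) spatial coordinates
    """
    # Pairwise squared Euclidean distances between cell coordinates.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k + 1 closest cells (the cell itself is at distance 0).
    idx = np.argsort(d2, axis=1)[:, : k + 1]
    # expr[idx] has shape (n_cells, k + 1, n_genes); average over the neighbour axis.
    return expr[idx].mean(axis=1)


def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """Generic InfoNCE loss: row i of z1 and row i of z2 form the positive pair,
    and every other row of z2 acts as a negative for row i of z1."""
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / tau  # cosine similarities scaled by the temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability assigned to the matching (diagonal) pair.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.poisson(2.0, size=(50, 20)).astype(float)  # toy cells x genes
    coords = rng.uniform(0.0, 10.0, size=(50, 2))         # toy spatial positions
    # Raw expression stands in for encoder outputs to keep the sketch small.
    print(info_nce(expr, spatial_neighbor_view(expr, coords, k=3)))
```

In a trained model both views would first pass through a learned encoder, and minimizing the loss pulls each cell's two embeddings together while pushing apart embeddings of different cells; here the raw matrices are fed in directly only to keep the example self-contained.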

Conclusions: In benchmarking analyses, spatiAlign outperforms state-of-the-art methods in learning joint and discriminative representations of tissue sections, each potentially characterized by complex batch effects or distinct biological characteristics. Furthermore, we demonstrate the benefits of spatiAlign for the integrative analysis of time-series brain sections, including spatial clustering, differential expression analysis, and particularly trajectory inference, which requires a corrected gene expression matrix.

Keywords: batch effect; contrastive learning; data integration; domain adaptation; spatial transcriptomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Brain / metabolism
  • Cluster Analysis
  • Computational Biology / methods
  • Gene Expression Profiling* / methods
  • Humans
  • Transcriptome*
  • Unsupervised Machine Learning*