Impeller: a path-based heterogeneous graph learning method for spatial transcriptomic data imputation

Ziheng Duan; Dylan Riffle; Ren Li; Junhao Liu; Martin Renqiang Min; Jing Zhang

doi:10.1093/bioinformatics/btae339

Bioinformatics. 2024 Jun 3;40(6):btae339. doi: 10.1093/bioinformatics/btae339.

Authors

Ziheng Duan¹, Dylan Riffle¹, Ren Li², Junhao Liu¹, Martin Renqiang Min³, Jing Zhang¹

Affiliations

¹ Department of Computer Science, University of California, Irvine, Irvine, CA 92697, United States.
² Mathematical, Computational, and Systems Biology, University of California, Irvine, Irvine, CA 92697, United States.
³ Department of Machine Learning, NEC Labs America, Princeton, NJ 08540, United States.

Abstract

Motivation: Recent advances in spatial transcriptomics allow spatially resolved gene expression measurements with cellular or even sub-cellular resolution, directly characterizing the complex spatiotemporal gene expression landscape and cell-to-cell interactions in their native microenvironments. Due to technology limitations, most spatial transcriptomic technologies still yield incomplete expression measurements with excessive missing values. Therefore, gene imputation is critical to filling in missing data, enhancing resolution, and improving overall interpretability. However, existing methods either require additional matched single-cell RNA-seq data, which is rarely available, or ignore spatial proximity or expression similarity information.

Results: To address these issues, we introduce Impeller, a path-based heterogeneous graph learning method for spatial transcriptomic data imputation. Impeller has two characteristics that distinguish it from existing approaches. First, it builds a heterogeneous graph with two types of edges, representing spatial proximity and expression similarity. Impeller can therefore simultaneously model smooth gene expression changes across spatial dimensions and capture similar gene expression signatures of distant cells of the same type. Second, Impeller incorporates both short- and long-range cell-to-cell interactions (e.g. via paracrine and endocrine signaling) by stacking multiple graph neural network (GNN) layers, and it uses a learnable path operator to avoid the over-smoothing issue of traditional Laplacian matrices. Extensive experiments on diverse datasets from three popular platforms and two species demonstrate the superiority of Impeller over various state-of-the-art imputation methods.
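
The two-edge-type graph described above can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbor rule, the Euclidean metric, the default neighbor counts, and the names `knn_edges` and `build_heterogeneous_graph` are all assumptions made for illustration. It only shows how one set of cells can carry a "spatial" edge list and a separate "expression" edge list.

```python
import numpy as np

def knn_edges(features, k):
    """Link each row to its k nearest rows (Euclidean); returns (source, target) pairs.

    Hypothetical helper: the abstract does not say how neighbors are chosen.
    """
    features = np.asarray(features, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    sources = np.repeat(np.arange(features.shape[0]), k)
    return np.stack([sources, neighbors.ravel()], axis=1)

def build_heterogeneous_graph(coords, expression, k_spatial=4, k_expr=4):
    """Return one edge list per edge type over the same set of cells.

    coords:     (n_cells, 2) spatial positions
    expression: (n_cells, n_genes) expression matrix
    The k values are illustrative defaults, not values from the paper.
    """
    return {
        "spatial": knn_edges(coords, k_spatial),       # nearby cells in tissue space
        "expression": knn_edges(expression, k_expr),   # cells with similar profiles
    }
```

Calling `build_heterogeneous_graph(coords, expression)` on an `(n_cells, 2)` coordinate array and an `(n_cells, n_genes)` expression matrix yields a dictionary with one `(n_cells * k, 2)` edge array per edge type; two cells that are far apart in the tissue can still be joined by an "expression" edge if their profiles are similar.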

Availability and implementation: The code and preprocessed data used in this study are available at https://github.com/aicb-ZhangLabs/Impeller and https://zenodo.org/records/11212604.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Computational Biology / methods
Gene Expression Profiling / methods
Humans
Machine Learning
Single-Cell Analysis / methods
Software
Transcriptome* / genetics
