Protein subcellular localization (PSL) is very important in order to understand its functions, and its movement between subcellular niches within cells plays fundamental roles in biological process regulation. Mass spectrometry-based spatio-temporal proteomics technologies can help provide new insights of protein translocation, but bring the challenge in identifying reliable protein translocation events due to the noise interference and insufficient data mining. We propose a semi-supervised graph convolution network (GCN)-based framework termed TransGCN that infers protein translocation events from spatio-temporal proteomics. Based on expanded multiple distance features and joint graph representations of proteins, TransGCN utilizes the semi-supervised GCN to enable effective knowledge transfer from proteins with known PSLs for predicting protein localization and translocation. Our results demonstrate that TransGCN outperforms current state-of-the-art methods in identifying protein translocations, especially in coping with batch effects. It also exhibited excellent predictive accuracy in PSL prediction. TransGCN is freely available on GitHub at https://github.com/XuejiangGuo/TransGCN.
Keywords: graph convolution network; mass spectrometry; protein subcellular localization; protein translocation; semi-supervised learning; spatio-temporal proteomics.
© The Author(s) 2024. Published by Oxford University Press.