Motivation: Advances in sequencing technologies have made protein sequences available at large scale, creating a need for efficient and accurate methods for predicting protein function. Computational prediction models have emerged as a promising way to expedite annotation. However, although graph neural networks have driven significant progress in protein research, they still struggle to capture long-range structural correlations and to identify critical residues in protein graphs. Furthermore, existing models are limited in predicting the function of newly sequenced proteins that are absent from protein interaction networks. These limitations highlight the need for novel approaches that integrate protein structure and sequence data.
Results: We introduce MEGA-GO, which captures features of protein sequences of diverse lengths at multiple scales. Its graph adaptive neural network architecture enables a more nuanced extraction of graph structure features, capturing intricate relationships within biological data. Experimental results demonstrate that MEGA-GO outperforms mainstream protein function prediction models in Gene Ontology (GO) term classification, achieving an Area Under the Precision-Recall Curve (AUPR) of 33.4%, 68.9%, and 44.6% on the Biological Process (BP), Molecular Function (MF), and Cellular Component (CC) domains, respectively. The remaining experiments show that the model consistently surpasses state-of-the-art methods.
Availability and implementation: The source code and implementation of MEGA-GO are available at https://github.com/Cheliosoops/MEGA-GO.
Supplementary file: The supplementary file can be found at Supplementary.
© The Author(s) 2025. Published by Oxford University Press.