Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins

Methods Mol Biol. 2022:2499:177-186. doi: 10.1007/978-1-0716-2317-6_9.

Abstract

Protein glycosylation is one of the most complex posttranslational modifications (PTMs) and plays a fundamental role in protein function. Identifying and annotating glycosylation sites experimentally is challenging and time-consuming, so there is a demand for fast and efficient computational methods. Here, we present the SPRINT-Gly framework, which contains the largest dataset of glycosylation sites and a model that predicts these sites for a given protein sequence. In this framework, we construct a large dataset containing N- and O-linked glycosylation sites of human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of separately trained predictive models for glycosylation sites in human and in mouse proteins. The method incorporates sequence-based, predicted structural, and physicochemical information about the residues neighboring each N- and O-linked glycosylation site, and trains a deep neural network and a support vector machine as classifiers. SPRINT-Gly outperformed existing methods, achieving 18% and 50% higher Matthews correlation coefficients for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/.
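The two ideas the abstract relies on, describing each candidate site by its neighboring residues and scoring predictions with the Matthews correlation coefficient (MCC), can be illustrated with a minimal Python sketch. This is not the SPRINT-Gly implementation. The function names, the window half-width of 7, and the `X` padding character are assumptions made for illustration only, and the sketch extracts raw sequence windows rather than the predicted structural and physicochemical features the method uses.

```python
from math import sqrt


def candidate_windows(sequence, targets, half_width=7, pad="X"):
    """Return (1-based position, window) for each residue found in `targets`.

    Hypothetical helper: `half_width` and `pad` are illustrative choices,
    not values taken from the SPRINT-Gly chapter.
    """
    # Pad both ends so that sites near the termini still get full-length windows.
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # In `padded`, residue i sits at index i + half_width, so the
            # slice below is centered on it.
            windows.append((i + 1, padded[i:i + 2 * half_width + 1]))
    return windows


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


if __name__ == "__main__":
    toy_sequence = "MNGTS"  # toy sequence, not real data
    # N-linked candidates are asparagine (N); O-linked are serine/threonine (S/T).
    print(candidate_windows(toy_sequence, "N", half_width=2))
    print(candidate_windows(toy_sequence, "ST", half_width=2))
    print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]))
```

MCC is a common choice for this task because glycosylated residues are far rarer than non-glycosylated ones, and MCC uses all four cells of the confusion matrix rather than rewarding a classifier for predicting the majority class.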

Keywords: Deep learning; Glycosylation; Glycosylation sites prediction; Machine learning; N- and O-linked glycosylation sites; Posttranslational modifications.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods
  • Glycosylation
  • Humans
  • Mice
  • Protein Processing, Post-Translational
  • Proteins* / chemistry
  • Support Vector Machine*

Substances

  • Proteins