Protein phosphorylation is dynamically and reversibly regulated by protein kinases and protein phosphatases, and plays an essential role in orchestrating a wide range of biological processes. Although a number of tools have been developed for predicting kinase-specific phosphorylation sites (p-sites), computational prediction of phosphatase-specific dephosphorylation sites remains to be a great challenge. In this study, we manually curated 4393 experimentally identified site-specific phosphatase-substrate relationships for 3463 dephosphorylation sites occurring on phosphoserine, phosphothreonine, and/or phosphotyrosine residues, from the literature and public databases. Then, we developed a hybrid learning framework, the group-based prediction system for the prediction of phosphatase-specific dephosphorylation sites (GPSD). For model training, we integrated 10 types of sequence features and utilized three types of machine learning methods, including penalized logistic regression, deep neural networks, and transformer neural networks. First, a pretrained model was constructed using 561 416 nonredundant p-sites and then fine-tuned to generate computational models for predicting general dephosphorylation sites. In addition, 103 individual phosphatase-specific predictors were constructed via transfer learning and meta-learning. For site prediction, one or multiple protein sequences in FASTA format could be inputted, and the prediction results will be shown together with additional annotations, such as protein-protein interactions, structural information, and disorder propensity. The online service of GPSD is freely available at https://gpsd.biocuckoo.cn/. We believe that GPSD can serve as a valuable tool for further analysis of dephosphorylation.
Keywords: dephosphorylation site; machine learning; posttranslational modification; protein phosphatase; transfer learning.
© The Author(s) 2025. Published by Oxford University Press.