Nphos: Database and Predictor of Protein N-phosphorylation

Genomics Proteomics Bioinformatics. 2024 Sep 13;22(3):qzae032. doi: 10.1093/gpbjnl/qzae032.

Abstract

Protein N-phosphorylation is widely present in nature and participates in various biological processes. However, current knowledge of N-phosphorylation is extremely limited compared to that of O-phosphorylation. In this study, we collected 11,710 experimentally verified N-phosphosites of 7344 proteins from 39 species and subsequently constructed the database Nphos to share up-to-date information on protein N-phosphorylation. Based on these substantial data, we characterized the sequential and structural features of protein N-phosphorylation. Moreover, after comparing hundreds of learning models, we chose and optimized gradient boosting decision tree (GBDT) models to predict three types of human N-phosphorylation, achieving mean area under the receiver operating characteristic curve (AUC) values of 90.56%, 91.24%, and 92.01% for pHis, pLys, and pArg, respectively. In addition, we discovered 488,825 distinct N-phosphosites in the human proteome. The models were also deployed in Nphos for interactive N-phosphosite prediction. In summary, this work provides new insights and starting points for both flexible and focused investigations of N-phosphorylation. By providing a data and technical foundation, it will also facilitate a deeper and more systematic understanding of protein N-phosphorylation. Nphos is freely available at http://www.bio-add.org/Nphos/ and http://ppodd.org.cn/Nphos/.
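As a reading aid for the "mean AUC" figures quoted above, the following minimal sketch shows one common way such a number can be computed: the AUC of each evaluation fold is the probability that a randomly chosen positive site scores higher than a randomly chosen negative site, and the per-fold values are then averaged. The abstract does not describe the evaluation protocol, so the fold structure, the function names `roc_auc` and `mean_auc`, and the toy labels and scores are all assumptions made for illustration; this is not the Nphos code and uses no Nphos data.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over an iterable of (labels, scores) pairs."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Hypothetical toy folds: 1 = phosphosite, 0 = non-site; scores are made up.
toy_folds = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.5, 0.5]),
]

if __name__ == "__main__":
    print(f"mean AUC = {mean_auc(toy_folds):.2%}")
```

The pairwise definition is quadratic in the number of sites, which is fine for a toy example; for real datasets a rank-based implementation from an established library would be the more practical choice.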

Keywords: N-phosphorylation; Benchmark dataset; Database; Machine learning; Post-translational modification.

MeSH terms

  • Databases, Protein*
  • Humans
  • Phosphoproteins / chemistry
  • Phosphoproteins / genetics
  • Phosphoproteins / metabolism
  • Phosphorylation
  • Proteome / metabolism

Substances

  • Phosphoproteins
  • Proteome