PredCSO: an ensemble method for the prediction of S-sulfenylation sites in proteins

Mol Omics. 2018 Aug 6;14(4):257-265. doi: 10.1039/c8mo00089a.

Abstract

Protein S-sulfenylation is a reversible post-translational modification (PTM) in which cysteine (CYS) thiols of proteins are oxidized to cysteine sulfenic acids (CSO). Recent studies have shown that this modification plays an essential role in cell signaling, transcriptional regulation and protein function. Identifying S-sulfenylation sites is therefore important for understanding the functions of S-sulfenylated proteins. In this study, we propose PredCSO, a computational method for predicting S-sulfenylation sites in proteins. PredCSO is built on four kinds of features: the position-specific scoring matrix, position-specific amino acid propensity, absolute solvent accessibility and a four-body statistical pseudo-potential. In particular, 21 key features were selected using a two-step feature selection procedure consisting of a max-relevance algorithm followed by a sequential backward elimination algorithm. To overcome the imbalance between positive and negative sample sizes, we adopt an ensemble method that combines bootstrap resampling, gradient tree boosting and majority voting. Our performance evaluation shows that PredCSO achieves state-of-the-art performance in identifying S-sulfenylation sites in proteins.
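The imbalance-handling strategy named in the abstract (bootstrap resampling, gradient tree boosting and majority voting) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the PredCSO implementation. It assumes that each ensemble member is trained on all positive sites plus an equal-size bootstrap sample of negative sites. It uses depth-1 regression trees (stumps) fitted to log-loss residuals as a simplified stand-in for full gradient tree boosting. All function names and hyperparameters (`fit_ensemble`, `n_models`, `n_rounds`, `lr`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: the (feature, threshold) split that
    minimizes the squared error of the residuals, with a mean value per leaf."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no usable split: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


def fit_boosted(X, y, n_rounds=20, lr=0.3):
    """Simplified gradient tree boosting for log-loss: each round fits a stump
    to the negative gradient (y - p) and adds it scaled by learning rate lr."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # initial log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * predict_stump(stump, row) for s, row in zip(scores, X)]
    return f0, lr, stumps


def predict_proba(model, row):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(predict_stump(st, row) for st in stumps))


def fit_ensemble(X, y, n_models=5, seed=0):
    """Train n_models boosted classifiers (hypothetical balancing scheme): each
    sees all positives plus an equal-size bootstrap sample of the negatives."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    models = []
    for _ in range(n_models):
        sampled_neg = [rng.choice(neg_idx) for _ in pos_idx]  # with replacement
        idx = pos_idx + sampled_neg
        models.append(fit_boosted([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_vote(models, row):
    """Majority vote: predict 1 if more than half of the members give p >= 0.5."""
    votes = sum(1 for m in models if predict_proba(m, row) >= 0.5)
    return 1 if 2 * votes > len(models) else 0
```

Choosing an odd `n_models` rules out tied votes. In a real setting each row would hold the selected features for one cysteine site (21 in the abstract's description); the sketch accepts any list of floats, and because each member sees a balanced subset, no single model is dominated by the majority negative class.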

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acids / chemistry
  • Computational Biology / methods*
  • Cysteine / analogs & derivatives
  • Cysteine / chemistry
  • Cysteine / metabolism
  • Models, Molecular
  • Position-Specific Scoring Matrices
  • Protein Conformation
  • Protein Processing, Post-Translational
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Software*
  • Sulfenic Acids / chemistry
  • Sulfenic Acids / metabolism

Substances

  • Amino Acids
  • Proteins
  • Sulfenic Acids
  • cysteinesulfenic acid
  • Cysteine