Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines

J Chem Inf Model. 2016 Oct 24;56(10):2115-2122. doi: 10.1021/acs.jcim.6b00320. Epub 2016 Sep 22.

Abstract

Carbohydrate-binding proteins play significant roles in many diseases, including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent-accessible surface area leads to a reasonably accurate, robust, and predictive method, with an area under the receiver operating characteristic curve (AUC) of 0.78 and 0.77 and a Matthews correlation coefficient of 0.34 and 0.29 for 10-fold cross-validation and an independent test, respectively, without balancing binding and nonbinding residues. The quality of the method is further demonstrated by statistically significantly more binding residues being predicted for carbohydrate-binding proteins than for presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 Genomes Project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH .
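For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how AUC and the Matthews correlation coefficient can be computed on unbalanced residue-level predictions. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made purely for illustration.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen binding residue
    scores higher than a randomly chosen nonbinding one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = binding residue, 0 = nonbinding residue.
# The classes are deliberately unbalanced (2 binding vs. 8 nonbinding).
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.1, 0.35, 0.3, 0.05, 0.6, 0.15, 0.25]

threshold = 0.5  # assumed cutoff, chosen only for this example
preds = [1 if s >= threshold else 0 for s in scores]

tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

auc = roc_auc(labels, scores)
mcc = matthews_corrcoef(tp, fp, tn, fn)
print(f"AUC = {auc:.3f}, MCC = {mcc:.3f}")
```

AUC is threshold-free, whereas the Matthews correlation coefficient depends on the chosen cutoff. Because it uses all four confusion-matrix counts, it remains informative when nonbinding residues greatly outnumber binding ones.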

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Carbohydrate Metabolism*
  • Carbohydrates / chemistry
  • Databases, Protein
  • Humans
  • Molecular Docking Simulation
  • Protein Binding
  • Proteins / chemistry
  • Proteins / metabolism*
  • ROC Curve
  • Support Vector Machine*

Substances

  • Carbohydrates
  • Proteins