Prediction of protein accessible surface areas by support vector regression

Proteins. 2004 Nov 15;57(3):558-64. doi: 10.1002/prot.20234.

Abstract

A novel support vector regression (SVR) approach is proposed to predict protein accessible surface areas (ASAs) from primary structure. In this work, we predict the real value of ASA in square angstroms for each residue, rather than the relative solvent accessibility. Evaluated per residue, the mean and median absolute errors are 26.0 Å² and 18.87 Å², respectively. The correlation coefficient between the predicted and observed ASAs is 0.66. Cysteine is the best-predicted amino acid (mean absolute error 13.8 Å², median absolute error 8.37 Å²), while arginine is the worst-predicted amino acid (mean absolute error 42.7 Å², median absolute error 36.31 Å²). Our work suggests that SVR can be applied directly to ASA prediction, where data preclassification has previously been used.
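The three summary statistics reported above (mean absolute error, median absolute error, and the correlation coefficient between predicted and observed ASA) can be illustrated with a minimal sketch. The function name `asa_errors` and the four-residue toy values are hypothetical and are not taken from the paper; the correlation is assumed here to be the Pearson coefficient, which the abstract does not state explicitly.

```python
from math import sqrt
from statistics import mean, median


def asa_errors(observed, predicted):
    """Summarize per-residue ASA prediction quality (values in square angstroms).

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    # Absolute error for each residue.
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]

    # Pearson correlation, computed from sums of centered products.
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)

    return {
        "mae": mean(abs_err),
        "median_ae": median(abs_err),
        "r": cov / sqrt(var_o * var_p),
    }


# Toy data: invented ASA values for four residues, for demonstration only.
observed = [10.0, 50.0, 120.0, 80.0]
predicted = [20.0, 40.0, 100.0, 90.0]
metrics = asa_errors(observed, predicted)
print(metrics)
```

Reporting the median alongside the mean is useful because absolute errors are typically skewed: a few badly predicted residues pull the mean up while leaving the median largely unchanged, which is consistent with the mean exceeding the median in every figure quoted above.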

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arginine / metabolism
  • Binding Sites
  • Computational Biology / methods*
  • Databases, Protein
  • Protein Binding
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Solvents / chemistry
  • Solvents / metabolism

Substances

  • Proteins
  • Solvents
  • Arginine