Reduced amino acid alphabet is sufficient to accurately recognize intrinsically disordered protein

FEBS Lett. 2004 Oct 22;576(3):348-52. doi: 10.1016/j.febslet.2004.09.036.

Abstract

Intrinsically disordered proteins are an important class of proteins with unique functions and properties. Here, we have applied a support vector machine (SVM) trained on naturally occurring disordered and ordered proteins to examine the contribution of various parameters (vectors) to recognizing proteins that contain disordered regions. We find that an SVM that incorporates only amino acid composition has a recognition accuracy of 87 ± 2%. This result suggests that composition alone is sufficient to accurately recognize disorder. Interestingly, SVMs using reduced sets of amino acids based on chemical similarity preserve high recognition accuracy. A set as small as four retains an accuracy of 84 ± 2%; this suggests that general physicochemical properties rather than specific amino acids are important factors contributing to protein disorder.
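
To make the idea of a composition-only feature vector concrete, the following minimal Python sketch computes the 20-letter amino acid composition of a sequence and a reduced four-group composition. The four groups in REDUCED_GROUPS, the group names, and the function names are illustrative assumptions; the abstract does not state which groupings were used, and no classifier is trained here.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical four-group reduction by rough chemical similarity.
# This grouping is an assumption for illustration only; it is NOT
# taken from the paper, whose actual alphabets are not given here.
REDUCED_GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQGP",
    "charged": "DEKR",
}

# Lookup table: residue -> name of its reduced group.
RESIDUE_TO_GROUP = {
    aa: group for group, members in REDUCED_GROUPS.items() for aa in members
}


def _standard_residues(sequence):
    """Keep only the 20 standard residues; raise if none remain."""
    residues = [aa for aa in sequence.upper() if aa in RESIDUE_TO_GROUP]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def composition(sequence):
    """Fraction of each standard amino acid, in AMINO_ACIDS order."""
    residues = _standard_residues(sequence)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def reduced_composition(sequence):
    """Fraction of residues in each reduced group, in REDUCED_GROUPS order."""
    residues = _standard_residues(sequence)
    counts = Counter(RESIDUE_TO_GROUP[aa] for aa in residues)
    return [counts[group] / len(residues) for group in REDUCED_GROUPS]
```

Either vector could then be passed as the input features to an SVM implementation of the reader's choice. The reduced vector has only four dimensions, which is the sense in which a small alphabet captures general physicochemical properties rather than the identity of specific residues.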

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry*
  • Amino Acids / genetics
  • Data Interpretation, Statistical
  • Models, Theoretical
  • Proteins / chemical synthesis
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics
  • Reproducibility of Results
  • Software

Substances

  • Amino Acids
  • Proteins