Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships

J Comput Biol. 2003;10(6):857-68. doi: 10.1089/106652703322756113.

Abstract

One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence similarity with one or more proteins whose structure or function is already known. Toward this end, we propose a means of representing proteins using pairwise sequence similarity scores. This representation, combined with a discriminative classification algorithm known as the support vector machine (SVM), provides a powerful means of detecting subtle structural and evolutionary relationships among proteins. The algorithm, called SVM-pairwise, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better performance than SVM-Fisher, profile HMMs, and PSI-BLAST.
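
The representation described in the abstract can be illustrated with a minimal sketch: each protein becomes a fixed-length vector of its similarity scores against a set of reference sequences, and those vectors would then be passed to an SVM. The abstract does not specify the scoring function or its parameters, so everything concrete here is an assumption made for illustration: the local alignment scorer, the match/mismatch/gap values, the function names `local_alignment_score` and `pairwise_features`, and the toy sequences. The SVM training step itself is omitted.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gaps).

    The scoring values are illustrative assumptions, not the paper's settings.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_features(seq, reference_seqs):
    """Represent seq as its vector of scores against each reference sequence."""
    return [local_alignment_score(seq, ref) for ref in reference_seqs]


if __name__ == "__main__":
    # Hypothetical toy sequences; real use would score against labeled proteins.
    references = ["ACDEFG", "ACDKLM", "WYWYWY"]
    for query in ["ACDEFG", "KLMACD", "WYWY"]:
        print(query, pairwise_features(query, references))
```

Every protein is mapped into the same space, one dimension per reference sequence, regardless of its length. That fixed-length vector is what makes a standard discriminative classifier such as an SVM applicable to variable-length sequences.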

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Evolution, Molecular*
  • Proteins / chemistry*
  • Proteins / genetics
  • Sequence Alignment / methods*
  • Structural Homology, Protein*

Substances

  • Proteins