Predicting enzyme function from sequence: a systematic appraisal

Proc Int Conf Intell Syst Mol Biol. 1997:5:276-83.

Abstract

Gapped and ungapped sequence alignment were tested as methods for classifying proteins into the functional classes defined by the International Enzyme Commission (EC). We exhaustively tested all 15,208 proteins labeled with any EC class in a recent release of the SwissProt database, evaluating all 1,327 relevant EC classes. By using the ROC statistic, we effectively tested every similarity threshold that could be used for this assignment. Approximately 60% of Enzyme Commission classes containing two or more proteins could not be perfectly discriminated by sequence similarity at any threshold. An analysis of the errors indicates that false positive matches dominate, and that several error mechanisms can be identified, including the multidomain nature of many proteins and polyproteins, convergent evolution, and variation in enzyme specificity, among other factors. Many of the putative false positives are in fact biologically relevant. This work strongly suggests that functional assignment of enzymes should attempt to delimit functionally significant subregions, or domains, before matching to EC classes.
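
The threshold-free evaluation described in the abstract can be illustrated with a small sketch. The abstract does not say which form of the ROC statistic was computed, so the code below assumes the standard area under the ROC curve, obtained from pairwise comparisons of member and non-member scores. The function names, the score values, and the choice to count ties as one half are all hypothetical and chosen only for illustration. An EC class counts as perfectly discriminated here when some similarity threshold separates every member from every non-member, which is equivalent to an area of exactly 1.0.

```python
from typing import Sequence


def split_scores(scores: Sequence[float], is_member: Sequence[bool]):
    """Split similarity scores into class members and non-members."""
    pos = [s for s, m in zip(scores, is_member) if m]
    neg = [s for s, m in zip(scores, is_member) if not m]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    return pos, neg


def roc_auc(scores: Sequence[float], is_member: Sequence[bool]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (member, non-member) pairs in which the member scores higher.
    Ties count as one half (an assumption of this sketch)."""
    pos, neg = split_scores(scores, is_member)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def perfectly_discriminated(scores: Sequence[float],
                            is_member: Sequence[bool]) -> bool:
    """True if some threshold separates all members from all non-members,
    i.e. the lowest member score exceeds the highest non-member score."""
    pos, neg = split_scores(scores, is_member)
    return min(pos) > max(neg)


# Hypothetical similarity scores of database proteins against one query,
# with flags marking which proteins share the query's EC class.
scores = [95.0, 80.0, 60.0, 40.0, 30.0]
same_class = [True, True, False, True, False]

print(roc_auc(scores, same_class))                  # 5 of 6 pairs ordered correctly
print(perfectly_discriminated(scores, same_class))  # False: a non-member outscores a member
```

In this toy case the non-member scoring 60 outranks the member scoring 40, so no single threshold can recover the class without either a false positive or a false negative.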

Publication types

  • Comparative Study

MeSH terms

  • Databases, Factual
  • Enzymes / classification
  • Enzymes / genetics*
  • Enzymes / physiology*
  • Evaluation Studies as Topic
  • ROC Curve
  • Sequence Alignment / methods*
  • Sequence Alignment / statistics & numerical data
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Enzymes