Identification of divergent functions in homologous proteins by induction over conserved modules

Proc Int Conf Intell Syst Mol Biol. 1998:6:157-64.

Abstract

Homologous proteins do not necessarily exhibit identical biochemical function. Despite this fact, local or global sequence similarity is widely used as an indication of functional identity. Of the 1327 Enzyme Commission-defined functional classes with more than one annotated example in the sequence databases, similarity scores alone are inadequate for 251 (19%). We test the hypothesis that conserved domains, as defined in the ProDom database, can be used to discriminate between alternative functions for homologous proteins in these cases. Using machine learning methods, we were able to induce correct discriminators for more than half of these 251 challenging functional classes. These results show that combining modular representations of proteins with sequence similarity improves the ability to infer function from sequence, compared with similarity scores alone.
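
The idea of inducing a discriminator over conserved modules can be illustrated with a minimal sketch. The abstract does not specify which learning algorithms were used, so the single-feature rule (a decision stump) below is a stand-in chosen for brevity, not the authors' method. The domain identifiers (`D1`, `D2`, `D3`), the labels (`A`, `B`), and the function name `best_domain_discriminator` are hypothetical. The toy data are invented for illustration and are not taken from ProDom or any sequence database.

```python
from collections import Counter


def best_domain_discriminator(proteins):
    """Find the single domain whose presence/absence best separates functions.

    proteins: list of (set_of_domain_ids, function_label) pairs describing
    homologous proteins. Returns a tuple
    (domain_id, label_if_present, label_if_absent, training_accuracy).
    This is an illustrative decision stump, not the paper's algorithm.
    """
    all_domains = set().union(*(domains for domains, _ in proteins))
    best = None
    for dom in sorted(all_domains):  # sorted for a deterministic result
        present = Counter(lbl for domains, lbl in proteins if dom in domains)
        absent = Counter(lbl for domains, lbl in proteins if dom not in domains)
        correct = 0
        label_present = label_absent = None
        if present:
            # Predict the majority label among proteins carrying the domain.
            label_present, n = present.most_common(1)[0]
            correct += n
        if absent:
            # Predict the majority label among proteins lacking the domain.
            label_absent, n = absent.most_common(1)[0]
            correct += n
        accuracy = correct / len(proteins)
        if best is None or accuracy > best[3]:
            best = (dom, label_present, label_absent, accuracy)
    return best


# Toy homologous family (hypothetical): every member shares domain D1, so
# similarity driven by D1 alone cannot separate function A from function B.
toy = [
    ({"D1", "D2"}, "A"),
    ({"D1", "D2"}, "A"),
    ({"D1", "D3"}, "B"),
    ({"D1", "D3"}, "B"),
    ({"D1"}, "B"),
]

print(best_domain_discriminator(toy))
```

In this toy family the shared domain `D1` carries no information about function, while the presence of `D2` separates the two labels. This mirrors, in a very simplified form, the case the abstract describes, where similarity alone is inadequate but modular composition can discriminate.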

Publication types

  • Comparative Study

MeSH terms

  • Alcohol Dehydrogenase / chemistry
  • Alcohol Dehydrogenase / genetics
  • Alcohol Dehydrogenase / physiology
  • Artificial Intelligence
  • Bayes Theorem
  • Conserved Sequence
  • Databases, Factual
  • Evolution, Molecular
  • L-Lactate Dehydrogenase / chemistry
  • L-Lactate Dehydrogenase / genetics
  • L-Lactate Dehydrogenase / physiology
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / physiology*
  • Sequence Homology, Amino Acid
  • Tetrahydrofolate Dehydrogenase / chemistry
  • Tetrahydrofolate Dehydrogenase / genetics
  • Tetrahydrofolate Dehydrogenase / physiology

Substances

  • Proteins
  • Alcohol Dehydrogenase
  • L-Lactate Dehydrogenase
  • Tetrahydrofolate Dehydrogenase