Combination of several bioinformatics approaches for the identification of new putative glycosyltransferases in Arabidopsis

J Proteome Res. 2009 Feb;8(2):743-53. doi: 10.1021/pr800808m.

Abstract

Approximately 450 glycosyltransferase (GT) sequences have already been identified in the Arabidopsis genome and are organized into 40 sequence-based families, but the vast majority of these gene products remain biochemically uncharacterized open reading frames. Given the complexity of the cell wall carbohydrate network, it can be inferred that some of the biosynthetic genes have not yet been identified by classical bioinformatics approaches. With the aim of identifying new plant GT genes, we designed a bioinformatic strategy based on several remote homology detection methods that act at the 1D, 2D, and 3D levels. Together, these methods led to the identification of more than 150 candidate protein sequences. Among them, 20 are considered putative glycosyltransferases that should be investigated further, since known GT signatures were clearly identified.
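
The abstract does not say how hits from the individual methods were merged. As a minimal sketch only, and not the authors' pipeline, one way to pool candidates from several detection methods is to take the union of their hits, discard sequences already annotated as GTs, and count how many methods support each remaining candidate. The function name, the method labels used as dictionary keys, and all sequence identifiers below are hypothetical.

```python
from collections import Counter


def combine_candidates(hits_by_method, known_gts):
    """Pool hits from several methods and rank new candidates by support.

    hits_by_method: dict mapping a method label to a set of sequence IDs.
    known_gts: set of sequence IDs already annotated as glycosyltransferases.
    Returns a list of (sequence_id, number_of_supporting_methods) tuples.
    """
    known = set(known_gts)
    support = Counter()
    for hits in hits_by_method.values():
        # Count each sequence at most once per method; skip known GTs.
        for seq_id in set(hits) - known:
            support[seq_id] += 1
    # Most-supported candidates first; ties broken by ID for a stable order.
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical identifiers, for illustration only (not data from the paper).
hits_by_method = {
    "1D": {"seqA", "seqB", "seqC"},
    "2D": {"seqB", "seqD"},
    "3D": {"seqB", "seqC", "seqE"},
}
known_gts = {"seqA"}

ranked = combine_candidates(hits_by_method, known_gts)
for seq_id, n_methods in ranked:
    print(seq_id, n_methods)
```

Ranking by the number of supporting methods is an assumption made here for illustration; the abstract states only that 20 of the candidates were retained because known GT signatures were clearly identified.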

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Arabidopsis / enzymology*
  • Arabidopsis / genetics
  • Arabidopsis Proteins / chemistry
  • Arabidopsis Proteins / classification
  • Arabidopsis Proteins / genetics*
  • Computational Biology / methods*
  • Glycosyltransferases / chemistry
  • Glycosyltransferases / classification
  • Glycosyltransferases / genetics*
  • Molecular Sequence Data
  • Phylogeny
  • Sequence Alignment

Substances

  • Arabidopsis Proteins
  • Glycosyltransferases