Exploiting publicly available biological and biochemical information for the discovery of novel short linear motifs

PLoS One. 2011;6(7):e22270. doi: 10.1371/journal.pone.0022270. Epub 2011 Jul 20.

Abstract

The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, whose identification can provide important information about a protein's function. However, the short length of the motifs and their variable degree of conservation make them hard to identify, because it is difficult to correctly estimate the statistical significance of their occurrence. Consequently, only a small fraction of them have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway, and give specific examples of its effectiveness both in rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, identifies significant motifs, annotates them automatically with functional, evolutionary and structural information, and organizes them in a database that can be inspected and queried. An instance of the database populated with pre-computed data on seven organisms is accessible through a public server (http://www.biocomputing.it/modipath), and we believe it constitutes by itself a useful resource for the life sciences.
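The abstract does not describe how significance is scored. The Python sketch below is only a rough illustration of why recurrence of a motif across unrelated proteins of one pathway can be informative. It estimates how surprising it is for a candidate motif to occur in k of n pathway proteins. Several assumptions are involved. Residues are treated as independent, with background frequencies estimated from the input sequences. Motifs are written as lists of allowed residues per position, with '.' meaning any residue. The input proteins are assumed to be already evolutionarily unrelated. All function names and example sequences are hypothetical, and this is not the scoring used by the authors' tool.

```python
import re
from collections import Counter


def to_regex(motif):
    """Turn ['R', '.', '.', 'ST'] into the regular expression '[R]..[ST]'."""
    return "".join("." if pos == "." else f"[{pos}]" for pos in motif)


def motif_site_probability(motif, freqs):
    """Chance that one random window matches the motif, assuming residues
    are drawn independently from the background frequencies `freqs`."""
    p = 1.0
    for pos in motif:
        if pos != ".":  # a wildcard position matches with probability 1
            p *= sum(freqs.get(aa, 0.0) for aa in pos)
    return p


def pathway_motif_pvalue(motif, sequences):
    """Return (hits, p-value): how many sequences contain the motif, and the
    probability of seeing at least that many under the background model."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    freqs = {aa: c / total for aa, c in counts.items()}
    p_site = motif_site_probability(motif, freqs)

    pattern = re.compile(to_regex(motif))
    hits = sum(1 for s in sequences if pattern.search(s))

    # Per-sequence chance of at least one match; windows are treated as
    # independent, which is an approximation for overlapping positions.
    qs = [1.0 - (1.0 - p_site) ** max(len(s) - len(motif) + 1, 0)
          for s in sequences]

    # Exact Poisson-binomial distribution of the number of matching sequences.
    dist = [1.0]
    for q in qs:
        new = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * (1.0 - q)
            new[j + 1] += pj * q
        dist = new
    return hits, sum(dist[hits:])


if __name__ == "__main__":
    # Hypothetical, made-up sequences standing in for unrelated proteins
    # of a single pathway.
    pathway = ["MKRAASLLPE", "GGRPTSQWE", "MSTNPKPQRK"]
    hits, pvalue = pathway_motif_pvalue(["R", ".", ".", "S"], pathway)
    print(f"{hits}/{len(pathway)} sequences match, p = {pvalue:.3g}")
```

A Poisson-binomial tail is used rather than a plain binomial because proteins of different lengths have different chances of containing a match by accident. A realistic pipeline would also need to correct for the many candidate motifs tested and to remove homologous sequences first, since shared ancestry rather than shared function could otherwise explain the recurrence.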

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs
  • Animals
  • Computational Biology / methods*
  • Conserved Sequence
  • Data Mining / methods*
  • Databases, Protein*
  • Evolution, Molecular
  • Humans
  • Internet
  • Mice
  • Molecular Sequence Annotation
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism*
  • Rats
  • User-Computer Interface

Substances

  • Proteins