Reduced alphabet motif methodology for GPCR annotation

J Biomol Struct Dyn. 2007 Dec;25(3):299-310. doi: 10.1080/07391102.2007.10507178.

Abstract

Identification and classification of G-protein coupled receptors (GPCRs) using protein sequences is an important computational challenge, given that experimental screening of thousands of ligands is an expensive proposition. There are two distinct but complementary approaches to GPCR classification: machine learning and sequence motif analysis. Machine learning methodologies typically suffer from problems of class imbalance and lack of multi-class classification. Many sequence motif methods, meanwhile, are too dependent on the similarity of the primary sequence alignments. It is desirable to have a motif discovery and application methodology that is not strongly dependent on primary sequence similarity and that also overcomes these limitations of machine learning. We propose and evaluate the effectiveness of a simple methodology that uses a reduced protein functional alphabet representation, where functionally similar residues have similar symbols. Regular expression motifs can then be obtained by ClustalW-based multiple sequence alignment, using an identity matrix. Since evolutionary matrices such as BLOSUM and PAM are not used, this method can be useful for any set of sequences, including those that do not share a common ancestry. Reduced alphabet motifs can accurately classify known GPCR proteins, and the results are comparable to PRINTS and PROSITE. For well-known GPCR proteins from SWISSPROT, there were no false negatives and only a few false positives. This methodology covers most currently known classes of GPCRs, even those with very few representative sequences. It also predicts more than one class for certain sequences, thus overcoming the single-class limitation of machine learning methods. We also annotated 695 orphan receptors, of which 121 were identified as belonging to Family A. A simple JavaScript-based web interface has been developed to predict GPCR families and subfamilies (www.insilico-consulting.com/gpcrmotif.html).
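
The general idea of the approach (recode each sequence in a reduced alphabet, then scan it with regular expression motifs) can be illustrated with a minimal Python sketch. The abstract gives neither the actual alphabet nor the actual motifs, so the residue grouping, the one-symbol-per-group encoding, the motif patterns, and the family labels below are all hypothetical placeholders chosen for illustration only. They are not the ones used in the paper.

```python
import re

# Hypothetical functional grouping of the 20 amino acids (an assumption,
# not the alphabet from the paper). Each group is mapped to one symbol.
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "b": "KRH",     # basic
    "n": "DE",      # acidic
    "s": "GP",      # small / conformationally special
}
REDUCE = {aa: sym for sym, members in GROUPS.items() for aa in members}


def reduce_sequence(seq, unknown="x"):
    """Recode a protein sequence in the reduced alphabet."""
    return "".join(REDUCE.get(aa, unknown) for aa in seq.upper())


# Hypothetical regular expression motifs over the reduced alphabet.
# Real motifs would be derived from identity-matrix alignments of
# reduced sequences; these two are made up for the example.
MOTIFS = {
    "Family A (illustrative)": re.compile(r"nba"),
    "Family B (illustrative)": re.compile(r"hs.{2}bb"),
}


def classify(seq):
    """Return every family whose motif occurs in the reduced sequence."""
    reduced = reduce_sequence(seq)
    return [name for name, pattern in MOTIFS.items() if pattern.search(reduced)]


if __name__ == "__main__":
    print(reduce_sequence("DRY"))   # acidic, basic, aromatic
    print(classify("AAADRYAAA"))    # matches one illustrative motif
    print(classify("AGTTKRDRY"))    # matches both illustrative motifs
```

Because `classify` returns a list rather than a single label, a sequence can be assigned to more than one class, which mirrors the multi-class behaviour described in the abstract. Matching is also done on residue classes rather than exact residues, so `DRY` and `ERW` hit the same motif in this sketch.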

Publication types

  • Evaluation Study

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Computational Biology*
  • Molecular Sequence Data
  • Receptors, G-Protein-Coupled / chemistry*
  • Receptors, G-Protein-Coupled / classification*
  • Rhodopsin / chemistry
  • Sequence Alignment
  • Sequence Analysis, Protein*
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Receptors, G-Protein-Coupled
  • Rhodopsin