Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development

J Comput Aided Mol Des. 2009 Nov;23(11):773-84. doi: 10.1007/s10822-009-9273-4. Epub 2009 Jun 20.

Abstract

Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph in which residue vertices (with the residue name used as the vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
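
The query step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the 8.0 Å distance cutoff, the single representative coordinate per residue, the toy residues and motifs, and the helper names `build_contact_graph`, `contains_motif` and `empirical_p_value`. A plain backtracking search stands in for the index-accelerated Ullmann algorithm, and an empirical tail fraction stands in for whatever statistic the paper uses to compare motif counts with the background distribution.

```python
from itertools import combinations
from math import dist


def build_contact_graph(residues, cutoff=8.0):
    """Build a labeled graph from (residue_name, (x, y, z)) tuples.

    The cutoff (in angstroms) is a hypothetical value for this sketch.
    Returns (labels, adjacency), where adjacency maps index -> set of indices.
    """
    labels = [name for name, _ in residues]
    adj = {i: set() for i in range(len(residues))}
    for i, j in combinations(range(len(residues)), 2):
        if dist(residues[i][1], residues[j][1]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return labels, adj


def contains_motif(motif, protein):
    """Return True if the labeled motif occurs as a subgraph of the protein.

    Simple backtracking: motif vertex k is mapped to protein vertex mapping[k];
    labels must match and every motif edge must also exist in the protein.
    """
    m_labels, m_adj = motif
    p_labels, p_adj = protein

    def extend(mapping):
        k = len(mapping)
        if k == len(m_labels):
            return True
        for cand in range(len(p_labels)):
            if cand in mapping or p_labels[cand] != m_labels[k]:
                continue
            # Check edges between motif vertex k and already mapped vertices.
            if all(mapping[n] in p_adj[cand] for n in m_adj[k] if n < k):
                if extend(mapping + [cand]):
                    return True
        return False

    return extend([])


def empirical_p_value(query_hits, background_hits):
    """Fraction of background proteins with at least as many motif hits."""
    return sum(h >= query_hits for h in background_hits) / len(background_hits)


# Toy query structure (coordinates are invented for the example).
query = build_contact_graph([
    ("HIS", (0.0, 0.0, 0.0)),
    ("ASP", (5.0, 0.0, 0.0)),
    ("SER", (0.0, 5.0, 0.0)),
    ("GLY", (30.0, 0.0, 0.0)),
])

# Two hypothetical family motifs: a SER-HIS-ASP triangle and a SER-GLY pair.
motifs = [
    (["SER", "HIS", "ASP"], {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}),
    (["SER", "GLY"], {0: {1}, 1: {0}}),
]

hits = sum(contains_motif(m, query) for m in motifs)

# Invented motif-hit counts for ten background proteins.
background = [0, 0, 1, 0, 2, 0, 0, 0, 0, 0]
p_value = empirical_p_value(hits, background)
```

In this toy case only the triangle motif is found in the query, so `hits` is 1 and two of the ten background proteins reach that count. A real pipeline would replace the brute-force search with an indexed matcher and use motif libraries mined from actual protein families.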

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Computational Biology
  • Databases, Protein
  • Models, Molecular*
  • Models, Statistical*
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / metabolism*
  • Sensitivity and Specificity

Substances

  • Proteins