Normal mode analysis of macromolecular motions in a database framework: developing mode concentration as a useful classifying statistic

Proteins. 2002 Sep 1;48(4):682-95. doi: 10.1002/prot.10168.

Abstract

We investigated protein motions using normal modes within a database framework, determining, on a large sample, the degree to which normal modes anticipate the direction of the observed motion and are useful for motion classification. As a starting point for our analysis, we identified a large number of examples of protein flexibility from a comprehensive set of structural alignments of the proteins in the PDB. Each example consisted of a pair of proteins that were considerably different in structure given their sequence similarity. On each pair, we performed geometric comparisons and adiabatic-mapping interpolations in a high-throughput pipeline, arriving at a final list of 3,814 putative motions and standardized statistics for each. We then computed the normal modes of each motion in this list, determining the linear combination of modes that best approximated the direction of the observed motion. We integrated our new motions and normal mode calculations into the Macromolecular Motions Database, through a new ranking interface at http://molmovdb.org. Based on the normal mode calculations and the interpolations, we identified a new statistic, mode concentration, related to the mathematical concept of information content, which describes the degree to which the direction of the observed motion can be summarized by a few modes. Using this statistic, we were able to determine the fraction of the 3,814 motions where one could anticipate the direction of the actual motion from only a few modes. We also investigated mode concentration in comparison to related statistics on combinations of normal modes and correlated it with quantities characterizing protein flexibility (e.g., maximum backbone displacement or number of mobile atoms). Finally, we evaluated the ability of mode concentration to automatically classify motions into a variety of simple categories (e.g., whether or not they are "fragment-like"), in comparison to motion statistics. This involved the application of decision trees and feature selection (particular machine-learning techniques) to training and testing sets derived from merging the "list" of motions with manually classified ones.
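
The abstract does not give the formula for mode concentration, so the following is only a minimal sketch of one way an information-content statistic of this kind could be computed, not the authors' definition. It assumes the normal modes are supplied as orthonormal vectors, so that the best-fitting linear combination is obtained from simple dot products, and it assumes a score of one minus the normalized Shannon entropy of the squared coefficients. The names mode_weights and concentration are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mode_weights(modes, displacement):
    """Share of the projected squared displacement carried by each mode.

    Assumption: `modes` are orthonormal, so the least-squares coefficient
    of each mode is just its dot product with the displacement vector.
    """
    coeffs = [dot(m, displacement) for m in modes]
    total = sum(c * c for c in coeffs)
    if total == 0.0:
        raise ValueError("displacement has no component along the supplied modes")
    return [c * c / total for c in coeffs]


def concentration(weights):
    """Hypothetical concentration score: 1 - normalized Shannon entropy.

    Close to 1 when a single mode carries the motion, close to 0 when the
    motion is spread evenly over all modes. This is an illustrative choice,
    not the statistic defined in the paper.
    """
    n = len(weights)
    if n < 2:
        return 1.0
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return 1.0 - entropy / math.log(n)


# Toy example: three orthonormal "modes" in a 3-coordinate system.
toy_modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
single_mode_motion = [2.0, 0.0, 0.0]   # lies entirely along the first mode
spread_motion = [1.0, 1.0, 1.0]        # equal share in every mode

concentrated = concentration(mode_weights(toy_modes, single_mode_motion))
diffuse = concentration(mode_weights(toy_modes, spread_motion))
```

In this toy setting a motion lying along one mode scores near the top of the scale and an evenly spread motion scores near the bottom. Note that the weights are normalized by the projected part of the displacement only, so any component outside the span of the supplied modes is ignored in this sketch.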

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Databases, Protein*
  • Internet
  • Models, Molecular
  • Models, Statistical*
  • Molecular Structure
  • Motion
  • Peptide Fragments / chemistry
  • Protein Structure, Tertiary
  • Protein Subunits
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sequence Analysis, Protein / methods*

Substances

  • Peptide Fragments
  • Protein Subunits
  • Proteins