Correlated rigid modes in protein families

Phys Biol. 2016 Apr 11;13(2):025003. doi: 10.1088/1478-3975/13/2/025003.

Abstract

A great deal of evolutionarily conserved information is contained in genomes and proteins. Enormous effort has been put into understanding protein structure and developing computational tools for protein folding, and many sophisticated approaches take structure and sequence homology into account. Several groups have applied statistical physics approaches to extracting information about proteins from sequences alone. Here, we develop a new method for sequence analysis based on first principles in information theory, statistical physics and Bayesian analysis. We provide a complete derivation of our approach and apply it to a variety of systems to demonstrate both its utility and its limitations. We show in some examples that phylogenetic alignments of the amino-acid sequences of protein families imply the existence of a small number of modes that appear to be associated with correlated global variation. Our approach uncovers these modes efficiently by computing a non-perturbative effective potential directly from the alignment. We show that this effective potential approaches a limiting form inversely with the logarithm of the number of sequences. Mapping symbol entropy flows along modes onto the underlying physical structures shows that these modes arise from correlated compensatory adjustments. In the protein examples, these adjustments occur around functional binding pockets.
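The abstract does not give the effective-potential calculation itself, so the sketch below does not reproduce the authors' method. As a hedged illustration of the simpler idea it builds on, that compensatory substitutions leave a statistical signature of dependence between alignment columns, here is a minimal Python sketch that computes per-column symbol entropy and pairwise mutual information. Mutual information is a swapped-in, widely used covariation score, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical helpers: mutual information is a standard covariation score,
# used here as a stand-in; it is NOT the paper's non-perturbative effective potential.

def _entropy(counts, n):
    """Shannon entropy (bits) of an empirical distribution given symbol counts."""
    return -sum((c / n) * log2(c / n) for c in counts.values())

def column_entropy(alignment, i):
    """Entropy of the amino-acid symbols observed in column i."""
    return _entropy(Counter(seq[i] for seq in alignment), len(alignment))

def joint_entropy(alignment, i, j):
    """Entropy of the symbol pairs observed jointly in columns i and j."""
    return _entropy(Counter((seq[i], seq[j]) for seq in alignment), len(alignment))

def mutual_information(alignment, i, j):
    """I(i;j) = H(i) + H(j) - H(i,j); large values flag covarying columns."""
    return (column_entropy(alignment, i) + column_entropy(alignment, j)
            - joint_entropy(alignment, i, j))

def ranked_pairs(alignment):
    """All column pairs sorted by mutual information, largest first."""
    length = len(alignment[0])
    scores = [((i, j), mutual_information(alignment, i, j))
              for i, j in combinations(range(length), 2)]
    return sorted(scores, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Toy alignment (hypothetical): columns 0 and 2 swap K/E together, a
    # charge-compensating pattern; column 1 is conserved; column 3 varies freely.
    toy = ["KAEL", "EAKV", "KAEI", "EAKL"]
    for pair, mi in ranked_pairs(toy)[:3]:
        print(pair, round(mi, 3))
```

On the toy alignment the compensating pair (0, 2) ranks first with 1 bit of mutual information, while the conserved column 1 scores zero against every other column because it carries no entropy. Pairwise scores like this only capture two-column dependence; the modes described above involve correlated variation across many positions, which is why the authors go beyond such pairwise measures.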

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Algorithms
  • Animals
  • Bayes Theorem
  • Entropy
  • Humans
  • Models, Molecular
  • Phylogeny
  • Proteins / chemistry*
  • Proteins / genetics
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins