Evolutionary information for specifying a protein fold

Nature. 2005 Sep 22;437(7058):512-8. doi: 10.1038/nature03991.

Abstract

Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures. The artificial proteins show thermodynamic stabilities similar to natural WW domains, and structure determination of one artificial protein shows excellent agreement with the WW fold at atomic resolution. The relative simplicity of the information used for creating sequences suggests a marked reduction in the potential complexity of the protein-folding problem.
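The abstract describes building artificial sequences from alignment statistics alone, scored by an energy that rewards matching coevolution (pairwise correlations) between positions. The Python sketch below illustrates that idea under stated assumptions: the covariation score, the annealing schedule and all function names (`covariation`, `energy`, `shuffle_columns`, `anneal`) are hypothetical stand-ins, not the energy function or protocol used in the study. It starts from an alignment whose columns are shuffled independently (each position keeps its amino acid composition, but pairwise correlations are scrambled), then swaps residues within columns and accepts swaps by a Metropolis rule so that the artificial alignment's pairwise covariation approaches that of the natural one.

```python
import math
import random
from collections import Counter
from itertools import combinations


def covariation(alignment):
    """Hypothetical coevolution score for every column pair (i, j):
    sum over residue pairs (a, b) of (f_ij(a, b) - f_i(a) * f_j(b)) ** 2."""
    n_seq = len(alignment)
    n_col = len(alignment[0])
    single = [Counter(seq[i] for seq in alignment) for i in range(n_col)]
    cov = {}
    for i, j in combinations(range(n_col), 2):
        pair = Counter((seq[i], seq[j]) for seq in alignment)
        total = 0.0
        for a, count_a in single[i].items():
            for b, count_b in single[j].items():
                f_ab = pair.get((a, b), 0) / n_seq
                total += (f_ab - (count_a / n_seq) * (count_b / n_seq)) ** 2
        cov[(i, j)] = total
    return cov


def energy(cov_artificial, cov_natural):
    """Assumed statistical 'energy': squared mismatch of pairwise covariation."""
    return sum((cov_artificial[k] - cov_natural[k]) ** 2 for k in cov_natural)


def shuffle_columns(alignment, rng):
    """Permute each column independently: keeps per-position amino acid
    composition but scrambles correlations between positions."""
    cols = [list(col) for col in zip(*alignment)]
    for col in cols:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*cols)]


def anneal(natural, n_steps=2000, t0=1e-3, seed=0):
    """Simulated annealing over within-column swaps; returns the
    lowest-energy artificial alignment found and its energy."""
    rng = random.Random(seed)
    target = covariation(natural)
    rows = [list(s) for s in shuffle_columns(natural, rng)]
    e = energy(covariation(["".join(r) for r in rows]), target)
    best_rows, best_e = [r[:] for r in rows], e
    for step in range(n_steps):
        temp = t0 * (1 - step / n_steps) + 1e-12  # linear cooling schedule
        col = rng.randrange(len(rows[0]))
        r1, r2 = rng.sample(range(len(rows)), 2)
        rows[r1][col], rows[r2][col] = rows[r2][col], rows[r1][col]
        e_new = energy(covariation(["".join(r) for r in rows]), target)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
            e = e_new  # accept the swap (Metropolis criterion)
            if e < best_e:
                best_rows, best_e = [r[:] for r in rows], e
        else:
            rows[r1][col], rows[r2][col] = rows[r2][col], rows[r1][col]  # undo
    return ["".join(r) for r in best_rows], best_e


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 covary perfectly; column 2 is independent.
    natural = ["AKL", "AKV", "DEL", "DEV"]
    cov = covariation(natural)
    print(cov[(0, 1)], cov[(0, 2)])  # 0.25 0.0
    artificial, e = anneal(natural, n_steps=500)
    print(artificial, e)
```

Because every move swaps residues within a single column, per-position amino acid frequencies never change; only the pairwise statistics are optimized. Comparing output of `shuffle_columns` alone (correlations scrambled) with output of `anneal` (correlations pushed back toward the natural values) gives a toy version of the contrast between position-wise conservation and coevolution information that the abstract relies on.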

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology*
  • Evolution, Molecular*
  • Magnetic Resonance Spectroscopy
  • Models, Molecular
  • Protein Denaturation
  • Protein Folding*
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Sequence Alignment
  • Thermodynamics

Substances

  • Proteins

Associated data

  • PDB/1YMZ