Structural alignment of protein--DNA interfaces: insights into the determinants of binding specificity

J Mol Biol. 2005 Feb 4;345(5):1027-45. doi: 10.1016/j.jmb.2004.11.010. Epub 2004 Dec 8.

Abstract

A new method is introduced to structurally align interfaces observed in protein--DNA complexes. The method is based on a procedure that describes the interfacial geometry in terms of the spatial relationships between individual amino acid--nucleotide pairs. An amino acid--amino acid similarity matrix, S, is defined that provides a quantitative measure of the geometric relationships of amino acids in different interfaces and the entire stretch of "local" DNA within some distance of each amino acid. S is used as a substitution matrix in a dynamic programming algorithm that aligns the interfacial amino acids of the two complexes. The quality of the alignment is determined by an interface alignment score, IAS, that provides a quantitative measure of the similarity in the docking geometry between two protein--DNA complexes. We have clustered a large set of protein--DNA complexes based on their IAS values. In general, proteins within a single family form identifiable clusters. Subgroup clustering is often observed within families, offering a fine-grained description of docking geometries. Although proteins with similar folds tend to dock in similar ways, important differences are observed even for structural motifs that almost perfectly align. Relationships are observed between the interfaces formed in cognate and non-cognate complexes involving the same proteins, indicating a strong driving force to maintain certain contacts, even if this requires a distortion of the DNA. There are cases where inter-family similarities are greater than intra-family similarities. Our method makes it possible to compare different protein--DNA interfaces in a detailed, objective and quantitative fashion. This opens the way to new approaches to the description of the determinants of molecular recognition and to the prediction of protein and DNA sequence combinations that are optimal for binding.
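
The role of S as a substitution matrix in a dynamic programming alignment can be illustrated with a minimal sketch. The abstract does not say whether the alignment is global or local, how gaps are penalized, or how the IAS is derived from the alignment, so the code below assumes a plain global (Needleman-Wunsch style) alignment with a linear gap penalty. The function name `align_with_matrix`, the `gap` parameter, and the toy values in S are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def align_with_matrix(
    S: Sequence[Sequence[float]], gap: float = -1.0
) -> Tuple[float, List[Tuple[int, int]]]:
    """Globally align two ordered residue lists given a precomputed matrix S.

    S[i][j] is the similarity between interfacial residue i of complex A and
    residue j of complex B. The linear gap penalty is an assumption made for
    this sketch only.
    """
    n = len(S)
    m = len(S[0]) if n else 0

    # F[i][j] = best score for aligning the first i residues of A
    # with the first j residues of B.
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + S[i - 1][j - 1],  # pair residue i with j
                F[i - 1][j] + gap,                  # leave residue i unpaired
                F[i][j - 1] + gap,                  # leave residue j unpaired
            )

    # Trace back from the corner to recover the matched residue pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + S[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


# Toy similarity matrix: 2 interfacial residues in A, 3 in B (made-up values).
S_toy = [
    [3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
]
score, matched = align_with_matrix(S_toy, gap=-1.0)
print(score, matched)
```

In this sketch the returned score is simply the raw alignment score for the toy matrix; it should not be read as the IAS defined in the paper, whose exact form is not given in the abstract.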

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Binding Sites
  • DNA / chemistry*
  • DNA / metabolism*
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Binding
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics
  • Proteins / metabolism*
  • Sequence Alignment
  • Structure-Activity Relationship
  • Substrate Specificity

Substances

  • Proteins
  • DNA