Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information

Bioinformatics. 2004 Mar 1;20(4):477-86. doi: 10.1093/bioinformatics/btg432. Epub 2004 Jan 22.

Abstract

Motivation: Though vitally important to cell function, the mechanism of protein-DNA binding is not yet completely understood. We therefore analysed the relationship between DNA binding and protein sequence composition, solvent accessibility and secondary structure. Using non-redundant databases of transcription factors and protein-DNA complexes, we developed neural network models that use the information present in this relationship to predict DNA-binding proteins and their binding residues.
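As an illustration of the kind of input such a model could take, the sketch below computes the amino acid composition of a sequence as a 20-element vector of fractions. The abstract does not specify the encoding actually used, so the function name `composition`, the residue ordering in `AMINO_ACIDS` and the handling of non-standard symbols are assumptions made for this example only.

```python
from collections import Counter
from typing import List

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Symbols outside the 20 standard residues (e.g. 'X', gaps) are ignored,
    which is an assumption of this sketch, not a detail from the paper.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No recognisable residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector of this form has a fixed length regardless of protein size, which is what makes whole-sequence composition convenient as input to a classifier.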

Results: Sequence composition was found to provide sufficient information to predict the probability that a protein binds to DNA, with nearly 69% sensitivity at 64% accuracy for the proteins considered; sequence neighbourhood and solvent accessibility information were sufficient to predict binding sites with 40% sensitivity at 79% accuracy. Detailed analysis of binding residues shows that some three- and five-residue segments frequently bind to DNA and that solvent accessibility plays a major role in binding. Although binding behaviour was not associated with any particular secondary structure, there were interesting exceptions at the residue level. The over-representation of some residues in binding sites was largely lost at the whole-sequence level, but a different kind of compositional preference was observed in DNA-binding proteins.
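The two ideas behind these figures, a per-residue sequence neighbourhood and the sensitivity and accuracy measures, can be sketched as follows. This is a minimal example under stated assumptions: the abstract gives neither the window size nor the exact metric definitions, so the padding symbol, the default window of five residues and the standard confusion-matrix formulas used here are illustrative choices, and all function names are hypothetical.

```python
from typing import List


def residue_windows(sequence: str, size: int = 5, pad: str = "X") -> List[str]:
    """Return one window of `size` residues centred on each position.

    The sequence ends are padded with `pad` so that every residue has a
    full-length neighbourhood (an assumption of this sketch).
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true binding cases that were predicted as binding."""
    positives = tp + fn
    return tp / positives if positives else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions (binding and non-binding) that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```

Reporting both measures together matters for binding-site prediction because binding residues are a minority class: a predictor that labels every residue as non-binding can reach high accuracy while having zero sensitivity.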

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Artificial Intelligence*
  • Binding Sites
  • Computer Simulation
  • DNA / chemistry*
  • DNA-Binding Proteins / chemistry*
  • DNA-Binding Proteins / classification
  • Models, Chemical*
  • Models, Molecular
  • Molecular Sequence Data
  • Neural Networks, Computer*
  • Protein Binding
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Structure-Activity Relationship
  • Transcription Factors / chemistry

Substances

  • Amino Acids
  • DNA-Binding Proteins
  • Transcription Factors
  • DNA