Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics

J Mol Biol. 2004 Apr 23;338(2):207-15. doi: 10.1016/j.jmb.2004.02.048.

Abstract

Diverse computational and experimental efforts are required to elucidate the control circuitry regulating the transcription of human genes. The fusion of gene-specific promoter analyses with large microarray studies and bioinformatics advances has produced optimism that significant progress can be made in unravelling this complex network. Within bioinformatics, past emphasis for improved pattern discovery has been placed upon "phylogenetic footprinting", the identification of sequences conserved over moderate periods of evolution (e.g. human and mouse comparisons). We introduce a new direction in bioinformatics based on the constraints imposed by the structures of DNA-binding proteins. For most structurally related families of transcription factors, there are clear similarities in the sequences of the sites to which they bind. On the basis of this observation, we construct familial binding profiles for well-characterized transcription factor families. The profiles are shown to classify correctly the structural class of mediating transcription factors for novel motifs in 88% of cases. By incorporating the familial profiles into pattern discovery procedures, we demonstrate that functional binding sites can be found in genomic sequences of dramatically greater length than is possible otherwise. Thus, incorporating familial models can overcome the signal-to-noise challenge that has hindered the transition from microarray data to regulatory control sequences for human genes. Biochemically motivated constraints upon sequence diversity of binding sites will complement the genetically motivated constraints imposed in "phylogenetic footprinting" algorithms.
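
The classification step described above, assigning a novel motif to a structural family by comparing it with familial binding profiles, can be illustrated with a minimal sketch. The abstract does not specify the comparison metric or the profile format, so everything below is an assumption made for illustration: profiles are position frequency matrices with columns ordered A, C, G, T, similarity is the mean column-wise Pearson correlation over the best ungapped alignment, and the names `from_consensus`, `best_alignment_score`, `classify`, `toy_family_1` and `toy_family_2` are hypothetical. The toy matrices are invented values, not data from the paper.

```python
from math import sqrt

BASES = "ACGT"  # assumed column order for every matrix below


def from_consensus(consensus, strength=0.85):
    """Build a toy position frequency matrix from a consensus string.

    Hypothetical helper: the consensus base gets `strength`, and the
    remaining probability is split evenly over the other three bases.
    """
    rest = (1.0 - strength) / 3.0
    return [[strength if b == c else rest for b in BASES] for c in consensus]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_alignment_score(motif, profile, min_overlap=4):
    """Slide the motif along the profile without gaps and return the best
    mean column correlation over alignments with at least `min_overlap` columns."""
    best = float("-inf")
    for offset in range(-(len(motif) - min_overlap), len(profile) - min_overlap + 1):
        pairs = [
            (motif[i], profile[i + offset])
            for i in range(len(motif))
            if 0 <= i + offset < len(profile)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(m, p) for m, p in pairs) / len(pairs)
        best = max(best, score)
    return best


def classify(motif, family_profiles):
    """Return the best-scoring family name and the score for every family."""
    scores = {
        name: best_alignment_score(motif, profile)
        for name, profile in family_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Invented toy profiles; real familial profiles would be built from
# curated binding sites for each structural family.
family_profiles = {
    "toy_family_1": from_consensus("CACGTG"),
    "toy_family_2": from_consensus("TGACTCA"),
}

# A weaker, longer motif that contains the first toy consensus.
novel_motif = from_consensus("ACACGTGT", strength=0.7)
label, scores = classify(novel_motif, family_profiles)
print(label, scores)
```

In a pattern discovery setting, a score of this kind could in principle serve as a prior that favours candidate motifs resembling a known family. That use is also an assumption here; the abstract states only that the familial profiles are incorporated into the discovery procedure.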

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Binding Sites
  • Computational Biology*
  • Humans
  • Internet
  • Mice
  • Molecular Sequence Data
  • Protein Conformation*
  • Sequence Alignment
  • Transcription Factors / chemistry*
  • Transcription Factors / classification
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • Transcription Factors