Functional inference from non-random distributions of conserved predicted transcription factor binding sites

Bioinformatics. 2004 Aug 4;20 Suppl 1:i109-15. doi: 10.1093/bioinformatics/bth908.

Abstract

Motivation: Our understanding of how genes are regulated in a concerted fashion is still limited. In particular, complex phenomena such as cell cycle regulation in multicellular organisms are poorly understood. We therefore investigated conserved predicted transcription factor binding sites (TFBSs) in human-mouse upstream regions of genes that can be associated with a particular cell cycle phase in HeLa cells. TFBSs were predicted from selected binding site motifs, represented by position weight matrices (PWMs), using a statistical approach. A regulatory role for a transcription factor is more probable if its predicted TFBSs are enriched in upstream regions of genes that are associated with a subset of cell cycle phases. We tested for this association by computing exact P-values for the observed phase distributions under the null distribution defined by the relative amount of conserved upstream sequence of genes per cell cycle phase. We considered non-exonic and 5'-untranslated region (5'-UTR) binding sites separately and corrected for multiple testing by controlling the false discovery rate.
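
The exact test described above can be illustrated with a minimal sketch. It assumes that the observed phase distribution is a vector of per-phase counts (for example, genes carrying a conserved predicted site for one PWM), that the null model is a multinomial whose probabilities are proportional to the amount of conserved upstream sequence per phase, and that the P-value sums the probabilities of all outcomes no more likely than the observed one. The abstract does not spell out these details, so the function names, the counting unit and the definition of "as extreme" are assumptions made for illustration only.

```python
from math import exp, lgamma, log


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def log_multinomial_pmf(counts, probs):
    """Log-probability of `counts` under a multinomial with cell probabilities `probs`."""
    out = lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        out -= lgamma(c + 1)
        if c > 0:
            out += c * log(p)
    return out


def exact_multinomial_pvalue(observed, seq_lengths):
    """Exact P-value for per-phase counts (hypothetical helper, not the authors' code).

    observed    -- counts per cell cycle phase
    seq_lengths -- conserved upstream sequence per phase (must be positive);
                   these define the null probabilities (assumption)
    """
    if any(length <= 0 for length in seq_lengths):
        raise ValueError("sequence lengths must be positive")
    total_len = sum(seq_lengths)
    probs = [length / total_len for length in seq_lengths]
    log_obs = log_multinomial_pmf(observed, probs)
    tol = 1e-9  # guards against floating-point ties
    p_value = 0.0
    # Full enumeration: feasible only for small totals and few phases.
    for outcome in compositions(sum(observed), len(observed)):
        lp = log_multinomial_pmf(outcome, probs)
        if lp <= log_obs + tol:
            p_value += exp(lp)
    return min(p_value, 1.0)


# Made-up numbers for five hypothetical phases, purely for illustration.
example = exact_multinomial_pvalue([9, 1, 0, 2, 1], [120.0, 90.0, 110.0, 100.0, 80.0])
```

Full enumeration keeps the sketch transparent but grows quickly with the total count; for larger inputs a Monte Carlo approximation of the same null distribution would be a natural substitute.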

Results: We identified 22 non-exonic and 11 5'-UTR significant PWM phase distributions while expecting one false discovery. Many of the corresponding transcription factors (e.g. members of the thyroid hormone/retinoid receptor subfamily) have already been associated with cell cycle regulation, proliferation and development. Our method therefore appears to be a suitable tool for detecting putative cell cycle regulators among known human transcription factors.
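
The abstract does not name the procedure used to control the false discovery rate. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure, a common choice, to a list of per-PWM P-values; the function name and the level `q` are assumptions and not taken from the paper.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return one boolean per P-value: True if it is significant at FDR level q.

    Illustrative Benjamini-Hochberg step-up procedure; the source does not
    state that this exact procedure was used.
    """
    m = len(pvalues)
    # Indices of the P-values, sorted from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank k with p_(k) <= q * k / m.
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up P-values for four hypothetical PWMs.
flags = benjamini_hochberg([0.001, 0.02, 0.04, 0.5], q=0.05)
```

Under this procedure the expected proportion of false discoveries among the reported hits is bounded by `q`, which is the kind of reasoning behind a statement such as "expecting one false discovery" among a set of significant distributions.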

Availability: Further details and supplementary data can be obtained from http://corg.molgen.mpg.de/cellcycle

MeSH terms

  • Animals
  • Binding Sites
  • Computer Simulation
  • Conserved Sequence / genetics*
  • Evolution, Molecular*
  • HeLa Cells
  • Humans
  • Mice
  • Models, Genetic*
  • Models, Statistical
  • Protein Binding
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Sequence Analysis, DNA / methods*
  • Statistical Distributions
  • Transcription Factors / genetics*

Substances

  • Transcription Factors