Position dominant sequence elements in experimentally verified human promoters and their putative relation to cancer

Cancer Genomics Proteomics. 2009 Nov-Dec;6(6):337-55.

Abstract

Promoter regions of the human genome play a key role in our understanding of the regulatory mechanisms underlying physiological and disease states. The aim of this study was to investigate the positional sequence properties of experimentally verified human promoters. Specifically, we identified short sequence elements, ranging from 4- to 9-mers, that present position dominance close to, or away from, the transcription start site (TSS). Stringent statistical criteria were used for this purpose, and we determined whether position dominance was in any way related to transcription control. To achieve this goal, we designed and implemented a dedicated filtering method to detect, at large scale, position-dominant sequence elements embedded in the promoter set. Additionally, via a high-throughput procedure, we gathered data on the majority of the publicly available transcription factor-binding sites (TFBSs) and matched them to our findings, aiming at a large-scale correlation between position-dominant sequence elements and TFBSs. In this analysis, we present distinctive compositional and conservation perturbations at the TSS and the core promoter region. Using our filtering method, 7,088 short sequences ranging from 4- to 9-mers were found to present strong positional dominance close to or away from the TSS, and these short sequences were matched to a large number of known TFBSs. Moreover, using probability theory, we present evidence that TFBSs tend to show strong positional preferences. In addition, we demonstrate that the actual TFBS copy number is related to the transcription regulatory process. On the basis of this last argument, we suggest that all detected short sequences that did not match any known TFBS have a high potential for being novel transcription control elements. 
Furthermore, using a well-described 'high potential cancer biomarker resource', we attempted to identify position-dominant sequence elements associated with cancer, on the basis of their presence in the promoters of the genes encoding cancer-related proteins.
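The abstract does not spell out the filtering method, so the Python sketch below only illustrates one common way a position-dominant k-mer could be flagged. The assumptions are ours, not the authors': promoters are equal-length strings aligned on the TSS, start positions are grouped into fixed-width windows, and a k-mer is flagged when one window holds more of its occurrences than a uniform positional null predicts (exact binomial upper tail, Bonferroni-corrected). The function names and the parameters `window`, `alpha` and `min_total` are hypothetical, and the binomial test stands in for the study's own statistical criteria, which are not given here.

```python
from collections import defaultdict
from math import comb
import random


def binom_sf(k, n, p):
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def kmer_window_counts(promoters, k, window):
    """Count each k-mer's start positions, grouped into windows of `window` bases.

    Assumes equal-length promoters aligned on the TSS, so index j denotes
    the same offset from the TSS in every sequence.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in promoters:
        for j in range(len(seq) - k + 1):
            counts[seq[j:j + k]][j // window] += 1
    return counts


def position_dominant(promoters, k, window, alpha=0.01, min_total=5):
    """Return (kmer, window_index, count, total, p_value) for flagged k-mers.

    Hypothetical criterion: a window is "dominant" for a k-mer when its
    occurrence count exceeds a uniform positional null after Bonferroni
    correction over all (k-mer, window) pairs tested.
    """
    n_positions = len(promoters[0]) - k + 1
    n_windows = (n_positions + window - 1) // window
    counts = kmer_window_counts(promoters, k, window)
    n_tests = len(counts) * n_windows
    hits = []
    for kmer, by_window in counts.items():
        total = sum(by_window.values())
        if total < min_total:  # too few occurrences to judge position
            continue
        for w, c in by_window.items():
            # Fraction of start positions in window w (the last one may be short).
            width = min(window, n_positions - w * window)
            p_value = binom_sf(c, total, width / n_positions)
            if p_value * n_tests < alpha:
                hits.append((kmer, w, c, total, p_value))
    return hits


# Usage on synthetic data: the 6-mer TATAAA planted 20 bases into every sequence.
rng = random.Random(0)


def rand_seq(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


promoters = [rand_seq(20) + "TATAAA" + rand_seq(24) for _ in range(30)]
hits = position_dominant(promoters, k=6, window=10)
print([h[:3] for h in hits if h[0] == "TATAAA"])
```

A uniform positional null ignores the compositional perturbations around the TSS that the study itself reports, so an analysis on real promoters would likely need a background model that accounts for local base composition before calling an element position-dominant.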

MeSH terms

  • Binding Sites
  • Biomarkers, Tumor / genetics
  • Conserved Sequence
  • Humans
  • Molecular Sequence Data
  • Neoplasms / genetics*
  • Neoplasms / metabolism
  • Promoter Regions, Genetic*
  • Sequence Analysis, DNA
  • Transcription Factors / metabolism

Substances

  • Biomarkers, Tumor
  • Transcription Factors