Proximal promoter sequences contain the basic motifs that drive expression of downstream genes. We present genome-scale computational analyses of the 120-bp sequences immediately upstream of the +1 transcription start sites (TSSs) of 10,117 human protein-coding genes, and identify genes that are exceptional with respect to core promoter nucleotide composition. Our data reveal that while the absolute purine/pyrimidine ratio ranges between 0.2 and 2.5 in 99% of the genes, certain genes show an exceptional skew in this balance (e.g. ratios of 82.3 in VWA3A, 61.5 in Sox5, and 24.0 in BRWD3) and contain islands of purines or pyrimidines. Furthermore, while over 95% of the genes have no more than one short tandem repeat (STR) in their core promoters, certain gene promoters are exceptionally rich in STRs (e.g. eight consecutive STRs in UBE2QL1 and six STRs in GRIA2). We found this sequence bias in the majority of these promoters across species, supporting a functional role for them in gene expression. The genes downstream of these promoters also appear to be of evolutionary importance: we were able to trace the majority of them to evolutionarily distant species such as Saccharomyces cerevisiae and Caenorhabditis elegans. The exceptional promoters presented in this study lack the conventional motifs of both TATA and TATA-less promoters, and may therefore point to novel mechanisms of gene expression. They may also provide potential mechanisms for inter-individual variation in gene expression and for complex traits and disorders.
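The two composition measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`upstream_window`, `purine_pyrimidine_ratio`, `find_strs`) are hypothetical; the TSS is assumed to be a 0-based index into a plus-strand sequence; and the STR definition (repeat units of 1 to 6 bp, at least 4 tandem copies) is an assumed threshold, since the abstract does not state the criteria used.

```python
from typing import List, Tuple

PURINES = set("AG")
PYRIMIDINES = set("CT")


def upstream_window(genomic_seq: str, tss_index: int, length: int = 120) -> str:
    """Return the `length` bases immediately upstream of the +1 TSS.

    Assumption: plus-strand sequence, `tss_index` is the 0-based position of +1.
    """
    if tss_index < length:
        raise ValueError("not enough sequence upstream of the TSS")
    return genomic_seq[tss_index - length:tss_index].upper()


def purine_pyrimidine_ratio(seq: str) -> float:
    """Purines (A+G) divided by pyrimidines (C+T); other symbols such as N are ignored."""
    seq = seq.upper()
    purines = sum(1 for base in seq if base in PURINES)
    pyrimidines = sum(1 for base in seq if base in PYRIMIDINES)
    if pyrimidines == 0:
        return float("inf")  # a pure purine window has an unbounded ratio
    return purines / pyrimidines


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter unit ('ATAT' is not primitive)."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_strs(seq: str, min_unit: int = 1, max_unit: int = 6,
              min_copies: int = 4) -> List[Tuple[int, str, int]]:
    """Greedy left-to-right scan for short tandem repeats.

    Returns (start, unit, copies) tuples. The unit-length range and
    `min_copies` are assumed thresholds, not values from the study.
    """
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            # Skip truncated units and units like 'AA' that a shorter unit already covers.
            if len(unit) < k or not is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            # Keep the repeat that spans the most bases at this position.
            if copies >= min_copies and (best is None or copies * k > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])  # jump past the reported repeat
        else:
            i += 1
    return hits
```

For example, `find_strs("CACACACAGTTTTTG")` reports a (CA)4 dinucleotide repeat at position 0 and a (T)5 mononucleotide run at position 9. Rejecting non-primitive units keeps a run such as `ATATATAT` from being counted a second time as a tetranucleotide repeat; counting "consecutive" STRs, as in the UBE2QL1 example, would need an additional adjacency rule that is not specified here.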
Copyright © 2011 Elsevier B.V. All rights reserved.