Identification of human gene core promoters in silico

Genome Res. 1998 Mar;8(3):319-26. doi: 10.1101/gr.8.3.319.

Abstract

Identification of the 5'-end of human genes requires identification of functional promoter elements. In silico identification of those elements is difficult because of the hierarchical and modular nature of promoter architecture. To address this problem, I propose a new stepwise strategy based on initial localization of a functional promoter into a 1- to 2-kb (extended promoter) region from within a large genomic DNA sequence of 100 kb or larger and further localization of a transcriptional start site (TSS) into a 50- to 100-bp (core promoter) region. Using position-dependent 5-tuple measures, a quadratic discriminant analysis (QDA) method has been implemented in a new program, CorePromoter. Our experiments indicate that when given a 1- to 2-kb extended promoter, CorePromoter will correctly localize the TSS to a 100-bp interval approximately 60% of the time. [Figure 3 can be found in its entirety as an online supplement at http://www.genome.org.]
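As a rough illustration of the two ingredients named above (5-tuple counting and QDA applied to windows of a longer sequence), the following minimal Python sketch may help. It is not the CorePromoter implementation. The two-dimensional feature space, the toy training points, the equal priors, and the function names (kmer_counts, fit_gaussian, qda_score, classify, scan) are all assumptions made for illustration; the abstract does not specify the actual features or parameters.

```python
import math
from collections import Counter


def kmer_counts(seq, k=5):
    """Count overlapping k-tuples; k=5 gives the 5-tuples named in the abstract."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fit_gaussian(points):
    """Estimate the mean and 2x2 sample covariance of 2-D feature points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def qda_score(x, mean, cov, prior):
    """Quadratic discriminant: log prior plus log Gaussian density.

    The additive constant shared by all classes is dropped. Each class keeps
    its own covariance, which is what makes the decision boundary quadratic.
    """
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha


def classify(x, classes):
    """Return the class name with the highest discriminant score."""
    return max(classes, key=lambda name: qda_score(x, *classes[name]))


def scan(seq, width, step, featurize, classes):
    """Slide a window along seq; return starts of windows called 'promoter'.

    featurize is a caller-supplied (hypothetical) function that maps a window
    string to a 2-D feature tuple, for example scores built from kmer_counts.
    """
    hits = []
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        if classify(featurize(window), classes) == "promoter":
            hits.append(start)
    return hits


# Toy 2-D training points, invented purely for illustration.
promoter_pts = [(2.0, 1.0), (2.5, 1.4), (3.0, 0.9), (2.2, 1.6)]
background_pts = [(0.0, 0.1), (0.5, -0.3), (-0.4, 0.2), (0.1, -0.2)]

# Each class maps to (mean, covariance, prior); equal priors are an assumption.
classes = {
    "promoter": fit_gaussian(promoter_pts) + (0.5,),
    "background": fit_gaussian(background_pts) + (0.5,),
}

print(kmer_counts("ACGTACGTA").most_common(1))
print(classify((2.4, 1.2), classes))
```

Under the stepwise strategy, a scan of this kind could be run twice: once with wide windows to find a 1- to 2-kb extended promoter, then again with narrow windows inside that region to localize the TSS. The window widths and step sizes for each pass would need to be chosen by the user; they are not given here.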

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Analysis of Variance
  • Computational Biology
  • Genome, Human*
  • Humans
  • Promoter Regions, Genetic / genetics*
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / statistics & numerical data
  • Software