EnsemPro: an ensemble approach to predicting transcription start sites in human genomic DNA sequences

Genomics. 2008 Mar;91(3):259-66. doi: 10.1016/j.ygeno.2007.11.001.

Abstract

Although several computational methods have been developed to identify transcription start sites (TSSs) or promoters, their predictions still need improvement. Because of their low accuracy, promoter prediction programs can produce misleading results in functional genomic studies. To improve prediction accuracy, we propose an ensemble approach, EnsemPro (Ensemble Promoter), which combines the prediction results of existing promoter predictors. We systematically compared the prediction performance of the currently available promoter prediction programs in an identical evaluation environment, and the results guided our choice of which predictors to combine. We applied three representative ensemble schemes (majority voting, weighted voting, and a Bayesian approach) to predict TSSs in hundreds of human genomic sequences. EnsemPro identified TSSs more precisely than the other combining methods and than the currently available individual predictor programs. The source code of EnsemPro is available on request from the authors.
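The abstract names the three combining schemes but does not give their formulas. As an illustration only, here is one common textbook form of each. It assumes every predictor reduces to a yes/no TSS call at a candidate position. The predictor names, weights, sensitivities, specificities, and prior are hypothetical and are not EnsemPro's published settings. The Bayesian variant is a naive Bayes combination, which is also an assumption.

```python
"""Sketch of three ways to combine binary TSS calls from several promoter
predictors. Predictor names, weights, and accuracy figures are hypothetical;
this is NOT EnsemPro's published implementation."""
from math import prod
from typing import Mapping

# Assumption: each predictor either calls a TSS near a candidate position (True) or not (False).
Votes = Mapping[str, bool]


def majority_vote(votes: Votes) -> bool:
    """Call a TSS when strictly more than half of the predictors say yes."""
    yes = sum(votes.values())  # True counts as 1, False as 0
    return yes > len(votes) / 2


def weighted_vote(votes: Votes, weights: Mapping[str, float]) -> bool:
    """Call a TSS when the summed weight of 'yes' votes exceeds half the total weight."""
    total = sum(weights[name] for name in votes)
    yes = sum(weights[name] for name, call in votes.items() if call)
    return yes > total / 2


def bayesian_posterior(
    votes: Votes,
    sensitivity: Mapping[str, float],
    specificity: Mapping[str, float],
    prior: float,
) -> float:
    """Posterior P(TSS | votes), assuming predictors err independently (naive Bayes)."""
    # Likelihood of the observed calls if a true TSS is present.
    like_tss = prod(
        sensitivity[n] if call else 1 - sensitivity[n] for n, call in votes.items()
    )
    # Likelihood of the observed calls if there is no TSS.
    like_not = prod(
        1 - specificity[n] if call else specificity[n] for n, call in votes.items()
    )
    num = prior * like_tss
    return num / (num + (1 - prior) * like_not)


if __name__ == "__main__":
    # Hypothetical calls from three hypothetical predictors at one candidate site.
    votes = {"predictor_A": False, "predictor_B": True, "predictor_C": True}
    weights = {"predictor_A": 0.6, "predictor_B": 0.2, "predictor_C": 0.2}
    sens = {name: 0.8 for name in votes}
    spec = {name: 0.9 for name in votes}
    print(majority_vote(votes))                                    # True
    print(weighted_vote(votes, weights))                           # False
    print(f"{bayesian_posterior(votes, sens, spec, prior=0.1):.3f}")  # 0.612
```

In this toy case the schemes disagree. Majority voting accepts the site because two of three predictors agree. Weighted voting rejects it because the single dissenting predictor carries most of the weight. The Bayesian posterior instead reports a graded confidence that can be thresholded. Choosing which predictors to combine, and how to weight them, is exactly where a prior benchmarking step like the one described above would come in.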

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Binding Sites / genetics
  • Computational Biology*
  • DNA / genetics*
  • Genome, Human
  • Humans
  • Neural Networks, Computer
  • Promoter Regions, Genetic*
  • Software
  • Transcription Initiation Site*

Substances

  • DNA