A composite method based on formal grammar and DNA structural features in detecting human polymerase II promoter region

PLoS One. 2013;8(2):e54843. doi: 10.1371/journal.pone.0054843. Epub 2013 Feb 20.

Abstract

An important step in understanding gene regulation is to identify the promoter regions where transcription factor binding takes place. Predicting a promoter region de novo has long been a theoretical goal for many researchers. A number of in silico methods exist to predict promoter regions de novo, but most of them still suffer from various shortcomings, a major one being the selection of appropriate features that distinguish promoter regions from non-promoters. In this communication, we propose a new composite method that predicts promoter sequences based on the interrelationship between structural profiles of DNA and primary sequence elements of the promoter regions. We show that a Context Free Grammar (CFG) can formalize the relationships between different primary sequence features and, by utilizing the CFG, we demonstrate that an efficient parser can be constructed for extracting these relationships from DNA sequences to distinguish true promoter sequences from non-promoter sequences. Along with the CFG, we extract structural features of the promoter region to improve the efficiency of our prediction system. Extensive experiments performed on different datasets reveal that our method is effective in predicting promoter sequences on a genome-wide scale and performs satisfactorily compared with other promoter prediction techniques.
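To make the idea of parsing DNA with a CFG concrete, the following is a minimal sketch and not the grammar or parser used in the paper. It assumes a hypothetical toy grammar in Chomsky normal form (a TATA-like box, a spacer of one or more nucleotides, then a CA dinucleotide) and checks membership with the standard CYK algorithm. All rule names (`S`, `Box`, `Rest`, `Spacer`, `Inr`, and so on) and the function `cyk_accepts` are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (NOT the paper's grammar):
#   S -> Box Rest        Box -> TA TA       TA -> nT nA
#   Rest -> Spacer Inr   Spacer -> N Spacer | any nucleotide
#   Inr -> nC nA         N -> any nucleotide
BINARY = {
    ("Box", "Rest"): {"S"},
    ("TA", "TA"): {"Box"},
    ("nT", "nA"): {"TA"},
    ("Spacer", "Inr"): {"Rest"},
    ("N", "Spacer"): {"Spacer"},
    ("nC", "nA"): {"Inr"},
}
# Nonterminals that derive each single nucleotide directly.
TERMINAL = {
    "A": {"nA", "N", "Spacer"},
    "C": {"nC", "N", "Spacer"},
    "G": {"N", "Spacer"},
    "T": {"nT", "N", "Spacer"},
}


def cyk_accepts(seq, start="S"):
    """Return True if the toy grammar derives the whole sequence (CYK)."""
    n = len(seq)
    if n == 0:
        return False
    # table[i][l - 1] holds the nonterminals that derive seq[i:i + l].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(seq):
        table[i][0] = set(TERMINAL.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for pair in product(left, right):
                    table[i][length - 1] |= BINARY.get(pair, set())
    return start in table[0][n - 1]


if __name__ == "__main__":
    for s in ("TATAGGCA", "TATACA", "GGGGGGCA"):
        print(s, cyk_accepts(s))
```

CYK runs in time cubic in the sequence length, so a sketch like this would only suit short windows; a genome-wide scan would need a more efficient parser, as the abstract indicates.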

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • DNA / chemistry*
  • DNA / genetics*
  • Databases, Nucleic Acid
  • Gene Expression Regulation
  • Genome, Human / genetics
  • Humans
  • Promoter Regions, Genetic*
  • RNA Polymerase II / genetics*
  • Sequence Analysis, DNA / methods*
  • Transcription Initiation Site

Substances

  • DNA
  • RNA Polymerase II

Grants and funding

Financial support was received from the Department of Biotechnology, Government of India, grant nos. BT/BI/04/001/93 and BT/BI/10/019/99 (http://dbtindia.nic.in). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.