iFORM: Incorporating Find Occurrence of Regulatory Motifs

PLoS One. 2016 Dec 19;11(12):e0168607. doi: 10.1371/journal.pone.0168607. eCollection 2016.

Abstract

Accurately identifying the binding sites of transcription factors (TFs) is crucial to understanding the mechanisms of transcriptional regulation and human disease. We present Incorporating Find Occurrence of Regulatory Motifs (iFORM), an easy-to-use and efficient tool for scanning DNA sequences with TF motifs described as position weight matrices (PWMs). iFORM integrates five classical motif discovery programs using Fisher's combined probability test. Performance assessments using both a receiver operating characteristic (ROC) curve and a correlation-based approach demonstrated that this integration achieves higher accuracy and sensitivity. We have used iFORM to obtain accurate results on a variety of data from the ENCODE Project and the NIH Roadmap Epigenomics Project, where the tool has demonstrated its utility in further elucidating the individual roles of functional elements. Both the source code and binaries for iFORM are freely available at https://github.com/wenjiegroup/iFORM. The TF binding sites identified by iFORM across human cell and tissue types have been deposited in the Gene Expression Omnibus under accession GSE53962.
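
As an illustration of the Fisher's combined probability test mentioned in the abstract, the following is a minimal sketch of how several p-values for one candidate site could be merged into a single p-value. It is not taken from the iFORM source code: the function name `fisher_combined_pvalue` and the example p-values are hypothetical. The sketch also assumes the combined tests are independent, which may hold only approximately when several scanners score the same sequence.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    Hypothetical helper for illustration; not part of iFORM.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null
    # hypothesis X follows a chi-square distribution with 2k
    # degrees of freedom. We keep X / 2 for convenience.
    half_stat = -sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k the chi-square survival
    # function has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Made-up p-values, one per scanning program, for one site.
    example = [0.04, 0.10, 0.02, 0.30, 0.07]
    print(fisher_combined_pvalue(example))
```

With a single input the function returns that p-value unchanged, which is a convenient sanity check. Where SciPy is available, `scipy.stats.combine_pvalues` with `method="fisher"` performs the same calculation.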

MeSH terms

  • Gene Expression Regulation / genetics*
  • Humans
  • Nucleotide Motifs*
  • Response Elements / genetics*
  • Sequence Analysis, DNA / methods*
  • Software*
  • Transcription Factors / genetics*

Substances

  • Transcription Factors

Grants and funding

This work was supported by grants from the Major Research Plan of the National Natural Science Foundation of China (No. U1435222), the Program of International S&T Cooperation (No. 2014DFB30020), and the National High Technology Research and Development Program of China (No. 2015AA020108). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.