Weighted enrichment method for prediction of transcription regulators from transcriptome and global chromatin immunoprecipitation data

Nucleic Acids Res. 2016 Jun 20;44(11):5010-21. doi: 10.1093/nar/gkw355. Epub 2016 Apr 30.

Abstract

Predicting responsible transcription regulators on the basis of transcriptome data is one of the most promising computational approaches to understanding cellular processes and characteristics. Here, we present a novel method employing vast amounts of chromatin immunoprecipitation (ChIP) experimental data to address this issue. Global high-throughput ChIP data were collected to construct a comprehensive database containing 8 578 738 binding interactions of 454 transcription regulators. To incorporate information about heterogeneous frequencies of transcription factor (TF)-binding events, we developed a flexible framework for gene set analysis employing the weighted t-test procedure, namely weighted parametric gene set analysis (wPGSA). Using transcriptome data as an input, wPGSA predicts the activities of transcription regulators responsible for observed gene expression. Validation of wPGSA with published transcriptome data, including that from over-expressed TFs, showed that the method can predict activities of various TFs, regardless of cell type and conditions, with results consistent with biological observations. We also applied wPGSA to other published transcriptome data and identified potential key regulators of cell reprogramming and influenza virus pathogenesis, generating compelling hypotheses regarding underlying regulatory mechanisms. This flexible framework will contribute to uncovering the dynamic and robust architectures of biological regulation by incorporating high-throughput experimental data in the form of weights.
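The idea of scoring a regulator with a weighted t-test can be illustrated with a minimal sketch. This is not the published wPGSA implementation: the function name `weighted_t_score`, the use of an effective sample size, and the exact variance estimator are assumptions made for illustration. Each gene contributes its log fold change, weighted by how often the regulator was observed to bind it, and the weighted mean is compared with the unweighted mean over all genes.

```python
import math


def weighted_t_score(log_fc, weights):
    """Hypothetical weighted t-like score for one regulator.

    log_fc  : per-gene log fold changes from the transcriptome data
    weights : per-gene non-negative weights (e.g. ChIP binding frequency)

    Assumption: this is an illustrative estimator, not the formula
    from the paper.
    """
    if len(log_fc) != len(weights):
        raise ValueError("log_fc and weights must have the same length")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("at least one weight must be positive")

    # Background: unweighted mean over all genes.
    global_mean = sum(log_fc) / len(log_fc)

    # Weighted mean and weighted variance for the regulator's targets.
    w_mean = sum(w * x for w, x in zip(weights, log_fc)) / total_w
    w_var = sum(w * (x - w_mean) ** 2 for w, x in zip(weights, log_fc)) / total_w

    # Effective sample size (Kish), assumed here as the denominator.
    n_eff = total_w ** 2 / sum(w * w for w in weights)

    if w_var == 0:
        return math.nan  # degenerate case: no spread among weighted genes
    return (w_mean - global_mean) / math.sqrt(w_var / n_eff)


# Toy data: six genes, regulator binds mostly the up-regulated ones.
log_fc = [2.0, 1.5, -0.5, 0.0, -1.0, 0.2]
binding = [5, 4, 0, 0, 0, 1]
score = weighted_t_score(log_fc, binding)
```

In this sketch a positive score suggests that the regulator's frequently bound genes are up-regulated relative to the background, and a negative score suggests the opposite. With uniform weights the weighted mean equals the background mean, so the score is zero.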

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Binding Sites*
  • Chromatin Immunoprecipitation*
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Genetic
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Mice
  • Protein Binding
  • Reproducibility of Results
  • Transcription Factors / metabolism*
  • Transcriptome*

Substances

  • Transcription Factors