Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility

BMC Bioinformatics. 2017 Jul 27;18(1):355. doi: 10.1186/s12859-017-1769-7.

Abstract

Background: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technological developments make it possible to determine genome-wide chromatin accessibility in various cellular and developmental contexts. These chromatin accessibility profiles provide useful information for predicting TF binding events under various physiological conditions. Furthermore, ChIP-Seq analysis has been used to determine genome-wide binding sites for a range of TFs in multiple cell types. Integrating these two types of genomic information can improve the prediction of TF binding events.

Results: We assessed to what extent a model built on other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features, such as the specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types.
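
The transfer evaluation described in the Results can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' BPAC implementation: the data are simulated, the three features (motif score, conservation, accessibility) merely stand in for the feature set named above, and a small logistic regression replaces the random forest so that the sketch needs only the Python standard library. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_sites(n, motif_shift, seed):
    """Simulate n candidate sites for one TF (purely synthetic data).

    Features: motif score and conservation (cell type-independent)
    and accessibility (cell type-specific). Label: 1 = bound, 0 = unbound.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        bound = 1 if rng.random() < 0.3 else 0
        motif = rng.gauss(motif_shift + 1.0 * bound, 1.0)
        conservation = rng.gauss(0.5 * bound, 1.0)
        accessibility = rng.gauss(2.0 * bound, 1.0)
        features.append([motif, conservation, accessibility])
        labels.append(bound)
    return features, labels


def linear_scores(model, features):
    """Score each site with a linear model given as (weights, bias)."""
    weights, bias = model
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in features]


def fit_logistic(features, labels, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent.

    Assumption: this is a stand-in for the random forest in the abstract.
    """
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        scores = linear_scores((weights, bias), features)
        # Prediction error (sigmoid(score) - label) for every site.
        errors = [1.0 / (1.0 + math.exp(-s)) - y for s, y in zip(scores, labels)]
        for j in range(len(weights)):
            grad = sum(e * row[j] for e, row in zip(errors, features)) / n
            weights[j] -= lr * grad
        bias -= lr * sum(errors) / n
    return weights, bias


def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of (bound, unbound) pairs
    ranked correctly, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical setup: TF "A" is the source TF, TF "B" is the target TF.
train_a = simulate_sites(400, motif_shift=0.0, seed=1)
train_b = simulate_sites(400, motif_shift=0.5, seed=2)
test_b = simulate_sites(400, motif_shift=0.5, seed=3)

model_a = fit_logistic(*train_a)  # learned from another TF
model_b = fit_logistic(*train_b)  # learned from the target TF itself

auc_transfer = roc_auc(test_b[1], linear_scores(model_a, test_b[0]))
auc_within = roc_auc(test_b[1], linear_scores(model_b, test_b[0]))
print("AUC, model trained on TF A, tested on TF B:", round(auc_transfer, 3))
print("AUC, model trained on TF B, tested on TF B:", round(auc_within, 3))
```

Comparing the two AUC values on the same held-out sites is one simple way to quantify how much is lost when the model comes from a different TF. Extending the training set to sites pooled from several TFs would mirror the multi-TF setting mentioned above.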

Conclusion: Integrating chromatin accessibility information with sequence information improves the prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal "rules". A computational tool was developed to predict TF binding sites based on these universal "rules".

Keywords: Chromatin accessibility; Feature selection; Machine learning; Transcription factor binding prediction.

MeSH terms

  • Algorithms
  • Area Under Curve
  • Binding Sites
  • Cell Line, Tumor
  • Chromatin / chemistry
  • Chromatin / metabolism*
  • Chromatin Assembly and Disassembly
  • DNA / chemistry
  • DNA / metabolism
  • Humans
  • Models, Genetic*
  • Protein Binding
  • ROC Curve
  • Transcription Factors / chemistry
  • Transcription Factors / metabolism*

Substances

  • Chromatin
  • Transcription Factors
  • DNA