Predicting CTCF cell type active binding sites in human genome

Sci Rep. 2024 Dec 30;14(1):31744. doi: 10.1038/s41598-024-82238-5.

Abstract

The CCCTC-binding factor (CTCF) is pivotal in orchestrating diverse biological functions across the human genome, yet the mechanisms driving its cell type-active DNA binding affinity remain underexplored. Here, we collected ChIP-seq data from 67 cell lines in ENCODE, constructed a unique dataset of cell type-active CTCF binding sites (CBS), and trained convolutional neural networks (CNN) to dissect the patterns of CTCF binding activity. Our analysis reveals that the transcription factors RAD21 and SMC3 and chromatin accessibility are more predictive than sequence motifs and histone modifications. Integrating these features achieved AUPRC values consistently above 0.868, highlighting their utility in deciphering CTCF transcription factor binding dynamics. This study provides a deeper understanding of the regulatory functions of CTCF via a machine learning framework.
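
The abstract reports performance as AUPRC (area under the precision-recall curve). As a rough illustration of what that metric measures, the sketch below computes the step-wise average-precision estimate of AUPRC for a binary "active vs. inactive" labelling. It is not the authors' evaluation code: the function name `average_precision`, the toy labels, and the toy scores are all hypothetical, and tied scores are not treated specially.

```python
def average_precision(labels, scores):
    """Step-wise (average precision) estimate of the area under the
    precision-recall curve. labels are 0/1, scores are model outputs.
    Illustrative only; tied scores are ranked in input order."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank candidate sites from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            # Precision at this rank, accumulated once per recovered positive.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


# Toy example: 1 = binding site active in the cell type, 0 = inactive.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auprc = average_precision(toy_labels, toy_scores)
```

Unlike AUROC, this quantity depends on the fraction of positive examples, so a value such as 0.868 is best read against the proportion of active sites in the evaluated set.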

Keywords: CTCF binding site; Chromatin accessibility; Convolutional neural networks; RAD21; SMC3.

MeSH terms

  • Binding Sites
  • CCCTC-Binding Factor* / genetics
  • CCCTC-Binding Factor* / metabolism
  • Cell Cycle Proteins / genetics
  • Cell Cycle Proteins / metabolism
  • Cell Line
  • Chromatin / genetics
  • Chromatin / metabolism
  • Chromatin Immunoprecipitation Sequencing
  • DNA-Binding Proteins / genetics
  • DNA-Binding Proteins / metabolism
  • Genome, Human*
  • Humans
  • Neural Networks, Computer
  • Protein Binding

Substances

  • CCCTC-Binding Factor
  • CTCF protein, human
  • Chromatin
  • Cell Cycle Proteins
  • RAD21 protein, human
  • DNA-Binding Proteins