A synergistic DNA logic predicts genome-wide chromatin accessibility

Genome Res. 2016 Oct;26(10):1430-1440. doi: 10.1101/gr.199778.115. Epub 2016 Jul 25.

Abstract

Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which, when trained with DNase-seq data for a cell type, can predict expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. Using a novel CRISPR-based method for highly efficient site-specific integration of DNA libraries, we confirm that an SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer transcription factors (TFs), is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.
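The idea of log-additive sequence features predicting per-base read counts can be illustrated with a toy Poisson log-linear model. This is a minimal sketch under stated assumptions and is not the SCM itself. The k-mer feature set, the symmetric window around each base, the weight dictionary, and the function names kmer_counts, predict_expected_counts and poisson_nll are all hypothetical. Here the log of the expected read count at each base is a bias plus the summed weights of k-mers found nearby, so co-occurring features multiply the expected count.

```python
import math
from typing import Dict, List


def kmer_counts(seq: str, pos: int, k: int, half_window: int) -> Dict[str, int]:
    """Count k-mers starting within `half_window` bases of `pos` (hypothetical feature set)."""
    counts: Dict[str, int] = {}
    start = max(0, pos - half_window)
    last_start = min(len(seq) - k, pos + half_window)  # last k-mer start that fits in seq
    for i in range(start, last_start + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def predict_expected_counts(
    seq: str,
    weights: Dict[str, float],
    bias: float = 0.0,
    k: int = 2,
    half_window: int = 3,
) -> List[float]:
    """Toy log-additive model: log(expected count) = bias + sum(weight * k-mer count)."""
    expected = []
    for pos in range(len(seq)):
        feats = kmer_counts(seq, pos, k, half_window)
        log_rate = bias + sum(weights.get(km, 0.0) * n for km, n in feats.items())
        # Exponentiating turns additive log-space contributions into multiplicative effects.
        expected.append(math.exp(log_rate))
    return expected


def poisson_nll(observed: List[int], expected: List[float]) -> float:
    """Poisson negative log-likelihood, omitting the constant log(y!) term."""
    return sum(mu - y * math.log(mu) for y, mu in zip(observed, expected))


if __name__ == "__main__":
    # Hypothetical weights: each single-base feature doubles the expected count.
    w = {"A": math.log(2.0), "C": math.log(2.0)}
    preds = predict_expected_counts("AC", w, k=1, half_window=1)
    print(preds)  # about 4.0 at each base: both features are in every window
    print(poisson_nll([3, 5], preds))  # fit to hypothetical observed counts
```

With both features present in every window, each base gets roughly four times the baseline count, the product of two twofold effects. The Poisson loss is a common choice for fitting count data such as DNase-seq reads; the abstract does not state which objective or feature set the SCM actually uses.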

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Chromatin / genetics*
  • Chromatin / metabolism
  • Chromatin Assembly and Disassembly*
  • Genome, Human
  • Humans
  • Machine Learning
  • Models, Genetic*

Substances

  • Chromatin