Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data

Genome Res. 2011 Mar;21(3):447-55. doi: 10.1101/gr.112623.110. Epub 2010 Nov 24.

Abstract

Accurate functional annotation of regulatory elements is essential for understanding global gene regulation. Here, we report a genome-wide map of 827,000 transcription factor binding sites in human lymphoblastoid cell lines, comprising sites corresponding to 239 position weight matrices of known transcription factor binding motifs and 49 novel sequence motifs. To generate this map, we developed a probabilistic framework that integrates cell- or tissue-specific experimental data such as histone modifications and DNase I cleavage patterns with genomic information such as gene annotation and evolutionary conservation. Comparison to empirical ChIP-seq data suggests that our method is highly accurate yet has the advantage of targeting many factors in a single assay. We anticipate that this approach will be a valuable tool for genome-wide studies of gene regulation in a wide variety of cell types or tissues under diverse conditions.
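
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of combining a position weight matrix score with experimental evidence through Bayes' rule. It is not the authors' implementation. The toy matrix, the uniform background, and the names `pwm_log_odds` and `posterior_bound` are all hypothetical, and the log likelihood ratio for the experimental data (for example DNase I cleavage) is assumed to come from some separately fitted model.

```python
import math

# Hypothetical 3-bp position weight matrix: one dict of base probabilities
# per motif position. These values are illustrative, not from the paper.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def pwm_log_odds(site, pwm=PWM, background=BACKGROUND):
    """Log-odds score of a candidate site under the PWM versus background."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(
        math.log(column[base] / background[base])
        for column, base in zip(pwm, site)
    )


def posterior_bound(prior, data_log_lr):
    """Posterior probability that a site is bound.

    prior       -- prior probability of binding (e.g. informed by the PWM
                   score, conservation, or gene annotation)
    data_log_lr -- log likelihood ratio of the experimental data under the
                   bound versus unbound model (assumed to be given)
    """
    log_odds = math.log(prior / (1.0 - prior)) + data_log_lr
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    score = pwm_log_odds("ACG")
    print(f"PWM log-odds for ACG: {score:.3f}")
    print(f"Posterior with prior 0.1 and log LR 2.0: {posterior_bound(0.1, 2.0):.3f}")
```

In this sketch the sequence and genomic information enter only through the prior and the experimental data only through the likelihood ratio, which keeps the two sources of evidence separable. A real model would have to estimate both from data.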

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites / genetics*
  • Cells, Cultured
  • Chromatin / genetics
  • Chromatin / metabolism*
  • Chromatin Immunoprecipitation
  • Computational Biology
  • DNA Cleavage
  • Genome
  • Histones / metabolism
  • Humans
  • Molecular Sequence Annotation
  • Oligonucleotide Array Sequence Analysis / methods*
  • Position-Specific Scoring Matrices
  • Protein Binding
  • Regulatory Sequences, Nucleic Acid
  • Sequence Analysis, DNA*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*
  • Transcription, Genetic

Substances

  • Chromatin
  • Histones
  • Transcription Factors