Broad-Enrich: functional interpretation of large sets of broad genomic regions

Bioinformatics. 2014 Sep 1;30(17):i393-400. doi: 10.1093/bioinformatics/btu444.

Abstract

Motivation: Functional enrichment testing facilitates the interpretation of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data in terms of pathways and other biological contexts. Previous methods developed and used to test for key gene sets affected in ChIP-seq experiments treat peaks as points, and are based on the number of peaks associated with a gene or a binary score for each gene. These approaches work well for transcription factors, but histone modifications often occur over broad domains that can span multiple genes.

Results: To incorporate the unique properties of broad domains into functional enrichment testing, we developed Broad-Enrich, a method that uses the proportion of each gene's locus covered by a peak. We show that our method has a well-calibrated false-positive rate, performing well with ChIP-seq data having broad domains compared with alternative approaches. We illustrate Broad-Enrich with 55 ENCODE ChIP-seq datasets using different methods to define gene loci. Broad-Enrich can also be applied to other datasets consisting of broad genomic domains such as copy number variations.
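
The quantity named in the Results, the proportion of each gene's locus covered by a peak, can be illustrated with a minimal Python sketch. This is not the authors' implementation or the R package's API: the function names, the half-open interval convention, the single-chromosome simplification, and the toy coordinates are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_proportion(locus: Interval, peaks: List[Interval]) -> float:
    """Fraction of the locus (hypothetical gene locus definition) covered by peaks."""
    locus_start, locus_end = locus
    length = locus_end - locus_start
    if length <= 0:
        raise ValueError("locus must have positive length")
    covered = 0
    for start, end in merge_intervals(peaks):
        # Length of the intersection between this merged peak and the locus.
        overlap = min(end, locus_end) - max(start, locus_start)
        if overlap > 0:
            covered += overlap
    return covered / length


# Toy data (made up): two adjacent gene loci and three peaks, one of them broad.
loci: Dict[str, Interval] = {"GENE_A": (0, 1000), "GENE_B": (1000, 3000)}
peaks: List[Interval] = [(200, 600), (500, 1500), (2800, 3500)]

proportions = {gene: covered_proportion(locus, peaks) for gene, locus in loci.items()}
for gene, prop in proportions.items():
    print(f"{gene}: {prop:.2f}")
```

In this toy example the broad peak spanning positions 200 to 1500 contributes to both loci, which a point-based or binary per-gene score would not capture. How the resulting proportions enter the enrichment test itself is not specified in the abstract and is not sketched here.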

Availability and implementation: http://broad-enrich.med.umich.edu for the Web version and R package.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cell Line
  • Chromatin Immunoprecipitation / methods*
  • Genetic Loci
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing
  • Histones / metabolism*
  • Humans
  • Logistic Models
  • Sequence Analysis, DNA
  • Transcription Factors / metabolism

Substances

  • Histones
  • Transcription Factors