Mixture modeling for genome-wide localization of transcription factors

Biometrics. 2007 Mar;63(1):10-21. doi: 10.1111/j.1541-0420.2005.00659.x.

Abstract

Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein-DNA interactions. Data from tiling arrays encompass protein-DNA interaction measurements on thousands or millions of short oligonucleotides (probes) tiling a whole chromosome or genome. We propose a new model-based method for analyzing ChIP-chip data. The proposed model is motivated by the widely used two-component multinomial mixture model of de novo motif finding. It utilizes a hierarchical gamma mixture model of binding intensities while incorporating the inherent spatial structure of the data. In this model, each genomic region belongs to one of two general groups: regions with a local protein-DNA interaction (peak) and regions lacking this interaction. Individual probes within a genomic region are allowed to have different localization rates, accommodating different binding affinities. A novel feature of this model is the incorporation of a distribution for the peak size derived from the experimental design and parameters. This relaxes the fixed peak size assumption that is commonly employed when computing a test statistic for these types of spatial data. Simulation studies and a real data application demonstrate good operating characteristics of the method, including high sensitivity with small sample sizes when compared to available alternative methods.
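
To make the two-group idea concrete, the following is a minimal sketch of a plain two-component gamma mixture, not the hierarchical model proposed in the paper. It ignores the spatial structure, the probe-specific localization rates, and the peak-size distribution described above. The shape and rate parameters, the mixing proportion, and the function names are all hypothetical values chosen for illustration; in practice they would be estimated from data. Given a probe intensity, the code computes the posterior probability that it came from the "peak" component rather than the background component.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def posterior_peak(x, pi_peak, peak, background):
    """Posterior probability that intensity x belongs to the peak component.

    peak and background are (shape, rate) tuples; pi_peak is the prior
    probability of the peak component. Computed on the log scale for
    numerical stability.
    """
    log_p = math.log(pi_peak) + gamma_logpdf(x, *peak)
    log_b = math.log(1.0 - pi_peak) + gamma_logpdf(x, *background)
    m = max(log_p, log_b)
    w_p = math.exp(log_p - m)
    w_b = math.exp(log_b - m)
    return w_p / (w_p + w_b)


# Hypothetical parameters (assumptions, not taken from the paper).
PEAK = (8.0, 2.0)        # mean intensity 4.0
BACKGROUND = (2.0, 2.0)  # mean intensity 1.0
PI_PEAK = 0.05           # assume 5% of probes lie in bound regions

intensities = [0.5, 1.0, 3.0, 6.0]
posteriors = [posterior_peak(x, PI_PEAK, PEAK, BACKGROUND) for x in intensities]

for x, p in zip(intensities, posteriors):
    print(f"intensity={x:.1f}  P(peak | x)={p:.4f}")
```

With these assumed parameters the posterior rises with intensity, so thresholding it gives a simple probe-level call. The method summarized in the abstract goes further by working at the level of genomic regions and by letting the peak size vary.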

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatin / genetics
  • DNA / genetics*
  • Genome*
  • Likelihood Functions
  • Models, Genetic
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods
  • RNA, Messenger / genetics
  • Transcription Factors / genetics*
  • Transcription, Genetic

Substances

  • Chromatin
  • RNA, Messenger
  • Transcription Factors
  • DNA