Mixture modeling for genome-wide localization of transcription factors

Sündüz Keleş

doi:10.1111/j.1541-0420.2005.00659.x

Biometrics. 2007 Mar;63(1):10-21. doi: 10.1111/j.1541-0420.2005.00659.x.

Author

Sündüz Keleş¹

Affiliation

¹ Department of Statistics and Department of Biostatistics and Medical Informatics, 1300 University Avenue, 1245B Medical Sciences Center, Madison, Wisconsin 53706, USA.

PMID: 17447925
DOI: 10.1111/j.1541-0420.2005.00659.x

Abstract

Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein-DNA interactions. Data from tiling arrays encompass DNA-protein interaction measurements on thousands or millions of short oligonucleotides (probes) tiling a whole chromosome or genome. We propose a new model-based method for analyzing ChIP-chip data. The proposed model is motivated by the widely used two-component multinomial mixture model of de novo motif finding. It utilizes a hierarchical gamma mixture model of binding intensities while incorporating the inherent spatial structure of the data. In this model, each genomic region belongs to one of two general groups: regions with a local protein-DNA interaction (peak) and regions lacking this interaction. Individual probes within a genomic region are allowed to have different localization rates, accommodating different binding affinities. A novel feature of this model is the incorporation of a distribution for the peak size derived from the experimental design and parameters. This relaxes the fixed peak size assumption that is commonly employed when computing a test statistic for these types of spatial data. Simulation studies and a real data application demonstrate good operating characteristics of the method, including high sensitivity with small sample sizes when compared to available alternative methods.
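
The two-group idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's model: it omits the hierarchical structure, the probe-specific localization rates, and the peak size distribution, and it assumes that all probes in a region are independent draws from a single gamma component. The function names, the parameter values, and the prior probability are hypothetical and chosen only for illustration.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)


def peak_posterior(intensities, prior_peak, bound, unbound):
    """Posterior probability that a region belongs to the peak group.

    intensities: positive probe intensities observed in the region
    prior_peak:  assumed prior probability that a region is a peak
    bound, unbound: hypothetical (shape, rate) pairs for the two components
    """
    log_bound = math.log(prior_peak) + sum(
        gamma_logpdf(x, *bound) for x in intensities)
    log_unbound = math.log(1.0 - prior_peak) + sum(
        gamma_logpdf(x, *unbound) for x in intensities)
    # Subtract the larger log term before exponentiating to avoid underflow.
    m = max(log_bound, log_unbound)
    w_bound = math.exp(log_bound - m)
    w_unbound = math.exp(log_unbound - m)
    return w_bound / (w_bound + w_unbound)


# Hypothetical components: bound probes have mean 4, unbound probes mean 1.
BOUND = (8.0, 2.0)
UNBOUND = (2.0, 2.0)

high = peak_posterior([4.0, 5.0, 3.5], 0.1, BOUND, UNBOUND)
low = peak_posterior([0.8, 1.1, 0.9], 0.1, BOUND, UNBOUND)
```

Under these assumed parameters, a region with consistently high intensities receives a posterior close to one and a region with low intensities a posterior close to zero. In a full analysis the component parameters would be estimated from the data rather than fixed in advance.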

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Chromatin / genetics
DNA / genetics*
Genome*
Likelihood Functions
Models, Genetic
Models, Statistical
Oligonucleotide Array Sequence Analysis / methods
RNA, Messenger / genetics
Transcription Factors / genetics*
Transcription, Genetic

Substances

Chromatin
RNA, Messenger
Transcription Factors
DNA