Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset

Bioinformatics. 2006 Jul 15;22(14):1737-44. doi: 10.1093/bioinformatics/btl184. Epub 2006 May 18.

Abstract

Motivation: Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes due to additional 'noise' introduced by non-informative measurements.

Results: We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that accounts for context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. We demonstrate that explicit modeling of context-specificity results in increased accuracy of the cluster analysis by examining the specificity and sensitivity of clusters in microarray data. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of statistical significance of created clusters.
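As an illustration of the last point, one common way to turn a set of sampled clusterings into pairwise probabilities of co-expression is to count how often two genes fall in the same cluster across the samples. The sketch below assumes this approach; it is not taken from the gimm package, and the function name `coexpression_probabilities` and the toy labels are hypothetical. Because only equality of labels within a sample is compared, the result does not depend on how clusters are numbered from one sample to the next.

```python
import numpy as np

def coexpression_probabilities(samples):
    """Fraction of sampled clusterings in which each pair of genes shares a cluster.

    samples: array-like of shape (n_samples, n_genes) holding integer cluster
    labels, one row per draw from the posterior distribution of clusterings.
    """
    labels = np.asarray(samples)
    # same[s, i, j] is True when genes i and j share a cluster in sample s.
    same = labels[:, :, None] == labels[:, None, :]
    # Averaging over samples gives a Monte Carlo estimate per gene pair.
    return same.mean(axis=0)

# Hypothetical labels for 4 genes over 4 sampler iterations (illustration only).
samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 1, 1],
    [0, 1, 2, 2],
]
P = coexpression_probabilities(samples)
print(P)
```

Here `P[i, j]` estimates the posterior probability that genes `i` and `j` are co-expressed. With real sampler output, the matrix could then be passed to a hierarchical clustering routine or thresholded, depending on the analysis.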

Availability: The open-source package gimm is available at http://eh3.uc.edu/gimm.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Bayes Theorem
  • Cluster Analysis*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Databases, Factual
  • Gene Expression Profiling / methods*
  • Models, Biological*
  • Multigene Family / physiology*
  • Oligonucleotide Array Sequence Analysis / methods
  • Pattern Recognition, Automated / methods*