Biclustering microarray data by Gibbs sampling

Bioinformatics. 2003 Oct:19 Suppl 2:ii196-205. doi: 10.1093/bioinformatics/btg1078.

Abstract

Motivation: Gibbs sampling has become a method of choice for the discovery of noisy patterns, known as motifs, in DNA and protein sequences. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data.

Results: Whereas standard clustering groups genes that behave similarly across all conditions, biclustering groups genes over only the subset of conditions for which those genes have a sharp probability distribution. We opted for a simple probabilistic model because it gives each bicluster a transparent probabilistic interpretation in the form of an easily interpretable fingerprint. Furthermore, Gibbs sampling is less susceptible to the local optima that often trouble Expectation-Maximization. We demonstrate the effectiveness of our approach on two synthetic data sets and on a data set from leukemia patients.
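To make the idea concrete, the following is a minimal sketch of Gibbs sampling for a single bicluster in discretized data. It is not the authors' implementation: the model details are assumptions chosen for illustration (one multinomial "fingerprint" per bicluster condition with a symmetric Dirichlet prior, a single background multinomial estimated from the whole matrix, Bernoulli priors on gene and condition membership). The function names `make_toy_data` and `gibbs_bicluster` and all parameters are hypothetical.

```python
import math
import numpy as np


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def make_toy_data(n_rows=60, n_cols=15, n_levels=4, seed=0):
    """Uniform random discretized matrix with one implanted bicluster
    (rows 0-19, columns 0-4; each bicluster column holds one fixed level)."""
    rng = np.random.default_rng(seed)
    data = rng.integers(0, n_levels, size=(n_rows, n_cols))
    true_rows, true_cols = np.arange(20), np.arange(5)
    for j in true_cols:
        data[true_rows, j] = j % n_levels
    return data, true_rows, true_cols


def gibbs_bicluster(data, n_levels, n_iter=200, burn_in=100,
                    alpha=1.0, p_row=0.5, p_col=0.5, seed=0):
    """Return posterior membership frequencies for rows (genes) and
    columns (conditions) of one bicluster. Assumed model, see lead-in."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = data.shape
    # Background distribution over levels (assumption: pooled over all cells).
    bg = np.bincount(data.ravel(), minlength=n_levels) + alpha
    log_bg = np.log(bg / bg.sum())
    row_prior = math.log(p_row) - math.log(1.0 - p_row)
    col_prior = math.log(p_col) - math.log(1.0 - p_col)
    rows = rng.random(n_rows) < p_row   # random initial labels
    cols = rng.random(n_cols) < p_col
    row_freq, col_freq = np.zeros(n_rows), np.zeros(n_cols)

    for it in range(n_iter):
        # Resample each row label given all other labels.
        for i in range(n_rows):
            rows[i] = False                      # leave row i out
            cols_on = np.flatnonzero(cols)
            levels = data[i, cols_on]
            others = data[rows][:, cols_on]      # other bicluster rows
            n_match = (others == levels).sum(axis=0)
            n_in = others.shape[0]
            # Dirichlet-multinomial predictive vs. background likelihood.
            log_pred = np.log((alpha + n_match) / (n_levels * alpha + n_in))
            log_odds = row_prior + np.sum(log_pred - log_bg[levels])
            rows[i] = rng.random() < _sigmoid(log_odds)
        # Resample each column label given the row labels.
        for j in range(n_cols):
            counts = np.bincount(data[rows, j], minlength=n_levels)
            n = int(counts.sum())
            log_marg = (math.lgamma(n_levels * alpha)
                        - math.lgamma(n_levels * alpha + n)
                        + sum(math.lgamma(alpha + c) - math.lgamma(alpha)
                              for c in counts))
            log_odds = col_prior + log_marg - float(counts @ log_bg)
            cols[j] = rng.random() < _sigmoid(log_odds)
        if it >= burn_in:                        # keep post-burn-in samples
            row_freq += rows
            col_freq += cols

    n_kept = n_iter - burn_in
    return row_freq / n_kept, col_freq / n_kept
```

A call such as `row_p, col_p = gibbs_bicluster(*make_toy_data()[:1], n_levels=4)` returns, for each gene and each condition, the fraction of retained samples in which it belonged to the bicluster. Thresholding those frequencies gives one possible bicluster assignment, and the per-condition level counts among the selected genes play the role of the fingerprint.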

MeSH terms

  • Biomarkers, Tumor / genetics*
  • Cluster Analysis*
  • Data Interpretation, Statistical
  • Gene Expression Profiling / methods*
  • Genetic Predisposition to Disease / genetics
  • Humans
  • Leukemia / diagnosis*
  • Leukemia / genetics*
  • Neoplasm Proteins / genetics*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity

Substances

  • Biomarkers, Tumor
  • Neoplasm Proteins