Fuzzy C-means method for clustering microarray data

Doulaye Dembélé; Philippe Kastner

doi:10.1093/bioinformatics/btg119

Bioinformatics. 2003 May 22;19(8):973-80. doi: 10.1093/bioinformatics/btg119.

Authors

Doulaye Dembélé¹, Philippe Kastner

Affiliation

¹ Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS-INSERM-ULP, BP 10142, 67404 Illkirch Cedex, France. [email protected]

PMID: 12761060
DOI: 10.1093/bioinformatics/btg119

Abstract

Motivation: Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene on the overall shape of clusters. Here we apply a fuzzy partitioning method, Fuzzy C-means (FCM), to attribute cluster membership values to genes.
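
For orientation, the textbook FCM formulation can be written as below. This is the standard form of the method, not an equation quoted from this record, and the symbols are assumptions chosen for illustration: x_i is the expression profile of gene i, c_k the centroid of cluster k, u_ik the membership of gene i in cluster k, N the number of genes, K the number of clusters, and m > 1 the fuzziness parameter.

```latex
% Objective minimised by FCM, subject to each gene's memberships summing to 1
J_m(U, C) = \sum_{i=1}^{N} \sum_{k=1}^{K} u_{ik}^{\,m} \, \lVert x_i - c_k \rVert^{2},
\qquad \sum_{k=1}^{K} u_{ik} = 1, \quad m > 1

% Alternating update rules
u_{ik} = \left[ \sum_{j=1}^{K}
  \left( \frac{\lVert x_i - c_k \rVert}{\lVert x_i - c_j \rVert} \right)^{2/(m-1)}
\right]^{-1},
\qquad
c_k = \frac{\sum_{i=1}^{N} u_{ik}^{\,m} \, x_i}{\sum_{i=1}^{N} u_{ik}^{\,m}}
```

As m approaches 1 the memberships approach a hard, K-means-like assignment. As m grows, every membership tends toward 1/K.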

Results: A major problem in applying the FCM method for clustering microarray data is the choice of the fuzziness parameter m. We show that the commonly used value m = 2 is not appropriate for some data sets, and that optimal values for m vary widely from one data set to another. We propose an empirical method, based on the distribution of distances between genes in a given data set, to determine an adequate value for m. By setting threshold levels for the membership values, genes which are tightly associated with a given cluster can be selected. Using a yeast cell cycle data set as an example, we show that this selection increases the overall biological significance of the genes within the cluster.
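
The role of m and of the membership threshold can be illustrated with a minimal pure-Python sketch of generic FCM. It is not the authors' Matlab code and does not reproduce their empirical procedure for choosing m. The function names (`farthest_first_init`, `fuzzy_c_means`, `select_core`), the deterministic farthest-first initialisation, the toy data, and the 0.9 threshold are all assumptions made for illustration.

```python
import math


def farthest_first_init(data, n_clusters):
    """Deterministic start (an assumption of this sketch): take the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [list(data[0])]
    while len(centroids) < n_clusters:
        nxt = max(data, key=lambda x: min(math.dist(x, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Generic FCM with Euclidean distance. Returns (centroids, memberships)."""
    if m <= 1.0:
        raise ValueError("fuzziness parameter m must be greater than 1")
    dim = len(data[0])
    exponent = 2.0 / (m - 1.0)
    centroids = farthest_first_init(data, n_clusters)
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        memberships = []
        for x in data:
            dists = [math.dist(x, c) for c in centroids]
            zeros = [k for k, d in enumerate(dists) if d == 0.0]
            if zeros:
                # Point sits exactly on a centroid: assign it crisply.
                row = [1.0 / len(zeros) if k in zeros else 0.0
                       for k in range(n_clusters)]
            else:
                row = [1.0 / sum((dk / dj) ** exponent for dj in dists)
                       for dk in dists]
            memberships.append(row)
        # Centroid update: mean of the points weighted by u_ik^m
        new_centroids = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            new_centroids.append(
                [sum(w * x[j] for w, x in zip(weights, data)) / total
                 for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, memberships


def select_core(memberships, cluster, threshold):
    """Indices of items whose membership in `cluster` reaches `threshold`."""
    return [i for i, row in enumerate(memberships) if row[cluster] >= threshold]


# Toy "expression profiles" (made-up numbers): two tight groups plus one
# profile lying between them.
profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
centroids, memberships = fuzzy_c_means(profiles, n_clusters=2, m=2.0)
core = select_core(memberships, cluster=0, threshold=0.9)  # illustrative cutoff
```

In this toy setting the in-between profile receives intermediate memberships in both clusters, so a high threshold excludes it while keeping the tightly grouped profiles. Rerunning with a different m changes how sharply the memberships separate, which is the sensitivity the abstract refers to.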

Availability: Supplementary text and Matlab functions are available at http://www-igbmc.u-strasbg.fr/fcm/

Publication types

Comparative Study
Evaluation Study
Validation Study

MeSH terms

Algorithms
Cluster Analysis*
Databases, Genetic
Fuzzy Logic*
Gene Expression Profiling / methods*
Gene Expression Regulation / genetics
Humans
Neoplasms / genetics
Oligonucleotide Array Sequence Analysis / methods*
Quality Control
Sequence Analysis, DNA / methods*
Tumor Cells, Cultured
Yeasts / genetics