How to avoid spurious cluster validation? A methodological investigation on simulated and fMRI data

Neuroimage. 2002 Sep;17(1):431-46. doi: 10.1006/nimg.2002.1166.

Abstract

This paper presents an evaluation of a common approach that has been considered a promising option for exploratory fMRI data analyses. The approach includes two stages: creating from the data a sequence of partitions with an increasing number of subsets (clustering), and selecting the one partition in this sequence that exhibits the clearest indications of an existing structure (cluster validation). To ensure that the selected partition is actually the best characterization of the data structure, previous studies sought the most appropriate validity function(s). In our analysis protocol, we first optimize the sequence of partitions according to the given objective function. Our study showed that an insufficient optimization of the partition, for one or more numbers of clusters, can easily yield a spurious validation result which, in turn, may lead the analyst to a misleading interpretation of the fMRI experiment. However, a sufficient optimization, for each included number of clusters, provided the basis for a reliable, adequate characterization of the data. Furthermore, it enabled an adequate evaluation of the validity functions. These findings were obtained independently for three clustering algorithms (representing the hard and fuzzy clustering variants) and three up-to-date cluster validity functions. The findings were derived from analyses of Gaussian clusters, simulated data sets that mimic typical fMRI response signals, and real fMRI data. Based on our results, we propose a number of options for configuring improved clustering tools.
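The two-stage protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain k-means (a hard clustering variant) as the clustering algorithm, random restarts as the optimization strategy, a hard-partition form of the Xie-Beni index as the validity function, and synthetic two-dimensional Gaussian clusters as data. All function names and parameter values are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=100):
    """One k-means run from a random start; returns (objective, centers, labels)."""
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    objective = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return objective, centers, labels

def best_partition(points, k, restarts, seed=0):
    """Stage 1: optimize the partition for a fixed k by keeping the best restart."""
    rng = random.Random(seed)
    runs = (kmeans(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])

def xie_beni(points, centers, objective):
    """Assumed validity function (lower is better): compactness over separation."""
    sep = min(sq_dist(a, b)
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return objective / (len(points) * sep)

def select_k(points, ks, restarts):
    """Stage 2: score each optimized partition and pick the best-scoring k."""
    scores = {}
    for k in ks:
        objective, centers, _ = best_partition(points, k, restarts)
        scores[k] = xie_beni(points, centers, objective)
    return min(scores, key=scores.get), scores

def make_data(seed=1, per_cluster=40, sd=0.5):
    """Three well-separated Gaussian clusters (illustrative, not from the paper)."""
    rng = random.Random(seed)
    means = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]
    return [(rng.gauss(mx, sd), rng.gauss(my, sd))
            for mx, my in means for _ in range(per_cluster)]

if __name__ == "__main__":
    data = make_data()
    for restarts in (1, 50):  # weakly vs. thoroughly optimized partitions
        k, scores = select_k(data, range(2, 7), restarts)
        print(f"restarts={restarts}: selected k={k}")
```

Comparing the run with a single restart against the run with many restarts shows where a poorly optimized partition for some number of clusters can distort the validity scores, which is the failure mode the abstract warns about. Whether the single-restart run actually selects a wrong number of clusters depends on the random initialization.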

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Brain / anatomy & histology
  • Brain / physiology
  • Cluster Analysis*
  • Computer Simulation
  • Fingers / innervation
  • Fingers / physiology
  • Humans
  • Magnetic Resonance Imaging / statistics & numerical data*
  • Psychomotor Performance / physiology
  • Reproducibility of Results