Choosing the number of clusters in K-means clustering

Psychol Methods. 2011 Sep;16(3):285-97. doi: 10.1037/a0023346.

Abstract

Steinley (2007) provided a lower bound for the sum-of-squares error criterion function used in K-means clustering. In this article, on the basis of the lower bound, the authors propose a method to distinguish 1 cluster (i.e., a single distribution) from more than 1 cluster. Additionally, conditional on the method indicating that there are multiple clusters, the procedure is extended to determine the number of clusters. Through a series of simulations, the proposed methodology is shown to outperform several other commonly used procedures for determining both the presence of clusters and their number.
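
For readers unfamiliar with the criterion the abstract refers to, the following is a minimal sketch of the sum-of-squares error (SSE) that K-means minimizes, computed for several values of K. It does not implement Steinley's lower bound or the authors' procedure, neither of which is described in this record. The function names (`sq_dist`, `kmeans`, `sse`, `sse_by_k`), the random-restart scheme, and the toy data are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd-style K-means (illustrative; not the paper's procedure)."""
    rng = random.Random(seed)
    # Initialize centers at k distinct observations chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # partition unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def sse_by_k(points, k_max, n_starts=10):
    """Best SSE found over n_starts random initializations, for K = 1..k_max."""
    result = {}
    for k in range(1, k_max + 1):
        result[k] = min(
            sse(points, *kmeans(points, k, seed=s)) for s in range(n_starts)
        )
    return result


# Hypothetical toy data: two well-separated groups of four points each.
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sse_by_k(toy, 3))
```

On this toy data the SSE is 404 for K = 1 and 4 for K = 2. Because the optimal SSE can only shrink or stay the same as K grows, the raw value cannot by itself indicate the number of clusters, which is presumably why a reference point such as a lower bound is useful.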

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cluster Analysis*
  • Data Interpretation, Statistical
  • Humans
  • Models, Statistical
  • Psychology / methods
  • Reproducibility of Results