Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty

J Mach Learn Res. 2013 Jul 1;14(7):1865.

Abstract

Clustering analysis is widely used in many fields. Traditionally clustering is regarded as unsupervised learning for its lack of a class label or a quantitative response variable, which in contrast is present in supervised learning such as classification and regression. Here we formulate clustering as penalized regression with grouping pursuit. In addition to the novel use of a non-convex group penalty and its associated unique operating characteristics in the proposed clustering method, a main advantage of this formulation is its allowing borrowing some well established results in classification and regression, such as model selection criteria to select the number of clusters, a difficult problem in clustering analysis. In particular, we propose using the generalized cross-validation (GCV) based on generalized degrees of freedom (GDF) to select the number of clusters. We use a few simple numerical examples to compare our proposed method with some existing approaches, demonstrating our method's promising performance.

Keywords: Generalized degrees of freedom; Grouping; K-means clustering; Lasso; Penalized regression; Truncated lasso penalty (TLP).