Clustered coefficient regression (CCR) extends the classical regression model by allowing regression coefficients varying across observations and forming clusters of observations. It has become an increasingly useful tool for modeling the heterogeneous relationship between the predictor and response variables. A typical issue of existing CCR methods is that the estimation and clustering results can be unstable in the presence of multicollinearity. To address the instability issue, this paper introduces a low-rank structure of the CCR coefficient matrix and proposes a penalized non-convex optimization problem with an adaptive group fusion-type penalty tailor-made for this structure. An iterative algorithm is developed to solve this non-convex optimization problem with guaranteed convergence. An upper bound for the coefficient estimation error is also obtained to show the statistical property of the estimator. Empirical studies on both simulated datasets and a COVID-19 mortality rate dataset demonstrate the superiority of the proposed method to existing methods.
Keywords: data heterogeneity; local multicollinearity; low-rank structure; supervised dimensionality reduction.
© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.