A robust and efficient variable selection method for linear regression

J Appl Stat. 2021 Aug 6;49(14):3677-3692. doi: 10.1080/02664763.2021.1962259. eCollection 2022.

Abstract

Variable selection is fundamental to high-dimensional statistical modeling, and many approaches have been proposed. However, existing variable selection methods do not perform well in the presence of outliers in the response variable and/or the covariates. To ensure a high probability of correct selection and efficient parameter estimation, we investigate a robust variable selection method based on a modified Huber's function with an exponential squared loss tail. We also prove that the proposed method has oracle properties. Furthermore, we carry out simulation studies to evaluate the performance of the proposed method in both the p < n and p > n settings. Our simulation results indicate that the proposed method is efficient and robust against outliers and heavy-tailed distributions. Finally, a real dataset from an air pollution mortality study is used to illustrate the proposed method.
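
The abstract does not state the loss in closed form. As a minimal sketch, assuming one plausible construction (not necessarily the authors' definition), the Python function below is quadratic for small residuals, as in Huber's function, and switches to a bounded exponential squared tail beyond a threshold. The name `huber_exp_tail_loss`, the threshold `c`, the tail scale `gamma`, and the specific tail formula are all assumptions, chosen so that the loss and its first derivative are continuous at |r| = c.

```python
import numpy as np


def huber_exp_tail_loss(r, c=1.345, gamma=1.0):
    """Illustrative loss: quadratic core with an exponential squared tail.

    ASSUMPTION: this form is a hypothetical example, not taken from the paper.
      rho(r) = r^2 / 2                                        if |r| <= c
      rho(r) = c^2 / 2 + (gamma / 2) * (1 - exp(-(r^2 - c^2) / gamma))  otherwise
    """
    r = np.asarray(r, dtype=float)
    # Quadratic part, identical to the central region of Huber's function.
    quad = 0.5 * r**2
    # Bounded tail: approaches c^2/2 + gamma/2 as |r| grows.
    tail = 0.5 * c**2 + 0.5 * gamma * (1.0 - np.exp(-(r**2 - c**2) / gamma))
    return np.where(np.abs(r) <= c, quad, tail)


# Example: evaluate the loss on a few residuals, including a gross outlier.
residuals = np.array([0.0, 0.5, 1.345, 3.0, 100.0])
losses = huber_exp_tail_loss(residuals)
```

Under this assumed form the loss never exceeds c^2/2 + gamma/2, so a single gross outlier contributes at most a fixed amount to the objective, whereas under squared loss its contribution grows without bound.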

Keywords: 62J05; 62J07; Oracle properties; penalty function; robustness; variable selection.

Grants and funding

This research was supported by the National Natural Science Foundation of China (No. 11871390), an Australian Research Council Discovery Project (DP160104292), the Fundamental Research Funds for the Central Universities (No. xjj2017180), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2018JQ1006) and the Natural Science Foundation of Guangdong (Nos. 2018A030313171, 2019A1515011830).