Algorithms for Sparse Support Vector Machines

Alfonso Landeros; Kenneth Lange

doi:10.1080/10618600.2022.2146697

J Comput Graph Stat. 2023;32(3):1097-1108. doi: 10.1080/10618600.2022.2146697. Epub 2022 Dec 13.

Authors

Alfonso Landeros¹, Kenneth Lange¹ ² ³

Affiliations

¹ Departments of Computational Medicine, University of California, Los Angeles.
² Departments of Human Genetics, University of California, Los Angeles.
³ Departments of Statistics, University of California, Los Angeles.

Abstract

Many problems in classification involve huge numbers of irrelevant features. Variable selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, variable selection is achieved by $ℓ_{1}$ penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current paper presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function $L(β)$ and adds the penalty $\frac{ρ}{2} \operatorname{dist}(β, S_{k})^{2}$ capturing the squared Euclidean distance of the parameter vector $β$ to the sparsity set $S_{k}$ where at most $k$ components of $β$ are nonzero. If $β_{ρ}$ represents the minimum of the objective $f_{ρ}(β) = L(β) + \frac{ρ}{2} \operatorname{dist}(β, S_{k})^{2}$, then $β_{ρ}$ tends to the constrained minimum of $L(β)$ over $S_{k}$ as $ρ$ tends to $\infty$. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve better sparsity without loss of classification power.

Keywords: Julia; discriminant analysis; sparsity; unsupervised learning.
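
The proximal distance principle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm or their Julia code. To keep the arithmetic transparent, it assumes a toy quadratic loss $L(β) = \frac{1}{2}\|β - a\|^{2}$ in place of a support vector machine loss, and it assumes a simple majorization step in which the distance to $S_{k}$ is replaced by the distance to the projection of the current iterate. The function names `project_sparse`, `squared_distance`, and `minimize_penalized` are hypothetical.

```python
def project_sparse(beta, k):
    """Euclidean projection onto S_k: keep the k largest-magnitude entries."""
    order = sorted(range(len(beta)), key=lambda i: abs(beta[i]), reverse=True)
    keep = set(order[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


def squared_distance(beta, k):
    """dist(beta, S_k)^2, the squared Euclidean distance to the sparsity set."""
    proj = project_sparse(beta, k)
    return sum((b - p) ** 2 for b, p in zip(beta, proj))


def minimize_penalized(a, k, rho, iters=100):
    """Approximately minimize f_rho(beta) = L(beta) + (rho/2) dist(beta, S_k)^2.

    Assumption: toy loss L(beta) = 0.5 * ||beta - a||^2 (not an SVM loss).
    Each pass fixes the projection of the current iterate and minimizes the
    resulting quadratic surrogate, which has a closed-form solution here.
    """
    beta = list(a)
    for _ in range(iters):
        anchor = project_sparse(beta, k)
        beta = [(t + rho * p) / (1.0 + rho) for t, p in zip(a, anchor)]
    return beta


a = [3.0, -0.5, 2.0, 0.1]
k = 2
for rho in (1.0, 10.0, 1000.0):
    beta_rho = minimize_penalized(a, k, rho)
    print(rho, beta_rho, squared_distance(beta_rho, k))
```

For this toy loss, the constrained minimum over $S_{k}$ is the projection of `a` itself. Increasing `rho` shrinks the entries outside the top `k` toward zero, so the printed distance decreases as the penalty constant grows.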
