Biological systems are capable of learning that certain stimuli are valuable while ignoring the many that are not, and thus perform feature selection. In machine learning, one effective feature selection approach is the least absolute shrinkage and selection operator (LASSO) form of regularization, which is equivalent to assuming a Laplacian prior distribution on the parameters. We review how such Bayesian priors can be implemented in gradient descent as a form of weight decay, which is a biologically plausible mechanism for Bayesian feature selection. In particular, we describe a new prior that offsets or "raises" the Laplacian prior distribution. We evaluate this alongside the Gaussian and Cauchy priors in gradient descent using a generic regression task where there are few relevant and many irrelevant features. We find that raising the Laplacian leads to less prediction error because it is a better model of the underlying distribution. We also consider two biologically relevant online learning tasks, one synthetic and one modeled after the perceptual expertise task of Krigolson et al. (2009). Here, raising the Laplacian prior avoids the fast erosion of relevant parameters over the period following training because it only allows small weights to decay. This better matches the limited loss of association seen between days in the human data of the perceptual expertise task. Raising the Laplacian prior thus results in a biologically plausible form of Bayesian feature selection that is effective in biologically relevant contexts.
Keywords: Bayesian prior; Feature selection; LASSO; Online learning; Regularization; Weight decay.
Copyright © 2015 Elsevier Ltd. All rights reserved.