Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
Authors:
Anna Choromanska,
Benjamin Cowen,
Sadhana Kumaravel,
Ronny Luss,
Mattia Rigotti,
Irina Rish,
Brian Kingsbury,
Paolo DiAchille,
Viatcheslav Gurev,
Ravi Tejwani,
Djallel Bouneffouf
Abstract:
Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, an inability to handle non-differentiable nonlinearities or to parallelize weight updates across layers, and biological implausibility. These limitations continue to motivate exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods that break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch), which limits their applicability to extremely large datasets, as well as to online, continual, or reinforcement learning. The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets.
Submitted 5 June, 2019; v1 submitted 23 June, 2018;
originally announced June 2018.
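To make the idea of auxiliary-variable alternating minimization on mini-batches concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a two-layer ReLU network, a quadratic loss, a quadratic coupling penalty with weight `mu`, and plain gradient steps for each subproblem; the function names (`update_codes`, `update_weights`, `online_am`) and all hyperparameters are hypothetical. Each mini-batch first optimizes the auxiliary codes `c` with the weights fixed, then updates each layer's weights from a purely local subproblem.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def update_codes(W1, W2, x, y, mu, steps, lr):
    """Code step: with weights fixed, minimize over c (per sample)
    0.5*||W2 relu(c) - y||^2 + 0.5*mu*||c - W1 x||^2 by gradient descent."""
    c = x @ W1.T                      # initialize codes with a forward pass
    for _ in range(steps):
        out_err = relu(c) @ W2.T - y  # shape (batch, out)
        grad = (out_err @ W2) * (c > 0) + mu * (c - x @ W1.T)
        c = c - lr * grad
    return c

def update_weights(W1, W2, c, x, y, lr):
    """Weight step: with codes fixed, each layer solves its own local
    least-squares problem; the two updates do not depend on each other."""
    n = x.shape[0]
    h = relu(c)
    grad_W2 = (h @ W2.T - y).T @ h / n   # from 0.5*mean||W2 relu(c) - y||^2
    grad_W1 = (x @ W1.T - c).T @ x / n   # from 0.5*mean||W1 x - c||^2
    return W1 - lr * grad_W1, W2 - lr * grad_W2

def online_am(X, Y, hidden, mu=1.0, epochs=5, batch=32, lr=0.05,
              code_steps=5, seed=0):
    """Alternate code and weight updates over shuffled mini-batches."""
    rng = np.random.default_rng(seed)
    d_in, d_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(scale=1.0 / np.sqrt(d_in), size=(hidden, d_in))
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(d_out, hidden))
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            c = update_codes(W1, W2, X[idx], Y[idx], mu, code_steps, lr)
            W1, W2 = update_weights(W1, W2, c, X[idx], Y[idx], lr)
    return W1, W2
```

Because the weight step for each layer depends only on that layer's inputs and target codes, the two updates could in principle run in parallel, which is the property the abstract contrasts with backpropagation.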
LSALSA: Accelerated Source Separation via Learned Sparse Coding
Authors:
Benjamin Cowen,
Apoorva Nandini Saridena,
Anna Choromanska
Abstract:
We propose an efficient algorithm for the generalized sparse coding (SC) inference problem. The proposed framework applies both to the single-dictionary setting, where each data point is represented as a sparse combination of the columns of one dictionary matrix, and to the multiple-dictionary setting of morphological component analysis (MCA), where the goal is to separate a signal into additive parts such that each part has a distinct sparse representation within a corresponding dictionary. Both the SC task and its generalization via MCA have been cast as $\ell_1$-regularized least-squares optimization problems. To accelerate traditional acquisition of sparse codes, we propose a deep learning architecture that constitutes a trainable time-unfolded version of the Split Augmented Lagrangian Shrinkage Algorithm (SALSA), a special case of the Alternating Direction Method of Multipliers (ADMM). We empirically validate both variants of the algorithm, which we refer to as LSALSA (learned SALSA), on image vision tasks and demonstrate that at inference our networks achieve vast improvements in running time, the quality of estimated sparse codes, and visual clarity on both classic SC and MCA problems. Finally, we present a theoretical framework for analyzing the LSALSA network: we show that the proposed approach exactly implements a truncated ADMM applied to a new, learned cost function with curvature modified by one of the learned parameterized matrices. We extend a very recent Stochastic Alternating Optimization analysis framework to show that a gradient descent step along this learned loss landscape is equivalent to a modified gradient descent step along the original loss landscape. In this framework, the acceleration achieved by LSALSA could potentially be explained by the network's ability to learn a correction to the gradient direction of steeper descent.
Submitted 7 June, 2019; v1 submitted 12 February, 2018;
originally announced February 2018.
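For readers unfamiliar with the $\ell_1$-regularized least-squares problems mentioned in the abstract, the standard textbook forms are sketched below. The symbols are assumptions introduced here for illustration and may differ from the paper's notation: $x$ is the observed signal, $D$, $D_1$, $D_2$ are dictionaries, $z$, $z_1$, $z_2$ are sparse codes, and $\lambda$, $\lambda_1$, $\lambda_2 > 0$ are sparsity weights. The last line is the soft-thresholding (shrinkage) operator, applied elementwise, which is the shrinkage step that gives SALSA its name.

```latex
% Single-dictionary sparse coding (SC)
\min_{z} \; \tfrac{1}{2}\,\| x - D z \|_2^2 + \lambda \| z \|_1

% Two-dictionary morphological component analysis (MCA)
\min_{z_1, z_2} \; \tfrac{1}{2}\,\| x - D_1 z_1 - D_2 z_2 \|_2^2
    + \lambda_1 \| z_1 \|_1 + \lambda_2 \| z_2 \|_1

% Soft-thresholding: the proximal operator of \lambda \| \cdot \|_1
\operatorname{soft}_{\lambda}(v) = \operatorname{sign}(v)\,\max(|v| - \lambda,\, 0)
```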