Clipping on learning rates in Adam leads to an effective stochastic algorithm-AdaBound. In spite of its effectiveness in practice, convergence analysis of AdaBound has not been fully explored, especially for non-convex optimization. To this end, we address the convergence of the last individual output of AdaBound for non-convex stochastic optimization problems, which is called individual convergence. We prove that, with the iteration of the AdaBound, the cost function converges to a finite value and the corresponding gradient converges to zero. The novelty of this proof is that the convergence conditions on the bound functions and momentum factors are much more relaxed than the existing results, especially when we remove the monotonicity and convergence of the bound functions, and only keep their boundedness. The momentum factors can be fixed to be constant, without the restriction of monotonically decreasing. This provides a new perspective on understanding the bound functions and momentum factors of AdaBound. At last, numerical experiments are provided to corroborate our theory and show that the convergence of AdaBound extends to more general bound functions.
Keywords: AdaBound; Bound functions; Deep learning; Individual convergence; Non-convex optimization.
Copyright © 2021 Elsevier Ltd. All rights reserved.