Shuffling-type gradient method with bandwidth-based step sizes for finite-sum optimization

Yuqing Liang; Yang Yang; Jinlan Liu; Dongpo Xu

doi:10.1016/j.neunet.2024.106514

Neural Netw. 2024 Nov:179:106514. doi: 10.1016/j.neunet.2024.106514. Epub 2024 Jul 6.

Authors

Yuqing Liang¹, Yang Yang¹, Jinlan Liu², Dongpo Xu³

Affiliations

¹ Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China.
² Department of Mathematics, Changchun Normal University, Changchun 130032, China. Electronic address: [email protected].
³ Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China. Electronic address: [email protected].

PMID: 39024708
DOI: 10.1016/j.neunet.2024.106514

Abstract

The shuffling-type gradient method is a popular machine learning algorithm that solves finite-sum optimization problems by randomly shuffling samples during iterations. In this paper, we explore the convergence properties of the shuffling-type gradient method under mild assumptions. Specifically, we employ a bandwidth-based step size strategy that covers both monotonic and non-monotonic step sizes, thereby providing a unified convergence guarantee with respect to the step size. Additionally, we replace the lower-bound assumption on the objective function with a lower-bound assumption on the loss function, thereby eliminating restrictions on the variance and the second-order moment of the stochastic gradient that are difficult to verify in practice. For non-convex objectives, we recover the last-iteration convergence of the shuffling-type gradient algorithm with a less cumbersome proof. We also establish a convergence rate for the minimum gradient norm over the iterations. Under the Polyak-Łojasiewicz (PL) condition, we prove that the function value at the last iteration converges to the lower bound of the objective function. By selecting appropriate boundary functions, we further improve previous sublinear convergence rate results. Overall, this paper contributes to the understanding of the shuffling-type gradient method and its convergence properties, providing insights for optimizing finite-sum problems in machine learning. Finally, numerical experiments demonstrate the efficiency of the shuffling-type gradient method with bandwidth-based step sizes and validate our theoretical results.

Keywords: Bandwidth-based step size; Last iteration convergence; Non-convex objectives; PL condition; Shuffling-type gradient algorithm.
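
To make the idea in the abstract concrete, here is a minimal sketch of a shuffling-type gradient method whose step size is allowed to oscillate inside a band. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the algorithm or the band explicitly, so the boundary function `delta(t) = 1/sqrt(t)`, the band constants `m` and `M`, the cosine oscillation, the per-sample scaling `eta / n`, and the least-squares test problem are all hypothetical choices made for this example.

```python
import numpy as np

def bandwidth_step(epoch, m=0.5, M=1.0, period=5):
    """Non-monotonic step size kept inside the band [m*delta(t), M*delta(t)].

    delta(t) = 1/sqrt(t) is an assumed boundary function for illustration.
    """
    delta = 1.0 / np.sqrt(epoch)
    # Cosine weight in [0, 1] moves the step between the band edges.
    mix = 0.5 * (1.0 + np.cos(2.0 * np.pi * epoch / period))
    return (m + (M - m) * mix) * delta

def shuffling_gradient(A, b, epochs=300, seed=0):
    """Random-reshuffling gradient method on f(x) = (1/n) sum_i 0.5*(a_i.x - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for t in range(1, epochs + 1):
        eta = bandwidth_step(t)          # one step size per epoch (assumption)
        for i in rng.permutation(n):     # fresh permutation each epoch
            grad_i = (A[i] @ x - b[i]) * A[i]   # gradient of the i-th loss
            x -= (eta / n) * grad_i
    return x

def objective(A, b, x):
    """Average least-squares loss over all samples."""
    return 0.5 * np.mean((A @ x - b) ** 2)
```

A short usage example on synthetic, noise-free data (again purely illustrative):

```python
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x_hat = shuffling_gradient(A, b)
print(objective(A, b, np.zeros(5)), objective(A, b, x_hat))
```

Setting `m = M` collapses the band to a single monotonically decreasing schedule, which is the sense in which a band covers both monotonic and non-monotonic step sizes.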

MeSH terms

Algorithms*
Machine Learning*
Neural Networks, Computer