Shuffling-type gradient method with bandwidth-based step sizes for finite-sum optimization

Neural Netw. 2024 Nov:179:106514. doi: 10.1016/j.neunet.2024.106514. Epub 2024 Jul 6.

Abstract

The shuffling-type gradient method is a popular machine learning algorithm that solves finite-sum optimization problems by randomly shuffling samples during iterations. In this paper, we explore the convergence properties of the shuffling-type gradient method under mild assumptions. Specifically, we employ the bandwidth-based step size strategy, which covers both monotonic and non-monotonic step sizes, thereby providing a unified convergence guarantee in terms of step size. Additionally, we replace the lower bound assumption on the objective function with one on the loss function, thereby eliminating the restrictions on the variance and the second-order moment of the stochastic gradient, which are difficult to verify in practice. For non-convex objectives, we recover the last-iteration convergence of the shuffling-type gradient algorithm with a less cumbersome proof. We also establish a convergence rate for the minimum gradient norm over the iterations. Under the Polyak-Łojasiewicz (PL) condition, we prove that the function value at the last iteration converges to the lower bound of the objective function. By selecting appropriate boundary functions, we further improve previous sublinear convergence rate results. Overall, this paper contributes to the understanding of the shuffling-type gradient method and its convergence properties, providing insights for optimizing finite-sum problems in machine learning. Finally, numerical experiments demonstrate the efficiency of the shuffling-type gradient method with bandwidth-based step sizes and validate our theoretical results.
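The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the least-squares loss, the boundary function delta(t) = base / sqrt(t), the band constants m and M, the cosine oscillation used to produce a non-monotonic step size inside the band, and the eta / n scaling of the per-sample update. The function names bandwidth_step and shuffling_gradient are hypothetical.

```python
import numpy as np


def bandwidth_step(epoch, base=0.5, m=0.5, M=1.0, period=5):
    """Hypothetical non-monotonic step size for a 1-indexed epoch.

    The value stays inside the band [m * delta, M * delta], where
    delta = base / sqrt(epoch) is an assumed boundary function.
    """
    delta = base / np.sqrt(epoch)
    # phase oscillates in [0, 1], so the step moves between the band edges.
    phase = 0.5 * (1.0 + np.cos(2.0 * np.pi * epoch / period))
    return (m + (M - m) * phase) * delta


def shuffling_gradient(A, b, epochs=50, seed=0):
    """Random-reshuffling gradient method on an assumed least-squares problem.

    Minimizes (1/n) * sum_i 0.5 * (A[i] @ w - b[i])**2, visiting every
    sample exactly once per epoch in a freshly shuffled order.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for epoch in range(1, epochs + 1):
        eta = bandwidth_step(epoch)  # one step size per epoch
        for i in rng.permutation(n):  # new permutation each epoch
            grad_i = (A[i] @ w - b[i]) * A[i]  # gradient of the i-th loss
            w -= (eta / n) * grad_i  # assumed eta / n scaling per sample
    return w


# Usage on synthetic, noise-free data (illustrative only).
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((100, 5))
w_true = data_rng.standard_normal(5)
b = A @ w_true
w_hat = shuffling_gradient(A, b)
print("distance to w_true:", np.linalg.norm(w_hat - w_true))
```

Setting m equal to M collapses the band to the single monotonic schedule delta(t), which is one way to see how a band of this kind can cover both monotonic and non-monotonic step sizes.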

Keywords: Bandwidth-based step size; Last iteration convergence; Non-convex objectives; PL condition; Shuffling-type gradient algorithm.

MeSH terms

  • Algorithms*
  • Machine Learning*
  • Neural Networks, Computer