Stochastic momentum methods are widely used to solve stochastic optimization problems in machine learning. However, most existing theoretical analyses rely on either boundedness assumptions or strong stepsize conditions. In this paper, we focus on a class of non-convex objective functions satisfying the Polyak-Łojasiewicz (PL) condition and present a unified convergence rate analysis for stochastic momentum methods, covering stochastic heavy ball (SHB) and stochastic Nesterov accelerated gradient (SNAG), without any boundedness assumptions. Our analysis establishes the more challenging last-iterate convergence rate of function values under the relaxed growth (RG) condition, which is weaker than the assumptions used in related work. Specifically, we obtain a sub-linear rate for stochastic momentum methods with diminishing stepsizes, and a linear convergence rate with constant stepsizes if the strong growth (SG) condition holds. We also examine the iteration complexity of obtaining an ϵ-accurate solution at the last iterate. Moreover, we provide a more flexible stepsize scheme for stochastic momentum methods in three respects: (i) relaxing the stepsize requirement for last-iterate convergence from square summability to a zero limit; (ii) extending the stepsize for the minimum-iterate convergence rate to the non-monotonic case; (iii) extending the stepsize for the last-iterate convergence rate to a more general form. Finally, we conduct numerical experiments on benchmark datasets to validate our theoretical findings.
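As a rough illustration of the constant-stepsize regime described above, the following sketch runs a textbook stochastic heavy ball update on a toy problem. It is not the paper's unified scheme or experimental setup. The function name `shb_least_squares`, the stepsize, the momentum value, and the problem sizes are all illustrative assumptions. The toy objective is a consistent under-determined least-squares problem, which is convex rather than non-convex, but it satisfies the PL condition without being strongly convex, and its stochastic gradients all vanish at a solution, so a strong-growth-type condition holds.

```python
import numpy as np


def shb_least_squares(A, b, stepsize=0.005, momentum=0.5, iters=5000, seed=0):
    """Textbook stochastic heavy ball on f(x) = (1/2n) * ||A x - b||^2.

    Illustrative sketch only: the parameter values are assumptions chosen
    for this toy problem, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_prev = np.zeros(d)
    x = np.zeros(d)
    losses = np.empty(iters)
    for k in range(iters):
        i = rng.integers(n)                      # sample one data row uniformly
        grad_i = (A[i] @ x - b[i]) * A[i]        # stochastic gradient of f at x
        # Heavy ball step: gradient step plus momentum term beta * (x_k - x_{k-1}).
        x_next = x - stepsize * grad_i + momentum * (x - x_prev)
        x_prev, x = x, x_next
        losses[k] = 0.5 * np.mean((A @ x - b) ** 2)  # last-iterate function value
    return x, losses


# Consistent system with fewer rows than columns: PL holds, strong convexity does not.
data_rng = np.random.default_rng(0)
A = data_rng.standard_normal((20, 50))
b = A @ data_rng.standard_normal(50)
x_last, losses = shb_least_squares(A, b)
print(f"first loss: {losses[0]:.3e}, last loss: {losses[-1]:.3e}")
```

Under these assumptions, one would expect the printed last-iterate loss to be many orders of magnitude below the first, consistent with roughly geometric decay. If noise is added to `b`, the system is no longer consistent, and a constant stepsize would typically stall at a noise floor, which is the situation where diminishing stepsizes become relevant.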
Keywords: Last-iterate convergence rate; Machine learning; Non-convex optimization; PL condition; Stochastic momentum methods.
Copyright © 2023 Elsevier Ltd. All rights reserved.