High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise

Madden, Liam; Dall'Anese, Emiliano; Becker, Stephen

arXiv:2006.05610 (math)

[Submitted on 10 Jun 2020 (v1), last revised 15 Jul 2024 (this version, v5)]

Abstract: Stochastic gradient descent is one of the most common iterative algorithms used in machine learning, and its convergence analysis is a rich area of research. Understanding its convergence properties can help inform which modifications of it to use in different settings. However, most theoretical results either assume convexity or only provide convergence results in mean. This paper, on the other hand, proves convergence bounds in high probability without assuming convexity. Assuming strong smoothness, we prove high probability convergence bounds in two settings: (1) assuming the Polyak-Łojasiewicz inequality and norm sub-Gaussian gradient noise, and (2) assuming norm sub-Weibull gradient noise. In the second setting, as an intermediate step to proving convergence, we prove a sub-Weibull martingale difference sequence self-normalized concentration inequality of independent interest. It extends Freedman-type concentration beyond the sub-exponential threshold to heavier-tailed martingale difference sequences. We also provide a post-processing method that picks a single iterate with a provable convergence guarantee, as opposed to the usual bound for the unknown best iterate. Our convergence result for sub-Weibull noise extends the regime where stochastic gradient descent has equal or better convergence guarantees than stochastic gradient descent with modifications such as clipping, momentum, and normalization.
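
As a rough illustration of setting (2), the following Python/NumPy sketch runs plain SGD with additive gradient noise whose norm is sub-Weibull. Everything in it is an assumption chosen for illustration and is not taken from the paper or its experiments: the test objective (sum of x_i^2 + 3 sin^2(x_i), which is smooth with L = 8 but nonconvex), the hypothetical helper `sub_weibull_noise` (a uniformly random direction scaled by `scale * Z**theta` with Z exponential), the step size 1/L, and all parameter values. Larger `theta` means heavier tails; theta = 1/2 gives a sub-Gaussian norm and theta = 1 a sub-exponential one, so theta = 2 below lies past the sub-exponential threshold mentioned above. The sketch records the true squared gradient norm only for monitoring and does not implement the paper's post-processing method.

```python
import numpy as np


def sub_weibull_noise(rng, dim, theta, scale):
    """Hypothetical zero-mean noise vector whose norm is sub-Weibull(theta).

    The norm R = scale * Z**theta with Z ~ Exp(1) has tail
    P(R > t) = exp(-(t / scale)**(1 / theta)), and
    E[exp((R / (2**theta * scale))**(1 / theta))] = E[exp(Z / 2)] = 2.
    The direction is uniform on the unit sphere, so the vector has mean zero.
    """
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    magnitude = scale * rng.standard_exponential() ** theta
    return magnitude * direction


def f(x):
    # Illustrative smooth nonconvex objective: sum_i x_i^2 + 3 sin^2(x_i), minimized at x = 0.
    return float(np.sum(x**2 + 3.0 * np.sin(x) ** 2))


def grad_f(x):
    # Coordinatewise derivative 2x + 3 sin(2x); the second derivative 2 + 6 cos(2x)
    # lies in [-4, 8], so the gradient is 8-Lipschitz (L = 8).
    return 2.0 * x + 3.0 * np.sin(2.0 * x)


def sgd(x0, steps, step_size, theta, scale, seed=0):
    """Plain SGD: x_{t+1} = x_t - step_size * (grad f(x_t) + noise_t)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    grad_norms_sq = []
    for _ in range(steps):
        g = grad_f(x)
        # True squared gradient norm, recorded only to monitor progress.
        grad_norms_sq.append(float(g @ g))
        x = x - step_size * (g + sub_weibull_noise(rng, x.size, theta, scale))
    return x, grad_norms_sq


# theta = 2 gives tails heavier than sub-exponential (theta = 1); step size 1/L = 1/8.
x_final, hist = sgd(np.full(10, 2.0), steps=2000, step_size=1 / 8, theta=2.0, scale=0.1)
print(f"best ||grad||^2 = {min(hist):.3e}, last ||grad||^2 = {hist[-1]:.3e}")
```

Comparing `min(hist)` with `hist[-1]` mirrors the distinction drawn above between a bound for the unknown best iterate and a guarantee for a single chosen iterate; in practice the true gradient is unavailable, which is why a selection rule is needed.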

Comments:	V5: reorganization. No new analysis, but the generalization analysis was removed, the post-processing algorithm was emphasized, and new numerical experiments were run
Subjects:	Optimization and Control (math.OC)
Cite as:	arXiv:2006.05610 [math.OC]
	(or arXiv:2006.05610v5 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2006.05610
Journal reference:	Journal of Machine Learning Research, 25(241):1-36, 2024

Submission history

From: Liam Madden
[v1] Wed, 10 Jun 2020 02:06:56 UTC (34 KB)
[v2] Fri, 30 Oct 2020 17:43:24 UTC (30 KB)
[v3] Wed, 6 Jan 2021 21:54:54 UTC (30 KB)
[v4] Tue, 16 Nov 2021 01:05:55 UTC (1,229 KB)
[v5] Mon, 15 Jul 2024 03:23:51 UTC (892 KB)