High-probability Convergence Bounds for Non-convex Stochastic Gradient Descent

Madden, Liam; Dall'Anese, Emiliano; Becker, Stephen

arXiv:2006.05610v4 (math)

[Submitted on 10 Jun 2020 (v1), revised 16 Nov 2021 (this version, v4), latest version 15 Jul 2024 (v5)]

Abstract: Stochastic gradient descent is one of the most common iterative algorithms used in machine learning. It is computationally cheap to implement, and recent literature suggests that it may also have implicit regularization properties that prevent over-fitting. This paper analyzes the properties of stochastic gradient descent from a theoretical standpoint to help bridge the gap between theoretical and empirical results. Most theoretical results either assume convexity or only provide convergence results in mean, while this paper proves convergence bounds in high probability without assuming convexity. Assuming strong smoothness, we prove high-probability convergence bounds in two settings: (1) assuming the Polyak-Łojasiewicz inequality and norm sub-Gaussian gradient noise and (2) assuming norm sub-Weibull gradient noise. In the first setting, we combine our convergence bounds with existing generalization bounds in order to bound the true risk and show that, for a certain number of epochs, convergence and generalization balance in such a way that the true risk goes to the empirical minimum as the number of samples goes to infinity. In the second setting, as an intermediate step to proving convergence, we prove a probability result of independent interest. The probability result extends Freedman-type concentration beyond the sub-exponential threshold to heavier-tailed martingale difference sequences.
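
To make the first setting concrete, the following is a minimal sketch and not the authors' experiments. It runs SGD on the one-dimensional function f(x) = x^2 + 3 sin^2(x), a commonly used example of a function that is non-convex yet satisfies the Polyak-Łojasiewicz inequality, with additive Gaussian gradient noise. The function, step size, noise level, number of runs, and all names in the code are illustrative assumptions; none of them come from the paper. Instead of averaging the final optimality gap over runs (a bound "in mean"), the sketch reports an empirical quantile of the gap, which is the kind of quantity a high-probability bound controls.

```python
import math
import random


def f(x):
    # Illustrative objective (an assumption, not from the paper):
    # non-convex, with its global minimum f(0) = 0.
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # Exact derivative: 2x + 6 sin(x) cos(x) = 2x + 3 sin(2x).
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def sgd(x0, step, n_steps, noise_std, rng):
    # Plain SGD with a constant step size; the stochastic gradient is the
    # true gradient plus zero-mean Gaussian noise.
    x = x0
    for _ in range(n_steps):
        g = grad_f(x) + rng.gauss(0.0, noise_std)
        x -= step * g
    return x


def final_gap_quantile(n_runs=200, q=0.95, seed=0):
    # Empirical q-quantile of the final optimality gap f(x_T) - f* over
    # independent runs (f* = 0 here), rather than the mean over runs.
    rng = random.Random(seed)
    gaps = sorted(f(sgd(3.0, 0.05, 2000, 0.1, rng)) for _ in range(n_runs))
    return gaps[min(n_runs - 1, int(q * n_runs))]


if __name__ == "__main__":
    print("95% quantile of final gap:", final_gap_quantile())
```

The step size 0.05 is chosen below 1/L, where L = 8 bounds the second derivative 2 + 6 cos(2x). With a constant step the iterates settle in a noise-dominated neighborhood of the minimizer rather than converging exactly.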

Comments:	V4: significant additions. Proved a convergence bound in the non-convex setting with norm sub-Weibull noise and proved a Freedman-type inequality for sub-Weibull martingale difference sequences. Also included numerics for both the PL and non-convex settings
Subjects:	Optimization and Control (math.OC)
Cite as:	arXiv:2006.05610 [math.OC]
	(or arXiv:2006.05610v4 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2006.05610

Submission history

From: Liam Madden
[v1] Wed, 10 Jun 2020 02:06:56 UTC (34 KB)
[v2] Fri, 30 Oct 2020 17:43:24 UTC (30 KB)
[v3] Wed, 6 Jan 2021 21:54:54 UTC (30 KB)
[v4] Tue, 16 Nov 2021 01:05:55 UTC (1,229 KB)
[v5] Mon, 15 Jul 2024 03:23:51 UTC (892 KB)
