On the Convergence of Gradient Descent for Large Learning Rates

Crăciun, Alexandru; Ghoshdastidar, Debarghya


arXiv:2402.13108 (cs)

[Submitted on 20 Feb 2024 (v1), last revised 9 Dec 2024 (this version, v3)]



Abstract: A vast literature on convergence guarantees for gradient descent and derived methods exists at the moment. However, a simple practical situation remains unexplored: when a fixed step size is used, can we expect gradient descent to converge starting from any initialization? We provide fundamental impossibility results showing that convergence becomes impossible no matter the initialization if the step size gets too big. Looking at the asymptotic value of the gradient norm along the optimization trajectory, we see that there is a sharp transition as the step size crosses a critical value. This has been observed by practitioners, yet the true mechanisms through which this happens remain unclear beyond heuristics. Using results from dynamical systems theory, we provide a proof of this in the case of linear neural networks with a squared loss. We also prove the impossibility of convergence for more general losses without requiring strong assumptions such as Lipschitz continuity for the gradient. We validate our findings through experiments with non-linear networks.
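
The sharp transition mentioned in the abstract can be seen in the simplest textbook setting. The sketch below is an illustration only and is not taken from the paper: it assumes a one-dimensional quadratic loss f(x) = 0.5 * curvature * x**2, for which the fixed-step update is x <- (1 - step_size * curvature) * x, so the iterates shrink exactly when step_size < 2 / curvature. The function name `final_gradient_norm` and all parameter values are hypothetical choices made for this example.

```python
def final_gradient_norm(step_size, curvature=1.0, x0=1.0, num_steps=200):
    """Run fixed-step gradient descent on f(x) = 0.5 * curvature * x**2
    and return |f'(x)| at the last iterate.

    Toy setting assumed for illustration; not the linear-network
    setting analysed in the paper.
    """
    x = x0
    for _ in range(num_steps):
        grad = curvature * x        # f'(x) for the quadratic loss
        x = x - step_size * grad    # plain gradient descent update
    return abs(curvature * x)


# For this toy loss the critical step size is 2 / curvature.
critical_step = 2.0 / 1.0

# Step sizes on either side of the critical value.
for step_size in (0.5, 1.9, 2.1, 3.0):
    side = "below" if step_size < critical_step else "above"
    print(f"step_size={step_size} ({side} critical): "
          f"final gradient norm = {final_gradient_norm(step_size):.3e}")
```

Below the critical value the final gradient norm decays towards zero, and above it the norm grows with the number of steps, whatever nonzero starting point is used. This only mirrors the qualitative picture in the abstract; the paper's results concern linear neural networks and more general losses.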

Subjects:	Machine Learning (cs.LG)
MSC classes:	90C26
Cite as:	arXiv:2402.13108 [cs.LG]
	(or arXiv:2402.13108v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.13108

Submission history

From: Alexandru Craciun
[v1] Tue, 20 Feb 2024 16:01:42 UTC (55 KB)
[v2] Tue, 3 Sep 2024 14:09:08 UTC (1,072 KB)
[v3] Mon, 9 Dec 2024 14:41:53 UTC (86 KB)
