Convergence of stochastic gradient descent under a local Łojasiewicz condition for deep neural networks

An, Jing; Lu, Jianfeng

arXiv:2304.09221v1 (cs)

[Submitted on 18 Apr 2023 (this version), latest version 12 Jan 2024 (v2)]

Abstract: We extend the global convergence result of Chatterjee \cite{chatterjee2022convergence} to stochastic gradient descent (SGD) for non-convex objective functions. Under minimal additional assumptions, which can be realized by finitely wide neural networks, we prove that if SGD is initialized inside a local region where the Łojasiewicz condition holds, then with positive probability the iterates converge to a global minimum inside this region. A key component of the proof is to ensure that the entire SGD trajectory stays inside the local region with positive probability. To that end, we assume that the SGD noise scales with the objective function, a property known as machine learning noise that is achievable in many real examples. Furthermore, we give a negative argument showing why bounded noise with Robbins-Monro type step sizes is not enough to keep the trajectory inside the region.
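To make the noise assumption concrete, the following is a minimal sketch, not the construction used in the paper. It runs SGD on a toy quadratic objective whose gradient noise has variance proportional to the current loss, so the noise vanishes as the loss approaches zero. The objective, the Gaussian noise model, and the names `loss`, `noisy_grad`, `run_sgd`, `sigma`, and `step_size` are all assumptions chosen for illustration.

```python
import math
import random


def loss(x):
    # Toy objective f(x) = 0.5 * ||x||^2 (an assumption for illustration);
    # its global minimum value is 0, attained at x = 0.
    return 0.5 * sum(xi * xi for xi in x)


def grad(x):
    # Gradient of the toy objective. Here ||grad f(x)||^2 = 2 * f(x),
    # so a Łojasiewicz-type inequality holds everywhere for this example.
    return list(x)


def noisy_grad(x, sigma, rng):
    # Assumed noise model: zero-mean Gaussian noise with standard deviation
    # sqrt(sigma * f(x)) per coordinate, so the noise scales with the loss.
    scale = math.sqrt(sigma * loss(x))
    return [g + scale * rng.gauss(0.0, 1.0) for g in grad(x)]


def run_sgd(x0, step_size=0.1, sigma=1.0, num_steps=200, seed=0):
    # Constant-step-size SGD; returns the final iterate and the loss history.
    rng = random.Random(seed)
    x = list(x0)
    losses = [loss(x)]
    for _ in range(num_steps):
        g = noisy_grad(x, sigma, rng)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
        losses.append(loss(x))
    return x, losses


if __name__ == "__main__":
    _, losses = run_sgd([1.0, -2.0])
    print(f"initial loss {losses[0]:.3e}, final loss {losses[-1]:.3e}")
```

Because the noise shrinks together with the loss, a constant step size suffices in this toy setting. With noise that is only bounded, the iterates would keep receiving perturbations of fixed size near the minimum, which is the situation the abstract's negative argument addresses.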

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2304.09221 [cs.LG]
	(or arXiv:2304.09221v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2304.09221

Submission history

From: Jing An
[v1] Tue, 18 Apr 2023 18:20:52 UTC (16 KB)
[v2] Fri, 12 Jan 2024 23:41:44 UTC (19 KB)
