Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise

Compagnoni, Enea Monzio; Liu, Tianlin; Islamov, Rustem; Proske, Frank Norbert; Orvieto, Antonio; Lucchi, Aurelien

arXiv:2411.15958 (cs)

[Submitted on 24 Nov 2024]

Abstract: Despite the vast empirical evidence supporting the efficacy of adaptive optimization methods in deep learning, their theoretical understanding is far from complete. This work introduces novel SDEs for commonly used adaptive optimizers: SignSGD, RMSprop(W), and Adam(W). These SDEs offer a quantitatively accurate description of these optimizers and help illuminate an intricate relationship between adaptivity, gradient noise, and curvature. Our novel analysis of SignSGD highlights a noteworthy and precise contrast to SGD in terms of convergence speed, stationary distribution, and robustness to heavy-tailed noise. We extend this analysis to AdamW and RMSpropW, for which we observe that the role of noise is much more complex. Crucially, we support our theoretical analysis with experimental evidence by verifying our insights: this includes numerically integrating our SDEs using Euler-Maruyama discretization on various neural network architectures such as MLPs, CNNs, ResNets, and Transformers. Our SDEs accurately track the behavior of the respective optimizers, and do so more closely than previously derived SDEs for Adam and RMSprop. We believe our approach can provide valuable insights into best training practices and novel scaling rules.
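The recipe the abstract describes (derive an SDE for an optimizer, then integrate it with Euler-Maruyama and compare against the discrete optimizer) can be illustrated on a toy case. The sketch below is a hypothetical example, not the authors' code or their exact SDEs. It assumes a one-dimensional quadratic loss f(x) = x^2/2, additive Gaussian gradient noise with standard deviation sigma, and a first-order SDE for SignSGD whose drift is the expected sign of the noisy gradient, m(x) = erf(x / (sigma * sqrt(2))), and whose diffusion matches the variance of that sign, 1 - m^2. All function names (signsgd_drift, run_signsgd, run_sde_euler_maruyama, mean_final) are illustrative helpers introduced here.

```python
import math
import random

# Hypothetical toy setup, not the paper's SDE or code:
# loss f(x) = x^2 / 2 (so the true gradient is x), Gaussian gradient noise N(0, sigma^2).


def sign(v: float) -> int:
    """Sign function with sign(0) = 0, as used by SignSGD."""
    return (v > 0) - (v < 0)


def signsgd_drift(grad: float, sigma: float) -> float:
    """E[sign(grad + sigma * Z)] for Z ~ N(0, 1), which equals erf(grad / (sigma * sqrt(2)))."""
    return math.erf(grad / (sigma * math.sqrt(2.0)))


def run_signsgd(x0, eta, sigma, steps, rng):
    """Discrete SignSGD: x <- x - eta * sign(noisy gradient)."""
    x = x0
    for _ in range(steps):
        noisy_grad = x + sigma * rng.gauss(0.0, 1.0)
        x -= eta * sign(noisy_grad)
    return x


def run_sde_euler_maruyama(x0, eta, sigma, steps, rng):
    """Euler-Maruyama for dX = -m(X) dt + sqrt(eta) * sqrt(1 - m(X)^2) dW, with dt = eta."""
    x = x0
    dt = eta
    for _ in range(steps):
        m = signsgd_drift(x, sigma)
        diffusion = math.sqrt(eta) * math.sqrt(max(0.0, 1.0 - m * m))
        x += -m * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def mean_final(runner, n_runs, seed, **kwargs):
    """Average final iterate over n_runs independent trajectories."""
    rng = random.Random(seed)
    return sum(runner(rng=rng, **kwargs) for _ in range(n_runs)) / n_runs


if __name__ == "__main__":
    params = dict(x0=2.0, eta=0.01, sigma=1.0, steps=200)
    print("SignSGD mean:", mean_final(run_signsgd, 500, 0, **params))
    print("SDE mean:    ", mean_final(run_sde_euler_maruyama, 500, 1, **params))
```

With dt = eta, one Euler-Maruyama step reproduces the conditional mean and variance of one SignSGD step, so the two averages should agree closely. The bounded drift (|m| < 1) is what distinguishes this toy SignSGD SDE from SGD, whose drift on the same loss would simply be -x.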

Comments:	An earlier version, titled 'SDEs for Adaptive Methods: The Role of Noise' and dated May 2024, is available on OpenReview
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2411.15958 [cs.LG]
	(or arXiv:2411.15958v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.15958

Submission history

From: Enea Monzio Compagnoni
[v1] Sun, 24 Nov 2024 19:07:31 UTC (29,711 KB)