NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer

Leplat, Valentin; Merkulov, Daniil; Katrutsa, Aleksandr; Bershatsky, Daniel; Tsymboi, Olga; Oseledets, Ivan


arXiv:2209.14937 (math)

[Submitted on 29 Sep 2022 (v1), last revised 30 Sep 2023 (this version, v2)]

Abstract: Classical machine learning models such as deep neural networks are usually trained using Stochastic Gradient Descent (SGD)-based algorithms. Classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differential Equation (SDE) and (2) its semi-implicit Gauss-Seidel type discretization. The convergence and stability of the resulting method, referred to as NAG-GS, are first studied extensively for the minimization of a quadratic function. This analysis allows us to derive a learning rate that is optimal in terms of the convergence rate while ensuring the stability of NAG-GS. This is achieved by a careful analysis of the spectral radius of the iteration matrix and of the covariance matrix at stationarity with respect to all hyperparameters of our method. Further, we show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for training machine learning models such as logistic regression, residual networks on standard computer vision datasets, Transformers on the GLUE benchmark, and the recent Vision Transformers.
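
To make the "spectral radius of the iteration matrix" idea concrete, here is a minimal sketch for a scalar quadratic f(x) = 0.5 * lam * x**2. It is not the NAG-GS scheme from the paper. The two-variable flow x' = v - x, v' = x - v - (lam / mu) * x, the parameter names `alpha`, `lam`, `mu`, and all function names are assumptions chosen for illustration, and the noise term of the SDE is omitted. The sketch compares plain gradient descent (an explicit Euler step of the gradient flow) with a Gauss-Seidel style semi-implicit step, in which x is updated first and the new x is then used in the v-update.

```python
import numpy as np


def explicit_euler_matrix(alpha, lam):
    """Iteration matrix of gradient descent x_{k+1} = x_k - alpha * lam * x_k
    on the scalar quadratic f(x) = 0.5 * lam * x**2."""
    return np.array([[1.0 - alpha * lam]])


def gauss_seidel_matrix(alpha, lam, mu):
    """Iteration matrix of a semi-implicit Gauss-Seidel step for the ASSUMED flow
        x' = v - x,    v' = x - v - (lam / mu) * x.
    The x-update is implicit in x and uses the old v;
    the v-update is implicit in v and uses the freshly computed x.
    """
    c = 1.0 - lam / mu
    # (x_{k+1} - x_k) / alpha = v_k - x_{k+1}
    #   =>  x_{k+1} = (x_k + alpha * v_k) / (1 + alpha)
    row_x = np.array([1.0, alpha]) / (1.0 + alpha)
    # (v_{k+1} - v_k) / alpha = c * x_{k+1} - v_{k+1}
    #   =>  v_{k+1} = (v_k + alpha * c * x_{k+1}) / (1 + alpha)
    row_v = (np.array([0.0, 1.0]) + alpha * c * row_x) / (1.0 + alpha)
    return np.vstack([row_x, row_v])


def spectral_radius(matrix):
    """Largest eigenvalue modulus; the linear iteration contracts iff this is < 1."""
    return float(np.max(np.abs(np.linalg.eigvals(matrix))))


if __name__ == "__main__":
    lam, mu = 10.0, 1.0  # hypothetical curvature and strong-convexity parameters
    for alpha in (0.05, 0.1, 0.3):
        rho_gd = spectral_radius(explicit_euler_matrix(alpha, lam))
        rho_gs = spectral_radius(gauss_seidel_matrix(alpha, lam, mu))
        print(f"alpha={alpha}: explicit={rho_gd:.3f}, semi-implicit={rho_gs:.3f}")
```

Scanning `alpha` and picking the value with the smallest spectral radius below 1 is the kind of step-size selection the abstract describes, here reduced to a toy deterministic case.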

Comments:	We study Nesterov acceleration for the Stochastic Differential Equation
Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Numerical Analysis (math.NA)
Cite as:	arXiv:2209.14937 [math.OC]
	(or arXiv:2209.14937v2 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2209.14937

Submission history

From: Daniil Merkulov
[v1] Thu, 29 Sep 2022 16:54:53 UTC (2,976 KB)
[v2] Sat, 30 Sep 2023 21:07:16 UTC (3,715 KB)
