Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization

Xu, Yangyang; Xu, Yibo

Mathematics > Optimization and Control

arXiv:2006.00425 (math)

[Submitted on 31 May 2020 (v1), last revised 29 Apr 2022 (this version, v3)]


Abstract: Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs in both the convex and nonconvex cases. Most of these improved SGMs require a large number of samples in some or all iterations. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds. Unlike existing optimal methods, PStorm can achieve the $O(\varepsilon^{-3})$ result by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm with small-batch training can generalize better than both the vanilla SGM and other optimal methods that require large-batch training, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.
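
The abstract names the ingredients (a proximal step, a momentum-based variance-reduced gradient estimator, one sample per update) but does not state the update rule. The following is a minimal sketch of how such ingredients are commonly combined, assuming a STORM-style recursive estimator d_k = v_k + (1 - beta) * (d_{k-1} - u_k), where v_k and u_k are gradients at the current and previous iterates on the same fresh sample. It is not taken from the paper. The function names, the constant stepsize `eta`, the momentum weight `beta`, the L1 regularizer with weight `lam`, and the toy sparse least-squares problem (which is convex, unlike the paper's setting) are all illustrative assumptions.

```python
import random


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1, applied elementwise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def sample_grad(x, sample):
    """Gradient of 0.5 * (a.x - b)^2 at x for a single sample (a, b)."""
    a, b = sample
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [residual * ai for ai in a]


def prox_momentum_vr_sketch(draw_sample, x0, steps, eta=0.1, beta=0.1, lam=0.01):
    """Proximal stochastic gradient with a STORM-style estimator (assumed form)."""
    x_prev = list(x0)
    # Initial estimator from a single sample (no large initial minibatch).
    d = sample_grad(x_prev, draw_sample())
    x = soft_threshold([xi - eta * di for xi, di in zip(x_prev, d)], eta * lam)
    for _ in range(steps):
        s = draw_sample()            # one fresh sample per update
        v = sample_grad(x, s)        # gradient at the current iterate
        u = sample_grad(x_prev, s)   # gradient at the previous iterate, same sample
        # Momentum-based variance-reduced estimator.
        d = [vi + (1.0 - beta) * (di - ui) for vi, di, ui in zip(v, d, u)]
        x_prev = x
        # Proximal step handles the nonsmooth L1 term.
        x = soft_threshold([xi - eta * di for xi, di in zip(x, d)], eta * lam)
    return x


# Toy usage: recover a sparse vector from streaming noisy linear measurements.
rng = random.Random(0)
x_true = [1.0, 0.0, -2.0]


def draw_sample():
    a = [rng.uniform(-1.0, 1.0) for _ in x_true]
    b = sum(ai * xi for ai, xi in zip(a, x_true)) + 0.01 * rng.gauss(0.0, 1.0)
    return a, b


x_hat = prox_momentum_vr_sketch(draw_sample, [0.0, 0.0, 0.0], steps=2000)
print(x_hat)
```

Each iteration draws a single sample and evaluates its gradient at two points, which is what lets the estimator correct itself without a large batch. On this toy problem the iterate should end up close to `x_true`, shrunk slightly toward zero by the L1 term.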

Comments:	Yibo Xu joined this work by adding a new proof for the case with constant stepsize and removing the requirement of a large initial minibatch
Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Numerical Analysis (math.NA)
MSC classes:	90C15, 65K05, 68Q25
Cite as:	arXiv:2006.00425 [math.OC]
	(or arXiv:2006.00425v3 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2006.00425

Submission history

From: Yangyang Xu
[v1] Sun, 31 May 2020 03:18:45 UTC (145 KB)
[v2] Mon, 26 Apr 2021 18:23:33 UTC (211 KB)
[v3] Fri, 29 Apr 2022 11:45:38 UTC (254 KB)
