Demystifying SGD with Doubly Stochastic Gradients

Kim, Kyurae; Ko, Joohwan; Ma, Yi-An; Gardner, Jacob R.

Statistics > Machine Learning

arXiv:2406.00920 (stat)

[Submitted on 3 Jun 2024]

Abstract: Optimization objectives in the form of a sum of intractable expectations are rising in importance (e.g., diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ SGD with doubly stochastic gradients (doubly SGD): the expectations are estimated using the gradient estimator of each component, while the sum is estimated by subsampling over these estimators. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong assumptions such as bounded variance. In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompass dependent component gradient estimators. In particular, for dependent estimators, our analysis allows a fine-grained analysis of the effect of correlations. As a result, under a per-iteration computational budget of $b \times m$, where $b$ is the minibatch size and $m$ is the number of Monte Carlo samples, our analysis suggests where, in general, one should invest most of the budget. Furthermore, we prove that random reshuffling (RR) improves the complexity dependence on the subsampling noise.
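To make the setting concrete, here is a minimal toy sketch of doubly SGD, not the authors' code. It assumes a hypothetical one-dimensional objective f(x) = (1/n) sum_i E_eps[(x - a_i - eps)^2 / 2] with eps ~ N(0, 1), so each component gradient x - a_i is estimated with m Monte Carlo samples and the sum is estimated from a minibatch of b components. The helper names (component_grad, doubly_sgd, grad_variance) are illustrative assumptions. "Independent minibatching" is modeled here as drawing a fresh uniform minibatch (without replacement) at every step, and random reshuffling as one shuffled pass over the components per epoch. The component estimators in this toy are independent, so it does not exercise the dependent-estimator case the paper covers.

```python
import random
import statistics


def component_grad(x, a_i, m, rng):
    # Assumed toy component: f_i(x) = E_eps[(x - a_i - eps)^2 / 2], eps ~ N(0, 1).
    # Averages m Monte Carlo samples of the unbiased estimator x - a_i - eps.
    return sum(x - a_i - rng.gauss(0.0, 1.0) for _ in range(m)) / m


def doubly_sgd(a, b, m, step, epochs, reshuffle, seed=0):
    # Doubly SGD on the toy objective: subsample b components, and estimate
    # each component gradient with m Monte Carlo samples (cost b * m per step).
    rng = random.Random(seed)
    n = len(a)
    steps_per_epoch = n // b  # leftover components (n % b) are skipped in RR mode
    x = 0.0
    for _ in range(epochs):
        if reshuffle:
            # Random reshuffling: one permutation per epoch, split into chunks of b.
            perm = list(range(n))
            rng.shuffle(perm)
            batches = [perm[k * b:(k + 1) * b] for k in range(steps_per_epoch)]
        else:
            # Independent minibatching: a fresh uniform minibatch at every step.
            batches = [rng.sample(range(n), b) for _ in range(steps_per_epoch)]
        for batch in batches:
            g = sum(component_grad(x, a[i], m, rng) for i in batch) / b
            x -= step * g
    return x


def grad_variance(a, x, b, m, trials, seed=0):
    # Empirical variance of the doubly stochastic gradient at x for a given (b, m).
    rng = random.Random(seed)
    n = len(a)
    samples = []
    for _ in range(trials):
        batch = rng.sample(range(n), b)
        samples.append(sum(component_grad(x, a[i], m, rng) for i in batch) / b)
    return statistics.variance(samples)
```

For example, with a = [0.0, 1.0, ..., 9.0] the minimizer is mean(a) = 4.5, and both doubly_sgd(a, b=2, m=4, step=0.01, epochs=500, reshuffle=False) and the reshuffle=True variant end up near it. In this particular toy, the Monte Carlo part of the gradient variance is 1/(b * m), so it depends only on the budget b * m, while the subsampling part shrinks only as b grows; grad_variance(a, 4.5, b=10, m=1, trials=2000) is therefore much smaller than the variance at b=2, m=5 with the same budget. This is a property of the toy, not a restatement of the paper's general result.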

Comments:	Accepted to ICML'24
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2406.00920 [stat.ML]
	(or arXiv:2406.00920v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2406.00920

Submission history

From: Kyurae Kim
[v1] Mon, 3 Jun 2024 01:13:19 UTC (342 KB)