Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression

Li, Xuheng; Gu, Quanquan

arXiv:2502.14123 (cs)

[Submitted on 19 Feb 2025]

Abstract: Exponential moving average (EMA) has recently gained significant popularity in training modern deep learning models, especially diffusion-based generative models. However, there have been few theoretical results explaining the effectiveness of EMA. In this paper, to better understand EMA, we establish the risk bound of online SGD with EMA for high-dimensional linear regression, one of the simplest overparameterized learning tasks that shares similarities with neural networks. Our results indicate that (i) the variance error of SGD with EMA is always smaller than that of SGD without averaging, and (ii) unlike SGD with iterate averaging from the beginning, the bias error of SGD with EMA decays exponentially in every eigen-subspace of the data covariance matrix. Additionally, we develop proof techniques applicable to the analysis of a broad class of averaging schemes.
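
The setting described in the abstract can be illustrated with a minimal sketch of one-pass (online) SGD on squared loss that also tracks an exponential moving average of the iterates. The abstract does not state the exact update rule, so the form of the EMA recursion, the zero initialization, the constant step size, the synthetic data, and all names (`sgd_with_ema`, `excess_risk`, `lr`, `beta`) are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def sgd_with_ema(X, y, lr=0.01, beta=0.99):
    """One pass of online SGD on squared loss, tracking an EMA of the iterates.

    Assumed (not taken from the paper): constant step size `lr`, zero
    initialization for both the SGD iterate and the EMA, and the recursion
    ema <- beta * ema + (1 - beta) * w.
    Returns (last_iterate, ema_iterate).
    """
    d = X.shape[1]
    w = np.zeros(d)
    w_ema = np.zeros(d)
    for x_t, y_t in zip(X, y):
        grad = (x_t @ w - y_t) * x_t          # gradient of 0.5 * (x^T w - y)^2
        w = w - lr * grad                     # SGD step on one fresh sample
        w_ema = beta * w_ema + (1.0 - beta) * w  # EMA of the iterates
    return w, w_ema

def excess_risk(w, w_star, H):
    """Excess risk 0.5 * (w - w*)^T H (w - w*) for data covariance H."""
    diff = w - w_star
    return 0.5 * diff @ H @ diff

# Synthetic linear regression with a decaying covariance spectrum (illustrative).
rng = np.random.default_rng(0)
d, n = 20, 5000
eigvals = 1.0 / np.arange(1, d + 1)
H = np.diag(eigvals)
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d)) * np.sqrt(eigvals)   # rows have covariance H
y = X @ w_star + 0.5 * rng.normal(size=n)        # noisy labels

w_last, w_ema = sgd_with_ema(X, y, lr=0.05, beta=0.99)
print("excess risk, last iterate:", excess_risk(w_last, w_star, H))
print("excess risk, EMA iterate: ", excess_risk(w_ema, w_star, H))
```

Setting `beta=0.0` makes the EMA coincide with the last SGD iterate, which gives a simple baseline for comparing the two excess risks on the same data stream.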

Comments:	34 pages, 4 figures
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2502.14123 [cs.LG]
	(or arXiv:2502.14123v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.14123

Submission history

From: Xuheng Li
[v1] Wed, 19 Feb 2025 21:55:11 UTC (171 KB)
