On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective

Dai, Xiaowu; Zhu, Yuhua

arXiv:2112.00987 (cs)

[Submitted on 2 Dec 2021]

Abstract: We study the statistical properties of the dynamic trajectory of stochastic gradient descent (SGD). We approximate mini-batch SGD and momentum SGD as stochastic differential equations (SDEs). We exploit the continuous formulation of the SDEs and the theory of Fokker-Planck equations to develop new results on the escaping phenomenon and its relationship with large batch sizes and sharp minima. In particular, we find that the solution of the stochastic process tends to converge to flatter minima regardless of the batch size in the asymptotic regime. However, the convergence rate is rigorously proven to depend on the batch size. These results are validated empirically with various datasets and models.
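
For orientation, the kind of SDE and Fokker-Planck pairing the abstract refers to can be sketched in a simplified textbook form. This is an illustrative sketch under stated assumptions, not the paper's own derivation: the symbols (loss L, learning rate \eta, batch size B, per-sample gradient noise variance \sigma^2) and the assumption of constant, isotropic gradient noise are introduced here for illustration only.

```latex
% Assumption (not from the paper): gradient noise is isotropic with constant
% per-sample variance \sigma^2, so mini-batch noise has variance \sigma^2 / B.
% Continuous-time approximation of mini-batch SGD with learning rate \eta:
\[
  d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{2D}\,dW_t,
  \qquad D = \frac{\eta\,\sigma^2}{2B}.
\]
% The density p(\theta, t) of \theta_t then solves the Fokker-Planck equation:
\[
  \partial_t p = \nabla \cdot \bigl(p\,\nabla L\bigr) + D\,\Delta p .
\]
% Its stationary solution (when normalizable) is the Gibbs density:
\[
  p_\infty(\theta) \propto \exp\!\bigl(-L(\theta)/D\bigr),
\]
% since \nabla\cdot\bigl(p_\infty \nabla L + D\,\nabla p_\infty\bigr) = 0.
```

In this simplified setting the batch size B enters only through the diffusion coefficient D. Between two minima of equal depth, the stationary density places more mass around the flatter one for any finite B, while the time needed to approach that stationary density changes with D. This is in the spirit of the abstract's claims, although the paper's precise assumptions and statements may differ.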

Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as: arXiv:2112.00987 [cs.LG]
(or arXiv:2112.00987v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2112.00987

Submission history

From: Xiaowu Dai
[v1] Thu, 2 Dec 2021 05:24:05 UTC (4,571 KB)
