Don't Use Large Mini-Batches, Use Local SGD

Lin, Tao; Stich, Sebastian U.; Patel, Kumar Kshitij; Jaggi, Martin

arXiv:1808.07217 (cs)

[Submitted on 22 Aug 2018 (v1), last revised 17 Feb 2020 (this version, v6)]

Abstract: Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks. Drastic increases in the mini-batch sizes have led to key efficiency and scalability gains in recent years. However, progress faces a major roadblock, as models trained with large batches often do not generalize well, i.e. they do not show good accuracy on new data. As a remedy, we propose post-local SGD and show that it significantly improves the generalization performance compared to large-batch training on standard benchmarks while enjoying the same efficiency (time-to-accuracy) and scalability. We further provide an extensive study of the communication efficiency vs. performance trade-offs associated with a host of local SGD variants.
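
The abstract names local SGD and post-local SGD without spelling out their update rules, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes local SGD means that each worker takes several SGD steps on its own data before the models are averaged, and that post-local SGD runs a first phase with averaging after every step (equivalent to mini-batch SGD across workers) followed by a second phase with several local steps per averaging round. The function names, the toy 1-D least-squares problem, the fixed learning rate, and all parameter values are hypothetical.

```python
import random


def local_sgd(shards, w0, lr, rounds, local_steps, seed=0):
    """Simulate local SGD on the toy model y ~ w * x (illustrative only).

    shards: one list of (x, y) pairs per simulated worker.
    In every round each worker starts from the shared weight, takes
    `local_steps` single-sample SGD steps on its own shard, and the
    resulting weights are then averaged (one communication per round).
    """
    rng = random.Random(seed)
    w = w0
    for _ in range(rounds):
        local_models = []
        for shard in shards:
            w_k = w  # worker copy of the shared model
            for _ in range(local_steps):
                x, y = rng.choice(shard)
                grad = 2.0 * (w_k * x - y) * x  # d/dw of (w*x - y)^2
                w_k -= lr * grad
            local_models.append(w_k)
        w = sum(local_models) / len(local_models)  # model averaging
    return w


def post_local_sgd(shards, w0, lr, warmup_rounds, local_rounds,
                   local_steps, seed=0):
    """Two-phase schedule assumed for this sketch.

    Phase 1 averages after every step (local_steps=1), which matches
    mini-batch SGD with one sample per worker. Phase 2 switches to
    `local_steps` local updates between averaging rounds.
    """
    w = local_sgd(shards, w0, lr, warmup_rounds, 1, seed)
    return local_sgd(shards, w, lr, local_rounds, local_steps, seed + 1)


if __name__ == "__main__":
    # Noise-free toy data with true weight 3.0, split over 4 workers.
    shards = [[(x, 3.0 * x) for x in (0.5, 1.0, 1.5)] for _ in range(4)]
    print(post_local_sgd(shards, 0.0, 0.1, 20, 20, 4))
```

With `local_steps=4`, the second phase performs four gradient steps per worker for each averaging round, so it communicates a quarter as often as the first phase for the same number of gradient evaluations. This sketch illustrates only that communication pattern; it does not attempt to reproduce the generalization behavior reported in the abstract.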

Comments:	To appear in ICLR 2020
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1808.07217 [cs.LG]
	(or arXiv:1808.07217v6 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1808.07217

Submission history

From: Tao Lin
[v1] Wed, 22 Aug 2018 04:50:55 UTC (3,965 KB)
[v2] Sat, 6 Oct 2018 13:47:52 UTC (3,974 KB)
[v3] Sun, 21 Oct 2018 14:23:33 UTC (4,230 KB)
[v4] Tue, 5 Feb 2019 07:30:54 UTC (4,271 KB)
[v5] Wed, 5 Jun 2019 11:39:13 UTC (3,647 KB)
[v6] Mon, 17 Feb 2020 11:42:10 UTC (3,699 KB)
