Eager Updates For Overlapped Communication and Computation in DiLoCo

Kale, Satyen; Douillard, Arthur; Donchev, Yanislav


arXiv:2502.12996 (cs)

[Submitted on 18 Feb 2025]



Abstract: Distributed optimization methods such as DiLoCo have been shown to be effective in training very large models across multiple distributed workers, such as datacenters. These methods split updates into two parts: an inner optimization phase, where the workers independently execute multiple optimization steps on their own local data, and an outer optimization step, where the inner updates are synchronized. While such approaches require orders of magnitude less communication than standard data-parallel training, in settings where the workers are datacenters, even these limited communication requirements can still cause significant slowdowns, because each outer optimization step blocks until synchronization completes. In this paper, we investigate techniques to mitigate this issue by overlapping communication with computation so that the outer optimization step fully overlaps with the inner optimization phase. We show that a particular variant, dubbed eager updates, provides performance competitive with standard DiLoCo in settings with low bandwidth between workers.
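The two-phase structure described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the scalar parameter, the quadratic loss, the use of plain SGD for both the inner and outer optimizers, and all names (`inner_phase`, `diloco_round`, `train`) and hyperparameter values are assumptions chosen for illustration only. It shows the standard blocking scheme, in which the outer step waits for every worker to finish; the overlapped and eager variants studied in the paper are not reproduced here.

```python
def inner_phase(theta, target, steps, lr):
    """Run `steps` local SGD steps on the toy loss 0.5 * (theta - target) ** 2."""
    for _ in range(steps):
        grad = theta - target  # gradient of the toy quadratic loss
        theta -= lr * grad
    return theta


def diloco_round(theta_global, targets, inner_steps=10, inner_lr=0.1, outer_lr=1.0):
    """One round: independent inner phases, then one synchronized outer step."""
    # Each worker starts from the shared parameters and trains on its own data
    # (here, each worker's "data" is just its own target value).
    local_thetas = [
        inner_phase(theta_global, t, inner_steps, inner_lr) for t in targets
    ]
    # Outer gradient: average displacement of the workers from the shared start.
    # In a real system this average is the step that requires communication.
    deltas = [theta_global - th for th in local_thetas]
    outer_grad = sum(deltas) / len(deltas)
    # Outer step (plain SGD is an assumption made for this sketch).
    return theta_global - outer_lr * outer_grad


def train(targets, rounds=20, theta0=0.0):
    """Repeat rounds; the shared parameter moves toward the mean target."""
    theta = theta0
    for _ in range(rounds):
        theta = diloco_round(theta, targets)
    return theta


if __name__ == "__main__":
    # Two hypothetical workers whose local optima are 1.0 and 3.0.
    print(train([1.0, 3.0]))
```

In this sketch, communication happens once per round rather than once per gradient step, which is the source of the reduced communication the abstract mentions. The averaging inside `diloco_round` is also the blocking point that the paper aims to overlap with the next inner phase.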

Comments:	arXiv admin note: text overlap with arXiv:2501.18512
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.12996 [cs.CL]
	(or arXiv:2502.12996v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.12996

Submission history

From: Arthur Douillard
[v1] Tue, 18 Feb 2025 16:16:14 UTC (1,257 KB)
