GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models

Dimlioglu, Tolga; Choromanska, Anna

arXiv:2403.04206 (cs)

[Submitted on 7 Mar 2024]

Abstract: We study distributed training of deep learning models in time-constrained environments. We propose a new algorithm that periodically pulls workers towards a center variable computed as a weighted average of the workers, where the weights are inversely proportional to the workers' gradient norms so that recovering flat regions of the optimization landscape is prioritized. We develop two asynchronous variants of the proposed algorithm, Model-level and Layer-level Gradient-based Weighted Averaging (MGRAWA and LGRAWA, respectively), which differ in whether the weighting is computed with respect to the entire model or applied layer-wise. On the theoretical front, we prove convergence guarantees for the proposed approach in both convex and non-convex settings. We then demonstrate experimentally that our algorithms outperform competing methods by converging faster and recovering better-quality, flatter local optima. We also carry out an ablation study to analyze the scalability of the proposed algorithms in more crowded distributed training environments. Finally, we report that our approach requires less frequent communication and fewer distributed updates than the state-of-the-art baselines.
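The weighting idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact update rule, so the normalization of the weights to sum to one, the `eps` stabilizer, the `pull_strength` coefficient, and all function names below are assumptions made for illustration. The asynchronous scheduling of the variants is not modeled.

```python
import numpy as np


def inverse_norm_weights(grad_norms, eps=1e-12):
    """Weights inversely proportional to gradient norms.

    Assumption: weights are normalized to sum to 1; eps guards against
    division by zero and is not taken from the paper.
    """
    inv = 1.0 / (np.asarray(grad_norms, dtype=float) + eps)
    return inv / inv.sum()


def model_level_center(worker_params, worker_grads):
    """Model-level weighting: one gradient norm per worker over all layers.

    worker_params / worker_grads: one entry per worker, each a list of
    per-layer arrays.
    """
    norms = [np.sqrt(sum(np.sum(g ** 2) for g in grads)) for grads in worker_grads]
    w = inverse_norm_weights(norms)
    n_layers = len(worker_params[0])
    return [
        sum(w[i] * params[l] for i, params in enumerate(worker_params))
        for l in range(n_layers)
    ]


def layer_level_center(worker_params, worker_grads):
    """Layer-level weighting: a separate set of weights for every layer."""
    center = []
    for l in range(len(worker_params[0])):
        norms = [np.linalg.norm(grads[l]) for grads in worker_grads]
        w = inverse_norm_weights(norms)
        center.append(sum(w[i] * params[l] for i, params in enumerate(worker_params)))
    return center


def pull_towards_center(params, center, pull_strength=0.5):
    """Move a worker part of the way towards the center (assumed convex mix)."""
    return [(1.0 - pull_strength) * p + pull_strength * c for p, c in zip(params, center)]


# Two workers, two layers each.
params = [
    [np.array([0.0, 0.0]), np.array([0.0])],  # worker 0
    [np.array([4.0, 4.0]), np.array([4.0])],  # worker 1
]
grads = [
    [np.array([1.0, 0.0]), np.array([3.0])],  # worker 0
    [np.array([0.0, 3.0]), np.array([1.0])],  # worker 1
]

center_model = model_level_center(params, grads)
center_layer = layer_level_center(params, grads)
pulled_worker0 = pull_towards_center(params[0], center_layer, pull_strength=0.5)
```

In this toy example both workers have the same whole-model gradient norm, so the model-level center is the plain average, while the layer-level center leans towards whichever worker has the smaller gradient norm in each individual layer.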

Comments:	9 pages of main text, 24 pages in total
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Cite as:	arXiv:2403.04206 [cs.LG]
	(or arXiv:2403.04206v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.04206

Submission history

From: Tolga Dimlioglu
[v1] Thu, 7 Mar 2024 04:22:34 UTC (2,925 KB)
