L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning

Alimohammadi, Mohammadreza; Markov, Ilia; Frantar, Elias; Alistarh, Dan

arXiv:2210.17357 (cs)

[Submitted on 31 Oct 2022 (v1), last revised 9 Jun 2023 (this version, v2)]

Abstract: Data-parallel distributed training of deep neural networks (DNNs) has gained widespread adoption, but can still experience communication bottlenecks. To address this issue, entire families of compression mechanisms have been developed, including quantization, sparsification, and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress, almost all known compression schemes apply compression uniformly across DNN layers, although layers are heterogeneous in terms of parameter count and their impact on model accuracy. In this work, we provide a general framework for adapting the degree of compression across the model's layers dynamically during training, improving the overall compression and leading to substantial speedups without sacrificing accuracy. Our framework, called L-GreCo, is based on an adaptive algorithm that automatically picks the optimal compression parameters for model layers, guaranteeing the best compression ratio while satisfying an error constraint. Extensive experiments over image classification and language modeling tasks show that L-GreCo is effective across all existing families of compression methods, and achieves up to 2.5$\times$ training speedup and up to 5$\times$ compression improvement over efficient implementations of existing approaches, while recovering full accuracy. Moreover, L-GreCo is complementary to existing adaptive algorithms, improving their compression ratio by 50% and practical throughput by 66%.
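The abstract describes choosing per-layer compression parameters that give the best compression ratio under an error constraint, but it does not spell out the algorithm. As a rough illustration only, the selection can be posed as a knapsack-style problem and solved with dynamic programming over a discretized error budget. Everything in the sketch below is an assumption made for illustration, not the authors' implementation: the function name `choose_layer_params`, the `(compressed_size, error)` option format, the `bins` discretization, and the premise that per-layer errors add up to the total error.

```python
import math

def choose_layer_params(options, error_budget, bins=100):
    """Pick one compression option per layer, minimizing total compressed size
    subject to sum(error) <= error_budget.

    options[i] is a list of (compressed_size, error) pairs for layer i.
    Hypothetical helper: assumes per-layer errors are additive.
    Returns (chosen option index per layer, total compressed size).
    """
    if error_budget <= 0 or bins <= 0:
        raise ValueError("error_budget and bins must be positive")
    step = error_budget / bins

    # table[b] = smallest total size over the layers seen so far whose
    # discretized errors sum to exactly b bins.
    table = [math.inf] * (bins + 1)
    table[0] = 0.0
    back = []  # back[i][b] = (option index for layer i, bin count before layer i)

    for layer_opts in options:
        new = [math.inf] * (bins + 1)
        ptr = [None] * (bins + 1)
        for b, size_so_far in enumerate(table):
            if size_so_far == math.inf:
                continue
            for j, (size, err) in enumerate(layer_opts):
                # Round errors up so the true total never exceeds the budget.
                nb = b + math.ceil(err / step)
                if nb <= bins and size_so_far + size < new[nb]:
                    new[nb] = size_so_far + size
                    ptr[nb] = (j, b)
        table = new
        back.append(ptr)

    best_b = min(range(bins + 1), key=table.__getitem__)
    if table[best_b] == math.inf:
        raise ValueError("no assignment satisfies the error budget")

    # Walk the back-pointers from the last layer to the first.
    choices = []
    b = best_b
    for ptr in reversed(back):
        j, b = ptr[b]
        choices.append(j)
    choices.reverse()
    return choices, table[best_b]

# Toy example: a large layer and a small layer, three options each.
layers = [
    [(100, 0), (50, 2), (10, 6)],  # large layer: (size, error) per option
    [(20, 0), (10, 2), (2, 6)],    # small layer
]
choices, total_size = choose_layer_params(layers, error_budget=8.0, bins=8)
```

In this toy input, the search compresses the large layer most aggressively (option 2) and the small layer moderately (option 1), for a total size of 20. Applying the same option to both layers within the same budget would give at best a total size of 60, which is the kind of gap a layerwise-adaptive choice is meant to exploit.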

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2210.17357 [cs.LG]
	(or arXiv:2210.17357v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2210.17357

Submission history

From: Ilia Markov
[v1] Mon, 31 Oct 2022 14:37:41 UTC (2,871 KB)
[v2] Fri, 9 Jun 2023 17:11:26 UTC (3,405 KB)
