Neighborhood Gradient Clustering: An Efficient Decentralized Learning Method for Non-IID Data Distributions

Aketi, Sai Aparna; Kodge, Sangamesh; Roy, Kaushik

Computer Science > Machine Learning

arXiv:2209.14390v1 (cs)

[Submitted on 28 Sep 2022 (this version), latest version 20 Mar 2023 (v6)]

Abstract: Decentralized learning algorithms enable the training of deep learning models over large distributed datasets generated at different devices and locations, without the need for a central server. In practical scenarios, the distributed datasets can have significantly different data distributions across the agents. The current state-of-the-art decentralized algorithms mostly assume the data distributions to be Independent and Identically Distributed (IID). This paper focuses on improving decentralized learning over non-IID data distributions with minimal compute and memory overheads. We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. In particular, the proposed method replaces the local gradients of the model with the weighted mean of the self-gradients, model-variant cross-gradients (derivatives of the received neighbors' model parameters with respect to the local dataset), and data-variant cross-gradients (derivatives of the local model with respect to its neighbors' datasets). Further, we present CompNGC, a compressed version of NGC that reduces the communication overhead by $32\times$ by compressing the cross-gradients. We demonstrate the empirical convergence and efficiency of the proposed technique over non-IID data distributions sampled from the CIFAR-10 dataset on various model architectures and graph topologies. Our experiments demonstrate that NGC and CompNGC outperform the existing state-of-the-art (SoTA) decentralized learning algorithm over non-IID data by $1$-$5\%$ with significantly less compute and memory requirements. Further, we also show that the proposed NGC method outperforms the baseline by $5$-$40\%$ with no additional communication.
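The gradient replacement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear model, the squared-error loss, the function names (`mse_gradient`, `weighted_mean`, `ngc_style_gradient`) and the mixing weights are all assumptions chosen for illustration, since the abstract states only that a weighted mean of the three gradient types is used.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Batch = Sequence[Tuple[Sequence[float], float]]  # (features, target) pairs


def mse_gradient(weights: Sequence[float], batch: Batch) -> Vector:
    """Gradient of the mean squared error of a linear model w.x on one batch."""
    grad = [0.0] * len(weights)
    for features, target in batch:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for k, x in enumerate(features):
            grad[k] += 2.0 * error * x / len(batch)
    return grad


def weighted_mean(grads: Sequence[Vector], mix: Sequence[float]) -> Vector:
    """Weighted mean of gradient vectors; the weights must sum to 1."""
    assert len(grads) == len(mix) and abs(sum(mix) - 1.0) < 1e-9
    return [sum(m * g[k] for m, g in zip(mix, grads)) for k in range(len(grads[0]))]


def ngc_style_gradient(
    local_w: Sequence[float],
    neighbor_ws: Sequence[Sequence[float]],
    local_batch: Batch,
    neighbor_batches: Sequence[Batch],
    mix: Sequence[float],
) -> Vector:
    """Mix self-gradient and cross-gradients (hypothetical weights `mix`)."""
    # Self-gradient: local model evaluated on the local data.
    self_grad = mse_gradient(local_w, local_batch)
    # Model-variant cross-gradients: each neighbor's model on the local data.
    model_variant = [mse_gradient(w, local_batch) for w in neighbor_ws]
    # Data-variant cross-gradients: the local model on each neighbor's data.
    # Computed here directly for illustration; in a decentralized setting the
    # neighbor would compute this on its own data and send the result back.
    data_variant = [mse_gradient(local_w, b) for b in neighbor_batches]
    return weighted_mean([self_grad] + model_variant + data_variant, mix)


# Toy example: one agent with a single neighbor and one sample per agent.
mixed = ngc_style_gradient(
    local_w=[1.0, 0.0],
    neighbor_ws=[[0.0, 1.0]],
    local_batch=[([1.0, 0.0], 2.0)],
    neighbor_batches=[[([0.0, 1.0], 1.0)]],
    mix=[0.5, 0.25, 0.25],  # assumed weights, not taken from the paper
)
print(mixed)  # [-2.0, -0.5]
```

In this toy case the self-gradient is zero in the second coordinate, and the data-variant cross-gradient is the only term that contributes there, which shows how cross-gradients can carry information about data the local agent never sees.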

Comments:	15 pages, 5 figures, 7 tables
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
Cite as:	arXiv:2209.14390 [cs.LG]
	(or arXiv:2209.14390v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2209.14390

Submission history

From: Sai Aparna Aketi
[v1] Wed, 28 Sep 2022 19:28:54 UTC (378 KB)
[v2] Fri, 30 Sep 2022 01:50:04 UTC (378 KB)
[v3] Mon, 21 Nov 2022 20:06:54 UTC (391 KB)
[v4] Fri, 27 Jan 2023 19:33:19 UTC (548 KB)
[v5] Sat, 25 Feb 2023 17:41:31 UTC (548 KB)
[v6] Mon, 20 Mar 2023 20:05:33 UTC (548 KB)
