Non-IID data in Federated Learning: A Survey with Taxonomy, Metrics, Methods, Frameworks and Future Directions

Daniel M. Jimenez G., David Solans, Mikko Heikkila, Andrea Vitaletti, Nicolas Kourtellis, Aris Anagnostopoulos, Ioannis Chatzigiannakis

arXiv:2411.12377 (cs)

[Submitted on 19 Nov 2024 (v1), last revised 12 Dec 2024 (this version, v2)]

Abstract: Recent advances in machine learning have highlighted Federated Learning (FL) as a promising approach that enables multiple distributed users (so-called clients) to collectively train ML models without sharing their private data. While this privacy-preserving method shows potential, it struggles when the data across clients is not independent and identically distributed (non-IID). Non-IID data remains an unsolved challenge that can result in poorer model performance and longer training times. Despite the significance of non-IID data in FL, researchers lack consensus on how to classify and quantify it. This technical survey aims to fill that gap by providing a detailed taxonomy for non-IID data, partition protocols, and metrics to quantify data heterogeneity. Additionally, we describe popular solutions to address non-IID data and standardized frameworks employed in FL with heterogeneous data. Based on our state-of-the-art survey, we present key lessons learned and suggest promising future research directions.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2411.12377 [cs.LG]
	(or arXiv:2411.12377v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.12377

Submission history

From: Daniel Mauricio Jimenez Gutierrez
[v1] Tue, 19 Nov 2024 09:53:28 UTC (7,284 KB)
[v2] Thu, 12 Dec 2024 18:16:23 UTC (7,384 KB)
