Federated Learning of Large Models at the Edge via Principal Sub-Model Training

Niu, Yue; Prakash, Saurav; Kundu, Souvik; Lee, Sunwoo; Avestimehr, Salman


arXiv:2208.13141 (cs)

[Submitted on 28 Aug 2022 (v1), last revised 10 Oct 2023 (this version, v3)]


Abstract:Federated Learning (FL) is emerging as a popular, promising decentralized learning framework that enables collaborative training among clients without sharing private data with one another or with a centralized server. However, because many edge clients lack sufficient computing, memory, or communication capabilities, federated learning of large models still faces significant bottlenecks. To keep such weak but crucial clients in the loop, prior works either consider a heterogeneous-client setting in which clients train models of different sizes, or offload training to the server. The heterogeneous-client setting requires some clients to train the full model, which is not aligned with the resource-constrained setting, while offloading breaks the privacy promises of FL by sharing intermediate representations or labels with the server. To overcome these limitations, in this work we formulate a realistic but much less explored cross-device FL setting in which no client can train a full large model or is willing to share any intermediate information with the remote server. Under this formulation, we develop a principal sub-model (PriSM) training methodology to collaboratively train a full large model while assigning each client a small sub-model that is a probabilistic low-rank approximation to the full server model. When creating sub-models, PriSM first performs a principal kernel analysis in the orthogonal kernel space to obtain the importance of each kernel. Then, PriSM adopts a novel importance-aware sampling process to select a subset of kernels (i.e., a kernel with high importance is assigned a higher sampling probability). This sampling process ensures that each sub-model is still a low-rank approximation to the full model, while all sub-models together achieve nearly full coverage of the principal kernels.
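The importance-aware sampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-kernel importance scores are already available (for example, singular values from a decomposition of one layer), and it uses sequential weighted sampling without replacement as a stand-in for whatever scheme the paper actually uses. The function names `sample_kernels` and `assign_sub_models`, the example scores, and the choice of 10 clients are all hypothetical.

```python
import random


def sample_kernels(importance, k, rng):
    """Draw k distinct kernel indices, favoring high-importance kernels.

    Sequential weighted sampling without replacement. This is an
    illustrative assumption, not necessarily the paper's scheme.
    Importance scores are assumed to be strictly positive.
    """
    if not 0 < k <= len(importance):
        raise ValueError("k must be between 1 and the number of kernels")
    remaining = list(range(len(importance)))
    weights = [float(w) for w in importance]
    chosen = []
    for _ in range(k):
        # Pick a position among the kernels not yet chosen, weighted by importance.
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return sorted(chosen)


def assign_sub_models(importance, k, num_clients, seed=0):
    """Give each client its own independently sampled subset of k kernels."""
    rng = random.Random(seed)
    return [sample_kernels(importance, k, rng) for _ in range(num_clients)]


# Hypothetical importance scores for 8 kernels of one layer.
importance = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625]
sub_models = assign_sub_models(importance, k=3, num_clients=10)

# Union of all clients' kernels: the "coverage" the abstract refers to.
covered = set().union(*sub_models)
```

Each entry of `sub_models` is the list of kernel indices one client would train, and `covered` collects the kernels touched by at least one client. Because sampling is random, low-importance kernels are chosen less often but are not excluded outright, which is the property the abstract attributes to the sampling process.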

Comments:	19 pages, 11 figures. Accepted to Transactions on Machine Learning Research (TMLR) 2023. Code: this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2208.13141 [cs.LG]
	(or arXiv:2208.13141v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2208.13141

Submission history

From: Yue Niu
[v1] Sun, 28 Aug 2022 05:17:03 UTC (1,964 KB)
[v2] Mon, 17 Oct 2022 04:54:43 UTC (2,141 KB)
[v3] Tue, 10 Oct 2023 23:04:48 UTC (2,468 KB)
