P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs

Huang, Hongjing; Li, Yingtao; Sun, Jie; Zhu, Xueying; Zhang, Jie; Luo, Liang; Li, Jialin; Wang, Zeke

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2305.05885 (cs)

[Submitted on 10 May 2023]

Title:P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs

Authors:Hongjing Huang, Yingtao Li, Jie Sun, Xueying Zhu, Jie Zhang, Liang Luo, Jialin Li, Zeke Wang

View PDF

Abstract:Generalized linear models (GLMs) are a widely utilized family of machine learning models in real-world applications. As data size increases, it is essential to perform efficient distributed training for these models. However, existing systems for distributed training have a high cost for communication and often use large batch sizes to balance computation and communication, which negatively affects convergence. Therefore, we argue for an efficient distributed GLM training system that strives to achieve linear scalability, while keeping batch size reasonably low. As a start, we propose P4SGD, a distributed heterogeneous training system that efficiently trains GLMs through model parallelism between distributed FPGAs and through forward-communication-backward pipeline parallelism within an FPGA. Moreover, we propose a light-weight, latency-centric in-switch aggregation protocol to minimize the latency of the AllReduce operation between distributed FPGAs, powered by a programmable switch. As such, to our knowledge, P4SGD is the first solution that achieves almost linear scalability between distributed accelerators through model parallelism. We implement P4SGD on eight Xilinx U280 FPGAs and a Tofino P4 switch. Our experiments show P4SGD converges up to 6.5X faster than the state-of-the-art GPU counterpar.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2305.05885 [cs.DC]
	(or arXiv:2305.05885v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2305.05885

Submission history

From: Hongjing Huang [view email]
[v1] Wed, 10 May 2023 04:16:16 UTC (9,273 KB)

Computer Science > Distributed, Parallel, and Cluster Computing

Title:P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Distributed, Parallel, and Cluster Computing

Title:P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators