MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

Fu, Yao; Jiang, Yinsicheng; Huang, Yeqi; Nie, Ping; Lu, Zhan; Xue, Leyang; He, Congjie; Sit, Man-Kit; Xue, Jilong; Dong, Li; Miao, Ziming; Zou, Kai; Ponti, Edoardo; Mai, Luo

Computer Science > Machine Learning

arXiv:2412.07067 (cs)

[Submitted on 10 Dec 2024 (v1), last revised 2 Mar 2025 (this version, v3)]

Abstract: The Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs). Its key feature, sparse activation, activates only a subset of parameters (experts) per token, reducing memory bandwidth and compute FLOPs compared with dense models. To capitalize on this, MoE designers leverage heterogeneous compute and memory hardware to lower system costs. However, the interaction between model sparsity and hardware heterogeneity introduces trade-offs in Cost, Accuracy, and Performance (CAP). To address this, we introduce MoE-CAP, a benchmarking method for evaluating sparse MoE systems across these three dimensions. Its key innovation is a sparsity-aware CAP analysis model, the first to integrate cost, performance, and accuracy metrics into a single diagram while estimating the impact of sparsity on system performance. MoE-CAP helps practitioners optimize hardware provisioning for an MoE model, or vice versa. MoE-CAP supports various MoE models and provides more accurate metrics than existing methods.
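To make the sparse-activation argument concrete, here is a back-of-the-envelope sketch. It is not the MoE-CAP analysis model itself. It assumes a simplified MoE in which every token uses some shared parameters plus `top_k` routed experts, roughly 2 FLOPs per active parameter, and a batch-size-1 decode step that reads each active weight from memory exactly once. All names (`MoEConfig`, `active_params`, `decode_step_cost`, `bandwidth_utilization`) and the example numbers are hypothetical. The sketch also hints at why sparsity-aware accounting matters: a utilization figure computed from the *total* parameter count overstates how much memory bandwidth an MoE decode step actually uses.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical, simplified MoE model description (illustrative only)."""
    dense_params: float       # parameters every token uses (attention, embeddings, ...)
    params_per_expert: float  # parameters of one expert, summed over all MoE layers
    num_experts: int          # experts per MoE layer
    top_k: int                # experts activated per token
    bytes_per_param: float = 2.0  # e.g. 16-bit weights


def total_params(cfg: MoEConfig) -> float:
    # Every expert counts toward the stored model size.
    return cfg.dense_params + cfg.num_experts * cfg.params_per_expert


def active_params(cfg: MoEConfig) -> float:
    # Only the top-k routed experts are touched for a single token.
    return cfg.dense_params + cfg.top_k * cfg.params_per_expert


def decode_step_cost(cfg: MoEConfig, sparse: bool = True) -> tuple[float, float]:
    """Return (FLOPs, bytes of weights read) for one token at batch size 1.

    Assumes ~2 FLOPs per active parameter (one multiply, one add) and that
    each active weight is read once; KV-cache traffic is ignored.
    With sparse=False, all parameters are counted, as a dense model would be.
    """
    params = active_params(cfg) if sparse else total_params(cfg)
    return 2.0 * params, params * cfg.bytes_per_param


def bandwidth_utilization(bytes_read: float, step_time_s: float,
                          peak_bytes_per_s: float) -> float:
    # Achieved bandwidth divided by the hardware's peak bandwidth.
    return bytes_read / step_time_s / peak_bytes_per_s


if __name__ == "__main__":
    cfg = MoEConfig(dense_params=1e9, params_per_expert=0.5e9, num_experts=8, top_k=2)
    _, sparse_bytes = decode_step_cost(cfg, sparse=True)
    _, naive_bytes = decode_step_cost(cfg, sparse=False)
    # Hypothetical measured step time of 5 ms on hardware with 2 TB/s peak bandwidth.
    print(bandwidth_utilization(sparse_bytes, 0.005, 2e12))  # about 0.4
    print(bandwidth_utilization(naive_bytes, 0.005, 2e12))   # about 1.0
```

With these made-up numbers the model stores 5e9 parameters but a single token touches only 2e9, so counting every parameter would report 2.5x more weight traffic than the step actually performs. At larger batch sizes different tokens route to different experts, so the set of weights read grows toward the full model; a realistic estimate has to account for that, which this sketch deliberately leaves out.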

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2412.07067 [cs.LG]
	(or arXiv:2412.07067v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.07067

Submission history

From: Yao Fu
[v1] Tue, 10 Dec 2024 00:19:28 UTC (315 KB)
[v2] Wed, 26 Feb 2025 00:28:08 UTC (239 KB)
[v3] Sun, 2 Mar 2025 16:40:03 UTC (5,070 KB)
