ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning

Gao, Shangqian; Hua, Ting; Shirkavand, Reza; Lin, Chi-Heng; Tang, Zhen; Li, Zhengao; Yuan, Longge; Li, Fangyi; Zhang, Zeyu; Ganjdanesh, Alireza; Qian, Lou; Jie, Xu; Hsu, Yen-Chang

Computer Science > Machine Learning

arXiv:2501.15316 (cs)

[Submitted on 25 Jan 2025]

Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities in tackling a wide range of complex tasks. However, their huge computational and memory costs raise significant challenges in deploying these models on resource-constrained devices or efficiently serving them. Prior approaches have attempted to alleviate these problems by permanently removing less important model structures, yet these methods often result in substantial performance degradation due to the permanent deletion of model parameters. In this work, we tried to mitigate this issue by reducing the number of active parameters without permanently removing them. Specifically, we introduce a differentiable dynamic pruning method that pushes dense models to maintain a fixed number of active parameters by converting their MLP layers into a Mixture of Experts (MoE) architecture. Our method, even without fine-tuning, consistently outperforms previous structural pruning techniques across diverse model families, including Phi-2, LLaMA-2, LLaMA-3, and Qwen-2.5.
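
The abstract describes converting dense MLP layers into a Mixture of Experts so that only a fixed number of parameters is active per token, while no weights are deleted. The sketch below illustrates that general idea only; it is not ToMoE's algorithm, which the abstract describes as differentiable and which is not detailed here. It assumes, for illustration, that the MLP's intermediate neurons are split into equal contiguous groups acting as experts, that a hypothetical linear router `w_router` scores the experts per token, that the top-k experts are kept with a hard 0/1 selection, and that the activation is a plain ReLU. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU (assumed stand-in for the model's real activation)."""
    return [max(0.0, a) for a in v]


def dense_mlp(x: Vector, w_up: Matrix, w_down: Matrix) -> Vector:
    """Dense MLP: y = W_down @ relu(W_up @ x).

    w_up has shape (d_ff, d_model); w_down has shape (d_model, d_ff).
    """
    return matvec(w_down, relu(matvec(w_up, x)))


def moe_mlp(
    x: Vector,
    w_up: Matrix,
    w_down: Matrix,
    w_router: Matrix,
    num_experts: int,
    top_k: int,
) -> Tuple[Vector, List[int]]:
    """Run the same weights as a hard top-k MoE over contiguous neuron groups.

    Expert e owns intermediate neurons [e*size, (e+1)*size): those rows of
    w_up and the matching columns of w_down. Only the top_k experts scored
    highest by the hypothetical router w_router (num_experts x d_model) run.
    """
    d_ff = len(w_up)
    if d_ff % num_experts != 0:
        raise ValueError("d_ff must be divisible by num_experts")
    size = d_ff // num_experts

    scores = matvec(w_router, x)
    chosen = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)[:top_k]

    y = [0.0] * len(w_down)
    for e in chosen:
        rows = range(e * size, (e + 1) * size)
        # Hidden activations of this expert's neurons only.
        hidden = relu([sum(w * xi for w, xi in zip(w_up[r], x)) for r in rows])
        # Add this expert's contribution; unselected experts contribute zero.
        for i, out_row in enumerate(w_down):
            y[i] += sum(out_row[r] * h for r, h in zip(rows, hidden))
    return y, chosen


def active_mlp_params(d_model: int, d_ff: int, num_experts: int, top_k: int) -> int:
    """Weights used per token: top_k experts, each with an up slice and a down slice."""
    return 2 * d_model * (d_ff // num_experts) * top_k


if __name__ == "__main__":
    random.seed(0)
    d_model, d_ff, num_experts = 4, 8, 4

    def rand_matrix(rows: int, cols: int) -> Matrix:
        return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    w_up, w_down = rand_matrix(d_ff, d_model), rand_matrix(d_model, d_ff)
    w_router = rand_matrix(num_experts, d_model)
    x = [random.gauss(0.0, 1.0) for _ in range(d_model)]

    print("dense:", dense_mlp(x, w_up, w_down))
    print("top-2:", moe_mlp(x, w_up, w_down, w_router, num_experts, top_k=2))
```

Because the activation is element-wise, setting `top_k` equal to `num_experts` reproduces the dense output (up to floating-point rounding), which is why the sketch uses a 0/1 selection rather than softmax-weighted gates; with a smaller `top_k`, the active MLP weights shrink to `top_k / num_experts` of the dense count while every weight stays in memory. The hard top-k step is not differentiable, so a trainable version would need some relaxation of it; the abstract does not say which mechanism ToMoE uses. Gated MLPs such as LLaMA's would also add a third weight slice per expert.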

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2501.15316 [cs.LG]
	(or arXiv:2501.15316v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2501.15316

Submission history

From: Shangqian Gao
[v1] Sat, 25 Jan 2025 20:01:42 UTC (1,874 KB)