On the effectiveness of discrete representations in sparse mixture of experts

Do, Giang; Pham, Kha; Le, Hung; Tran, Truyen

arXiv:2411.19402 (cs)

[Submitted on 28 Nov 2024]

Abstract: Sparse mixture of experts (SMoE) is an effective way to scale up model capacity without increasing computational cost. A crucial component of SMoE is the router, which directs each input to the relevant experts; however, the router is also a major weakness, leading to routing inconsistencies and representation collapse. Instead of fixing the router as previous works do, we propose an alternative that assigns experts to inputs via indirection: a discrete representation of the input points to the expert. The discrete representations are learned via vector quantization, resulting in a new architecture dubbed Vector-Quantized Mixture of Experts (VQMoE). We provide theoretical support and empirical evidence demonstrating VQMoE's ability to overcome the challenges present in traditional routers. Through extensive evaluations on both large language models and vision tasks, covering pre-training and fine-tuning, we show that VQMoE achieves a 28% improvement in robustness compared to other SMoE routing methods while maintaining strong performance on fine-tuning tasks.
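
To make the idea of routing "via indirection" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed toy codebook (the paper learns the codebook via vector quantization), one codebook entry per expert, and squared Euclidean distance for the nearest-code lookup. The names `nearest_code`, `make_expert`, and `vq_route` are hypothetical and chosen only for illustration.

```python
def nearest_code(x, codebook):
    """Return the index of the codebook vector closest to x (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def make_expert(scale):
    """Hypothetical stand-in for an expert network: it just scales its input."""
    return lambda x: [scale * xi for xi in x]


def vq_route(x, codebook, experts):
    """Quantize x to a discrete code, then dispatch x to the expert that code points to."""
    k = nearest_code(x, codebook)  # discrete representation of the input
    return k, experts[k](x)        # assumption: code k maps directly to expert k


# Toy setup (assumed values): three 2-D codes and three matching experts.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
experts = [make_expert(s) for s in (1.0, 2.0, 3.0)]

code, out = vq_route([0.9, 1.2], codebook, experts)
print(code, out)
```

In this sketch the expert is chosen by a nearest-neighbour lookup into the codebook rather than by scores from a learned router, which is the contrast the abstract draws. How VQMoE trains the codebook and combines expert outputs is not specified in the abstract and is not modelled here.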

Comments:	17 pages
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2411.19402 [cs.LG]
	(or arXiv:2411.19402v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.19402

Submission history

From: Truong Giang Do
[v1] Thu, 28 Nov 2024 22:32:01 UTC (1,067 KB)
