S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning

Do, Giang; Le, Hung; Tran, Truyen

arXiv:2503.23007 (cs)

[Submitted on 29 Mar 2025]

Abstract: Sparse Mixture of Experts (SMoE) enables efficient training of large language models by routing input tokens to a select number of experts. However, training SMoE remains challenging due to the issue of representation collapse. Recent studies have focused on improving the router to mitigate this problem, but existing approaches face two key limitations: (1) expert embeddings are significantly smaller than the model's dimension, contributing to representation collapse, and (2) routing each input to the Top-K experts can cause them to learn overly similar features. In this work, we propose a novel approach called Robust Sparse Mixture of Experts via Stochastic Learning (S2MoE), which is a mixture of experts designed to learn from both deterministic and non-deterministic inputs via Learning under Uncertainty. Extensive experiments across various tasks demonstrate that S2MoE achieves performance comparable to other routing methods while reducing computational inference costs by 28%.
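
For readers unfamiliar with the Top-K routing the abstract refers to, the following is a minimal sketch of a generic SMoE layer in plain Python. It is not the S2MoE method and is not taken from the paper: the dot-product router, the toy experts, and every name in it (`top_k_route`, `smoe_forward`, `expert_embeddings`) are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(token, expert_embeddings, k=2):
    """Return [(expert_index, gate_weight), ...] for the k best-scoring experts.

    Assumption: the router scores a token by its dot product with each
    expert embedding. Real routers differ in detail.
    """
    scores = [sum(t * e for t, e in zip(token, emb)) for emb in expert_embeddings]
    # Keep only the k highest-scoring experts (sparse activation).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise the gate weights over the selected experts only.
    gates = softmax([scores[i] for i in top])
    return list(zip(top, gates))

def smoe_forward(token, expert_embeddings, experts, k=2):
    """Gate-weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(token)
    for idx, gate in top_k_route(token, expert_embeddings, k):
        y = experts[idx](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out

# Hypothetical toy setup: 4 experts, each scaling the token by a constant.
expert_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

token = [0.5, 2.0]
routing = top_k_route(token, expert_embeddings, k=2)
output = smoe_forward(token, expert_embeddings, experts, k=2)
```

In this toy setup only two of the four experts run for the token, which is where the efficiency of sparse routing comes from. Because every token is sent to its highest-scoring experts, tokens with similar scores exercise the same experts, which is the setting in which the abstract's second limitation (experts learning overly similar features) arises.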

Comments:	4 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2503.23007 [cs.CL]
	(or arXiv:2503.23007v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.23007

Submission history

From: Truong Giang Do
[v1] Sat, 29 Mar 2025 08:14:27 UTC (299 KB)
