OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Xue, Fuzhao; Zheng, Zian; Fu, Yao; Ni, Jinjie; Zheng, Zangwei; Zhou, Wangchunshu; You, Yang

arXiv:2402.01739v1 (cs)

[Submitted on 29 Jan 2024 (this version), latest version 27 Mar 2024 (v2)]

Abstract: To help the open-source community better understand Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to more than 1T tokens. Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs, highlighting their potential for future LLM development.
Another important contribution of this study is an in-depth analysis of the routing mechanisms within our OpenMoE models, leading to three significant findings: Context-Independent Specialization, Early Routing Learning, and Drop-towards-the-End. We discovered that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance. The token-to-expert assignments are determined early in the pre-training phase and remain largely unchanged. This imperfect routing can result in performance degradation, particularly in sequential tasks like multi-turn conversations, where tokens appearing later in a sequence are more likely to be dropped. Finally, we rethink our design based on these observations and analysis. To facilitate future MoE LLM development, we propose potential strategies for mitigating the issues we found and further improving off-the-shelf MoE LLM designs.
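
The Drop-towards-the-End finding can be illustrated with a toy sketch. The abstract does not spell out the mechanism, so the code below rests on assumptions: each expert has a fixed capacity that is filled in sequence order, and the router is a hypothetical stand-in (`token_id % num_experts`) that mimics context-independent, token-ID-based routing. It is not the OpenMoE implementation.

```python
from collections import Counter

def route_with_capacity(token_ids, num_experts, capacity):
    """Toy top-1 routing with a fixed per-expert capacity (assumed setup).

    Tokens are visited in sequence order. Once an expert is full, any
    later token routed to it is dropped. Returns the dropped positions.
    """
    load = Counter()
    dropped = []
    for pos, tok in enumerate(token_ids):
        # Hypothetical router: depends only on the token ID, not on context.
        expert = tok % num_experts
        if load[expert] < capacity:
            load[expert] += 1
        else:
            dropped.append(pos)
    return dropped

# Most of these made-up token IDs map to expert 0, so it fills up early.
tokens = [0, 4, 8, 1, 12, 5, 16, 2, 20, 24]
dropped = route_with_capacity(tokens, num_experts=4, capacity=3)
print(dropped)
```

Under these assumptions, an overloaded expert accepts the earliest tokens routed to it and rejects the rest, so the dropped positions cluster in the later part of the sequence.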

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2402.01739 [cs.CL]
	(or arXiv:2402.01739v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.01739

Submission history

From: Fuzhao Xue
[v1] Mon, 29 Jan 2024 12:05:02 UTC (678 KB)
[v2] Wed, 27 Mar 2024 10:21:24 UTC (755 KB)
