MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

Zhu, Xingkui; Guan, Yiran; Liang, Dingkang; Chen, Yuchao; Liu, Yuliang; Bai, Xiang

arXiv:2406.04801 (cs)

[Submitted on 7 Jun 2024]

Abstract: The sparsely activated mixture of experts (MoE) model presents a promising alternative to traditional densely activated (dense) models, enhancing both quality and computational efficiency. However, training MoE models from scratch demands extensive data and computational resources. Moreover, public repositories such as timm mainly provide pre-trained dense checkpoints and lack comparable resources for MoE models, which hinders their adoption. To bridge this gap, we introduce MoE Jetpack, an effective method for fine-tuning dense checkpoints into MoE models. MoE Jetpack incorporates two key techniques: (1) checkpoint recycling, which repurposes dense checkpoints as initial weights for MoE models, thereby accelerating convergence, enhancing accuracy, and alleviating the computational burden of pre-training; and (2) the hyperspherical adaptive MoE (SpheroMoE) layer, which optimizes the MoE architecture for better integration of dense checkpoints, enhancing fine-tuning performance. Our experiments on vision tasks demonstrate that MoE Jetpack significantly improves convergence speed and accuracy when fine-tuning dense checkpoints into MoE models. Our code will be publicly available at this https URL.
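
As a rough illustration of the general idea behind reusing a dense checkpoint to initialise MoE experts, the sketch below copies subsets of a dense feed-forward layer's hidden units into several smaller experts. It is not the authors' checkpoint recycling procedure, which the abstract does not specify. The function name `recycle_dense_ffn`, the uniform random selection of hidden units, and the list-of-rows weight layout are all assumptions made for this example.

```python
import random


def recycle_dense_ffn(w_in, w_out, num_experts, expert_hidden, seed=0):
    """Initialise expert FFNs from one dense FFN (illustrative sketch only).

    w_in  : list of `hidden` rows; row u holds the input weights of hidden unit u.
    w_out : list of `hidden` rows; row u holds the output weights of hidden unit u.
    Each expert receives copies of `expert_hidden` hidden units chosen uniformly
    at random (an assumption; the paper may select units differently).
    """
    hidden = len(w_in)
    if len(w_out) != hidden:
        raise ValueError("w_in and w_out must describe the same hidden units")
    if not 0 < expert_hidden <= hidden:
        raise ValueError("expert_hidden must be in 1..hidden")

    rng = random.Random(seed)  # seeded so the initialisation is reproducible
    experts = []
    for _ in range(num_experts):
        # Pick which dense hidden units this expert inherits.
        units = sorted(rng.sample(range(hidden), expert_hidden))
        experts.append({
            "units": units,
            # Copy rows so later fine-tuning of one expert cannot alias another.
            "w_in": [list(w_in[u]) for u in units],
            "w_out": [list(w_out[u]) for u in units],
        })
    return experts


# Toy dense layer: 8 hidden units, model width 4.
dense_in = [[float(u * 10 + j) for j in range(4)] for u in range(8)]
dense_out = [[float(-(u * 10 + j)) for j in range(4)] for u in range(8)]
experts = recycle_dense_ffn(dense_in, dense_out, num_experts=4, expert_hidden=2)
```

Each expert starts from weights the dense model has already learned, so fine-tuning does not begin from random initialisation; how the units are chosen and how the router is initialised are left open here.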

Comments:	9 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
ACM classes:	I.2
Cite as:	arXiv:2406.04801 [cs.CV]
	(or arXiv:2406.04801v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.04801

Submission history

From: Xingkui Zhu
[v1] Fri, 7 Jun 2024 10:05:42 UTC (8,128 KB)
