Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts

Long, Jinqiang; Dai, Yanqi; Yang, Guoxing; Lin, Hongpeng; Fei, Nanyi; Gao, Yizhao; Lu, Zhiwu

arXiv:2411.10669 (cs)

[Submitted on 16 Nov 2024]

Abstract: As research on Multimodal Large Language Models (MLLMs) gains popularity, an advanced MLLM is typically required to handle various textual and visual tasks (e.g., VQA, detection, OCR, and ChartQA) simultaneously in real-world applications. However, because data from different tasks differ significantly in representation and distribution, simply mixing the data of all tasks together leads to the well-known "multi-task conflict" issue, which degrades performance across tasks. To address this issue, we propose Awaker2.5-VL, a Mixture of Experts (MoE) architecture suited to MLLMs, which acquires multi-task capabilities through multiple sparsely activated experts. To speed up the training and inference of Awaker2.5-VL, each expert in our model is designed as a low-rank adaptation (LoRA) structure. Extensive experiments on multiple recent benchmarks demonstrate the effectiveness of Awaker2.5-VL. The code and model weights are released on our Project Page: this https URL.
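The idea of sparsely activated experts, each built as a LoRA structure, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the routing scheme, so the linear router, the softmax top-k gating, the class name `LoRAMoELinear`, and all sizes below are assumptions chosen for illustration only. The sketch uses the Python standard library alone.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAMoELinear:
    """Hypothetical layer: frozen base weight plus sparsely activated LoRA experts."""

    def __init__(self, d_in, d_out, num_experts=4, rank=2, top_k=1, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.base = rand_matrix(d_out, d_in, rng)        # frozen weight W
        self.gate = rand_matrix(num_experts, d_in, rng)  # assumed linear router
        # Each expert is a pair (A, B): A maps d_in -> rank, B maps rank -> d_out.
        self.experts = [
            (rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng))
            for _ in range(num_experts)
        ]

    def route(self, x):
        """Return (expert_index, weight) pairs for the top-k experts."""
        probs = softmax(matvec(self.gate, x))
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[: self.top_k]
        norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
        return [(i, probs[i] / norm) for i in chosen]

    def forward(self, x):
        """Compute W x plus the gated low-rank updates of the selected experts."""
        y = matvec(self.base, x)
        for idx, weight in self.route(x):
            a, b = self.experts[idx]
            delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
            y = [yi + weight * di for yi, di in zip(y, delta)]
        return y

    def param_counts(self):
        """Compare one LoRA expert with a dense expert of the same shape."""
        d_out, d_in = len(self.base), len(self.base[0])
        rank = len(self.experts[0][0])
        return {"dense_expert": d_in * d_out, "lora_expert": rank * (d_in + d_out)}


layer = LoRAMoELinear(d_in=8, d_out=8, num_experts=4, rank=2, top_k=2)
x = [0.5] * 8
print(layer.route(x))         # which experts fire for this input, with weights
print(layer.param_counts())   # {'dense_expert': 64, 'lora_expert': 32}
```

In this toy configuration a LoRA expert holds rank × (d_in + d_out) = 32 parameters against 64 for a dense expert of the same shape; the gap widens as the layer dimensions grow relative to the rank, which is the general reason low-rank experts are cheaper to train and run.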

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.10669 [cs.CV]
	(or arXiv:2411.10669v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.10669

Submission history

From: Zhiwu Lu
[v1] Sat, 16 Nov 2024 02:10:14 UTC (439 KB)
