Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

Lu, Zhenyi; Fan, Chenghao; Wei, Wei; Qu, Xiaoye; Chen, Dangyang; Cheng, Yu

arXiv:2406.15479 (cs)

[Submitted on 17 Jun 2024 (v1), last revised 14 Oct 2024 (this version, v2)]

Abstract: In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Because of these issues, traditional model merging methods often show significant performance gaps compared to fine-tuned models. Additionally, a one-size-fits-all model lacks the flexibility to handle diverse test data, which degrades performance. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. In view of this, we propose Twin-Merging, a method with two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input. This approach narrows the performance gap between merged and fine-tuned models and improves adaptability to heterogeneous data. Extensive experiments on 20 datasets for both language and vision tasks demonstrate the effectiveness of our method, showing an average improvement of 28.34% in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on generative tasks. Our implementation is available at this https URL.
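The two stages named in the abstract can be illustrated on a single weight matrix. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes that the shared component is a plain task-arithmetic average of task vectors and that exclusive knowledge is compressed by truncated SVD. It also assumes that `router_logits` stands in for the output of some input-dependent router, which is not modeled here. The function names and the `rank` and `scale` parameters are hypothetical.

```python
import numpy as np


def low_rank(delta: np.ndarray, rank: int) -> np.ndarray:
    """Compress a weight-difference matrix by keeping its top-`rank` singular components."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def modularize(pretrained, finetuned, rank, scale=1.0):
    """Stage 1 (sketch): split fine-tuned weights into one shared matrix and compressed exclusive parts."""
    # Task vectors: what each fine-tuned model changed relative to the pretrained weights.
    task_vectors = [w - pretrained for w in finetuned]
    # Shared knowledge (assumption): a task-arithmetic average of the task vectors.
    shared = pretrained + scale * np.mean(task_vectors, axis=0)
    # Exclusive knowledge: what each fine-tuned model has beyond the shared model,
    # compressed to a low-rank matrix to reduce redundancy.
    exclusive = [low_rank(w - shared, rank) for w in finetuned]
    return shared, exclusive


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def dynamic_merge(shared, exclusive, router_logits):
    """Stage 2 (sketch): add input-weighted exclusive modules on top of the shared weights."""
    # `router_logits` is a hypothetical per-input score for each task module.
    weights = softmax(np.asarray(router_logits, dtype=float))
    return shared + sum(w * e for w, e in zip(weights, exclusive))
```

For example, with `shared, exclusive = modularize(pre, fts, rank=2)`, a call such as `dynamic_merge(shared, exclusive, [2.0, 0.1, -1.0])` builds a weight matrix that leans toward the first task's exclusive module for that input. Keeping the exclusive parts separate, rather than averaging them into the shared matrix, is what allows the mix to change per input.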

Comments:	NeurIPS 2024 poster
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2406.15479 [cs.CL]
	(or arXiv:2406.15479v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.15479

Submission history

From: Zhenyi Lu
[v1] Mon, 17 Jun 2024 02:31:55 UTC (8,078 KB)
[v2] Mon, 14 Oct 2024 04:14:26 UTC (7,666 KB)