1bit-Merging: Dynamic Quantized Merging for Large Language Models

Liu, Shuqi; Wu, Han; He, Bowei; Liu, Zehua; Han, Xiongwei; Yuan, Mingxuan; Song, Linqi

Abstract: Recent advances in large language models have produced specialized models that excel in specific domains, creating a need for efficient model merging techniques. Traditional merging approaches combine parameters into a single static model but often compromise task-specific performance, whereas task-specific routing methods maintain accuracy but introduce substantial storage overhead. We present 1bit-Merging, a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency. Our approach leverages the observation that different task-specific models store knowledge in distinct layers (chat models primarily in attention layers, math and code models in MLP layers), which enables targeted compression strategies. Through extensive experiments with the LLaMA2 and Mistral model families across chat, mathematical reasoning, and code generation tasks, we demonstrate that 1bit-Merging achieves comparable or superior performance to existing methods while significantly reducing storage requirements. Our framework offers a practical solution for combining specialized models while maintaining their individual strengths and addressing the storage challenges of current approaches.
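To make the idea of a 1-bit quantized task vector concrete, here is a minimal sketch in plain Python. The abstract does not specify the quantization scheme, so this assumes a common sign-plus-scale formulation: the task vector is the difference between fine-tuned and base weights, each entry is stored as a single sign bit, and one shared scale (the mean absolute value) is kept per vector. The function names `task_vector`, `quantize_1bit`, and `reconstruct` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def task_vector(finetuned: List[float], base: List[float]) -> List[float]:
    """Task vector: element-wise difference between fine-tuned and base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def quantize_1bit(tau: List[float]) -> Tuple[List[int], float]:
    """Assumed scheme: keep only the sign of each entry plus one shared scale.

    The scale is the mean absolute value of the task vector, so storage is
    one bit per weight plus a single float.
    """
    scale = sum(abs(t) for t in tau) / len(tau)
    signs = [1 if t >= 0 else -1 for t in tau]
    return signs, scale


def reconstruct(base: List[float], signs: List[int], scale: float) -> List[float]:
    """Approximate the fine-tuned weights from the base weights and the 1-bit delta."""
    return [b + scale * s for b, s in zip(base, signs)]


# Toy weights chosen for illustration only (not data from the paper).
base = [0.5, -0.25, 1.0, 0.0]
finetuned = [0.75, -0.75, 1.25, -0.5]

tau = task_vector(finetuned, base)
signs, scale = quantize_1bit(tau)
approx = reconstruct(base, signs, scale)
```

Under this assumption, a router would select which task's sign tensor and scale to apply to the shared base weights for a given input, so each additional task costs roughly one bit per compressed weight instead of a full-precision copy.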

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.10743 [cs.CL]
	(or arXiv:2502.10743v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.10743
