MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Li, Shiyao; Hu, Yingchun; Ning, Xuefei; Liu, Xihui; Hong, Ke; Jia, Xiaotao; Li, Xiuhong; Yan, Yaqi; Ran, Pei; Dai, Guohao; Yan, Shengen; Yang, Huazhong; Wang, Yu

arXiv:2412.19509 (cs)

[Submitted on 27 Dec 2024]

Abstract: Vision-Language Models (VLMs) have enabled a variety of real-world applications. The large parameter counts of VLMs incur substantial memory and computation overhead, which poses significant challenges for deployment. Post-Training Quantization (PTQ) is an effective technique for reducing this overhead. Existing PTQ methods mainly focus on large language models (LLMs) and do not consider the differences between modalities. In this paper, we find that language and vision tokens in large VLMs differ significantly in their sensitivity to quantization. Consequently, treating tokens from different modalities equally, as existing PTQ methods do, may over-emphasize the insensitive modality and lead to significant accuracy loss. To address this issue, we propose a simple yet effective method for large VLMs, Modality-Balanced Quantization (MBQ). Specifically, MBQ incorporates the different sensitivities of the modalities into the reconstruction loss minimized during calibration, which yields better quantization parameters. Extensive experiments show that, compared to state-of-the-art (SOTA) baselines, MBQ improves task accuracy by up to 4.4% under W3 and up to 11.6% under W4A8 quantization for VLMs ranging from 7B to 70B parameters. Additionally, we implement a W3 GPU kernel that fuses the dequantization and GEMV operators, achieving a 1.4x speedup on LLaVA-onevision-7B on the RTX 4090. The code is available at this https URL.
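The abstract says only that MBQ weights the calibration reconstruction loss by per-modality sensitivity; it does not give the exact formula. The following is a minimal sketch of that idea under explicit assumptions, not the authors' implementation. It uses a 3-bit (W3, i.e. 3-bit weights with full-precision activations) symmetric per-row quantizer and a grid search over a clipping ratio. Fixed, hypothetical per-modality weights stand in for the sensitivities MBQ measures; here language tokens are assumed to be weighted above vision tokens. Setting both weights to 1.0 recovers the modality-agnostic objective that the abstract attributes to existing PTQ methods. All function names, the clipping-ratio grid, and the toy data are illustrative.

```python
"""Sketch: modality-weighted reconstruction loss for choosing a weight-quantization
clipping ratio. Per-modality weights are hypothetical inputs, not values from MBQ."""
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Token = Tuple[str, Vector]  # (modality label, activation vector)


def quantize_row(row: Sequence[float], n_bits: int, clip_ratio: float) -> Vector:
    """Symmetric uniform fake-quantization of one output row of W."""
    qmax = 2 ** (n_bits - 1) - 1  # 3 for 3-bit, giving integer levels -4..3
    max_abs = max(abs(w) for w in row) * clip_ratio
    if max_abs == 0.0:
        return list(row)
    scale = max_abs / qmax
    # Round to the nearest level, clamp to the signed range, then dequantize.
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in row]


def quantize_matrix(w: Matrix, n_bits: int, clip_ratio: float) -> Matrix:
    return [quantize_row(row, n_bits, clip_ratio) for row in w]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def balanced_loss(w: Matrix, wq: Matrix, tokens: Sequence[Token],
                  modality_weight: Dict[str, float]) -> float:
    """Sum over calibration tokens of weight[modality] * ||W x - W_q x||^2."""
    total = 0.0
    for modality, x in tokens:
        err = [a - b for a, b in zip(matvec(w, x), matvec(wq, x))]
        total += modality_weight[modality] * sum(e * e for e in err)
    return total


def search_clip_ratio(w: Matrix, tokens: Sequence[Token],
                      modality_weight: Dict[str, float], n_bits: int = 3,
                      grid: Sequence[float] = ()) -> Tuple[float, float]:
    """Return (clip_ratio, loss) minimizing the modality-weighted loss over a grid."""
    grid = list(grid) or [1.0 - 0.05 * i for i in range(11)]  # 1.0 down to 0.5
    scored = [(balanced_loss(w, quantize_matrix(w, n_bits, r), tokens, modality_weight), r)
              for r in grid]
    loss, ratio = min(scored)
    return ratio, loss


# Toy weight matrix and calibration tokens (hypothetical data).
W = [[0.9, -0.2, 0.05, 0.4], [-0.7, 0.3, 0.1, -0.05]]
CALIB = [("vision", [1.0, 0.5, -0.3, 0.2]),
         ("vision", [0.8, -0.1, 0.4, 0.0]),
         ("language", [0.1, 2.0, -1.5, 0.7])]

balanced = search_clip_ratio(W, CALIB, {"vision": 0.2, "language": 1.0})
uniform = search_clip_ratio(W, CALIB, {"vision": 1.0, "language": 1.0})
print("modality-balanced (ratio, loss):", balanced)
print("modality-agnostic (ratio, loss):", uniform)
```

Both searches score candidate quantizers on the same calibration tokens. The only difference between them is how much each modality's output error counts, so any change in the chosen clipping ratio comes entirely from the modality weighting. With a toy calibration set this small, the two searches may or may not select different ratios.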

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.19509 [cs.CV]
	(or arXiv:2412.19509v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.19509

Submission history

From: Shiyao Li
[v1] Fri, 27 Dec 2024 07:55:36 UTC (1,982 KB)
