LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment

Zeng, Binrui; Ji, Bin; Liu, Xiaodong; Yu, Jie; Li, Shasha; Ma, Jun; Li, Xiaopeng; Wang, Shangwen; Hong, Xinran

arXiv:2412.18135 (cs)

[Submitted on 24 Dec 2024]

Abstract: As large language models (LLMs) demonstrate exceptional performance across various domains, the deployment of these models on edge devices has emerged as a new trend. Quantization techniques, which reduce the size and memory footprint of LLMs, are effective for enabling deployment on resource-constrained edge devices. However, existing one-size-fits-all quantization methods often fail to dynamically adjust the memory consumption of LLMs based on specific hardware characteristics and usage scenarios. To address this limitation, we propose LSAQ (Layer-Specific Adaptive Quantization), a system for adaptive quantization and dynamic deployment of LLMs based on layer importance. LSAQ evaluates layer importance by constructing top-k token sets from the inputs and outputs of each layer and calculating their Jaccard coefficient. Using this evaluation, the system adaptively adjusts quantization strategies in real time according to the resource availability of edge devices, assigning different precision levels to layers of varying importance. This approach significantly reduces the storage requirements of LLMs while maintaining model performance, enabling efficient deployment across diverse hardware platforms and usage scenarios.
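
The layer-importance idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: that each layer's input and output have already been turned into per-token vocabulary scores, that importance is taken as one minus the Jaccard coefficient (so a layer that changes the top-k set more counts as more important), that every layer has the same parameter count, and that only two precisions (4-bit and 8-bit) are available. The names `top_k_tokens`, `layer_importance`, and `assign_bits` are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def top_k_tokens(scores: Sequence[float], k: int) -> set[int]:
    """Return the indices of the k highest-scoring vocabulary entries."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def layer_importance(in_scores: Sequence[float],
                     out_scores: Sequence[float], k: int) -> float:
    """Assumed convention: importance = 1 - Jaccard(top-k input, top-k output)."""
    return 1.0 - jaccard(top_k_tokens(in_scores, k), top_k_tokens(out_scores, k))


def assign_bits(importance: Sequence[float], layer_params: int,
                budget_bytes: float, high_bits: int = 8,
                low_bits: int = 4) -> list[int]:
    """Greedy assignment (an assumption, not the paper's procedure):
    start every layer at low precision, then upgrade layers in order of
    decreasing importance while the memory budget allows."""
    bits = [low_bits] * len(importance)
    used = len(importance) * layer_params * low_bits / 8
    extra = layer_params * (high_bits - low_bits) / 8  # cost of one upgrade
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    for idx in ranked:
        if used + extra > budget_bytes:
            break
        bits[idx] = high_bits
        used += extra
    return bits


if __name__ == "__main__":
    # Toy vocabulary of four tokens; the layer swaps the top-ranked token.
    score = layer_importance([0.1, 0.9, 0.5, 0.3], [0.9, 0.1, 0.5, 0.3], k=2)
    print(round(score, 3))
    # Three layers of 1000 parameters each under a 2000-byte budget.
    print(assign_bits([0.2, 0.9, 0.5], layer_params=1000, budget_bytes=2000))
```

With a larger budget the same call would upgrade more layers to 8-bit, which mirrors the abstract's point that precision follows the resources available on the device.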

Comments:	8 pages, 4 figures, work in progress
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.18135 [cs.CL]
	(or arXiv:2412.18135v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.18135

Submission history

From: Binrui Zeng
[v1] Tue, 24 Dec 2024 03:43:15 UTC (217 KB)
