Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs

Xu, Zifei; Sharify, Sayeh; Yazar, Wanzin; Webb, Tristan; Wang, Xin

arXiv:2410.14570 (cs)

[Submitted on 18 Oct 2024 (v1), last revised 17 Apr 2025 (this version, v2)]

Abstract: Large language models with high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low numerical precision. This can be achieved either through post-training quantization, which minimizes local, layer-wise quantization errors, or through quantization-aware fine-tuning, which minimizes the global loss function. In this study, we discovered that, under the same data constraint, the former approach nearly always fared worse than the latter, a phenomenon particularly prominent when the numerical precision is very low. We further showed that this difficulty of post-training quantization arose from a stark misalignment between optimization of the local and global objective functions. Our findings explain the limited utility of minimizing local quantization error, and the importance of direct quantization-aware fine-tuning, in the regime of large models at very low precision.
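To make the two objectives named in the abstract concrete, the following is a minimal sketch and not the authors' method or code. It assumes a simple symmetric per-tensor round-to-nearest quantizer and a toy two-layer ReLU network with a mean-squared-error loss. The names `quantize_rtn`, `local_error`, and `global_loss` are hypothetical and chosen only for illustration.

```python
import numpy as np

def quantize_rtn(w, bits):
    """Symmetric per-tensor round-to-nearest quantization (illustrative assumption)."""
    qmax = 2 ** (bits - 1) - 1            # largest integer level, e.g. 7 for 4 bits
    scale = np.max(np.abs(w)) / qmax      # a single scale shared by the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantized weights on the low-precision grid

def local_error(x, w, wq):
    """Local, layer-wise objective: mean squared error between layer outputs."""
    return float(np.mean((x @ w - x @ wq) ** 2))

def global_loss(x, y, w1, w2):
    """Global objective: end-to-end mean squared error of a toy two-layer ReLU network."""
    h = np.maximum(x @ w1, 0.0)
    return float(np.mean((h @ w2 - y) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))             # toy calibration inputs
w1 = rng.normal(size=(16, 32))            # first-layer weights
w2 = rng.normal(size=(32, 4))             # second-layer weights
y = np.maximum(x @ w1, 0.0) @ w2          # targets produced by the full-precision network

for bits in (8, 4, 2):
    w1q, w2q = quantize_rtn(w1, bits), quantize_rtn(w2, bits)
    print(bits, local_error(x, w1, w1q), global_loss(x, y, w1q, w2q))
```

The sketch only shows that the layer-wise error and the end-to-end loss are distinct quantities computed from the same quantized weights. It does not reproduce the paper's experiments or its finding about how the two objectives misalign.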

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2410.14570 [cs.LG]
	(or arXiv:2410.14570v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.14570

Submission history

From: Zifei Xu
[v1] Fri, 18 Oct 2024 16:16:52 UTC (3,629 KB)
[v2] Thu, 17 Apr 2025 23:26:11 UTC (4,886 KB)
