RPTQ: Reorder-based Post-training Quantization for Large Language Models

Yuan, Zhihang; Niu, Lin; Liu, Jiawei; Liu, Wenyu; Wang, Xinggang; Shang, Yuzhang; Sun, Guangyu; Wu, Qiang; Wu, Jiaxiang; Wu, Bingzhe

Computer Science > Computation and Language

arXiv:2304.01089v2 (cs)

[Submitted on 3 Apr 2023 (v1), revised 6 Apr 2023 (this version, v2), latest version 17 May 2023 (v4)]

Abstract: Large-scale language models (LLMs) have demonstrated outstanding performance on various tasks, but their enormous size makes them challenging to deploy. In this paper, we identify that the main challenge in quantizing LLMs stems from the differing activation ranges across channels, rather than just the issue of outliers. We propose a novel reorder-based quantization approach, RPTQ, that addresses the issue of quantizing the activations of LLMs. RPTQ rearranges the channels in the activations and then quantizes them in clusters, thereby reducing the impact of range differences between channels. In addition, we reduce the storage and computation overhead by avoiding explicit reordering. With this approach, we achieve a significant breakthrough, pushing LLMs to 3-bit activations for the first time.
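The abstract describes RPTQ only at a high level. The following is a minimal pure-Python sketch of the two ideas it names, under stated assumptions: (1) grouping activation channels with similar ranges and giving each group its own quantization scale, and (2) absorbing the channel reorder into neighbouring weight matrices so that no explicit reorder runs at inference time. The grouping rule (sort channels by observed range, then cut into equal-size clusters), the asymmetric uniform quantizer, the synthetic data, and every function name here are illustrative assumptions, not the paper's clustering algorithm or kernels.

```python
import random

def fake_quant(values, n_bits):
    """Asymmetric uniform quantize-then-dequantize using one scale/zero-point for all values."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero = round(-lo / scale)
    return [(min(max(round(v / scale) + zero, 0), levels) - zero) * scale for v in values]

def cluster_channels(x, n_clusters):
    """Hypothetical grouping rule: sort channels by observed range, cut into equal-size clusters."""
    n_ch = len(x[0])
    ranges = [max(row[c] for row in x) - min(row[c] for row in x) for c in range(n_ch)]
    perm = sorted(range(n_ch), key=lambda c: ranges[c])  # the channel reorder
    size = -(-n_ch // n_clusters)  # ceiling division
    return perm, [perm[i:i + size] for i in range(0, n_ch, size)]

def quantize_per_tensor(x, n_bits):
    """Baseline: one scale for the whole tokens x channels activation."""
    n_ch = len(x[0])
    deq = fake_quant([v for row in x for v in row], n_bits)
    return [deq[t * n_ch:(t + 1) * n_ch] for t in range(len(x))]

def quantize_by_cluster(x, n_bits, n_clusters):
    """One scale/zero-point per channel cluster, shared across all tokens."""
    _, clusters = cluster_channels(x, n_clusters)
    out = [[0.0] * len(x[0]) for _ in x]
    for chans in clusters:
        deq = iter(fake_quant([row[c] for row in x for c in chans], n_bits))
        for row_out in out:  # refill in the same token-major, channel-minor order
            for c in chans:
                row_out[c] = next(deq)
    return out

def mse(a, b):
    return sum((u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb)) / (len(a) * len(a[0]))

def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def fold_permutation(w1, w2, perm):
    """Permute producer output rows and consumer input columns; W2 @ W1 is unchanged."""
    return [w1[i] for i in perm], [[row[i] for i in perm] for row in w2]

random.seed(0)
# Synthetic activations: 64 tokens x 16 channels, channel std grows ~90x from first to last.
channel_std = [0.1 * 1.35 ** c for c in range(16)]
x = [[random.gauss(0.0, s) for s in channel_std] for _ in range(64)]
err_tensor = mse(x, quantize_per_tensor(x, 3))
err_cluster = mse(x, quantize_by_cluster(x, 3, 4))
print(f"3-bit MSE  per-tensor: {err_tensor:.4f}  per-cluster: {err_cluster:.4f}")

# Folding the reorder into adjacent weights: 8 -> 16 -> 4 linear layers.
perm, _ = cluster_channels(x, 4)
w1 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(16)]
w2 = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(4)]
w1_p, w2_p = fold_permutation(w1, w2, perm)
inp = [random.uniform(-1, 1) for _ in range(8)]
y_ref = matvec(w2, matvec(w1, inp))
y_fold = matvec(w2_p, matvec(w1_p, inp))  # intermediate is already in cluster order
print(max(abs(a - b) for a, b in zip(y_ref, y_fold)))
```

In a real kernel, the permutation is what makes each cluster a contiguous slice of the channel dimension, so a per-cluster scale can be applied with simple strided loads; the sketch keeps clusters as index lists for readability. The folding step works because permuting the rows of W1 and the columns of W2 by the same permutation leaves the product W2·W1 unchanged, so the reordered activation is produced directly by the previous layer.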

Comments:	17 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2304.01089 [cs.CL]
	(or arXiv:2304.01089v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.01089

Submission history

From: Zhihang Yuan
[v1] Mon, 3 Apr 2023 15:46:15 UTC (5,741 KB)
[v2] Thu, 6 Apr 2023 15:51:17 UTC (5,112 KB)
[v3] Tue, 25 Apr 2023 06:29:00 UTC (5,112 KB)
[v4] Wed, 17 May 2023 10:07:33 UTC (5,117 KB)
