Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision

Huang, Xijie; Shen, Zhiqiang; Dong, Pingcheng; Cheng, Kwang-Ting

arXiv:2307.00331 (cs)

[Submitted on 1 Jul 2023 (v1), last revised 12 Oct 2024 (this version, v2)]

Abstract: Despite the outstanding performance of transformers in both language and vision tasks, their growing computation and model size have increased the demand for efficient deployment. To address the heavy computation and parameter costs, quantization is frequently studied as a representative model compression technique and has seen extensive use on ConvNets. However, due to the unique properties of transformers, low-bit quantization applications remain limited and underexplored. In this paper, we attribute the difficulty of low-bit quantization-aware training (QAT) of transformers to their unique variation behaviors, which differ significantly from those of ConvNets. Based on comprehensive quantitative analysis, we observe variation at three levels: differing quantization sensitivities across modules, outliers in the static weight and activation distributions, and oscillation in dynamic parameter fluctuations. These variations make QAT unstable and degrade performance. We explore best practices for alleviating the influence of variation during low-bit transformer QAT and propose a variation-aware quantization scheme for both vision and language transformers. We extensively verify that our scheme alleviates the variation and improves the performance of transformers across various models and tasks. Our solution substantially improves the 2-bit Swin-T and binary BERT-base, achieving accuracy improvements of 3.35% and 1.4% over previous state-of-the-art methods on ImageNet-1K and GLUE, respectively. Code and models are available at this https URL.

Comments:	Accepted by TMLR, Code: this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2307.00331 [cs.LG]
	(or arXiv:2307.00331v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.00331

Submission history

From: Xijie Huang
[v1] Sat, 1 Jul 2023 13:01:39 UTC (979 KB)
[v2] Sat, 12 Oct 2024 17:53:00 UTC (1,047 KB)
