Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming

Hubara, Itay; Nahshan, Yury; Hanani, Yair; Banner, Ron; Soudry, Daniel

arXiv:2006.10518v1 (cs)

[Submitted on 14 Jun 2020 (this version), latest version 14 Dec 2020 (v2)]

Abstract: Most of the literature on neural network quantization requires some training of the quantized model (fine-tuning). However, this training is not always possible in real-world scenarios, as it requires the full dataset. Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small, unlabeled calibration set. Yet they usually incur significant accuracy degradation when quantized below 8 bits. This paper seeks to address this problem by introducing two pipelines, advanced and light, where the former involves: (i) minimizing the quantization errors of each layer by optimizing its parameters over the calibration set; (ii) using integer programming to optimally allocate the desired bit-width for each layer while constraining accuracy degradation or model compression; and (iii) tuning the mixed-precision model statistics to correct biases introduced during quantization. While the light pipeline, which invokes only (ii) and (iii), obtains surprisingly accurate results, the advanced pipeline yields state-of-the-art accuracy-compression ratios for both vision and text models. For instance, on ResNet50, we obtain less than 1% accuracy degradation while compressing the model to 13% of its original size. We open-sourced our code.
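
Step (ii) of the abstract, allocating a bit-width to each layer under a constraint, can be pictured as a small constrained selection problem. The sketch below is an illustration only and not the authors' released code. The layer names, parameter counts, and per-layer degradation values are invented; exhaustive enumeration stands in for an integer-programming solver; and it assumes that per-layer degradations simply add up and that the size budget is measured against 32-bit weights.

```python
from itertools import product


def allocate_bits(layers, budget_bits):
    """Choose one bit-width per layer, minimizing summed degradation.

    layers: dict mapping a layer name to
        {"params": int, "delta_loss": {bit_width: estimated degradation}}
    budget_bits: maximum total weight size in bits.

    Returns (total_degradation, total_size_bits, {layer: bit_width}),
    or None when no assignment fits the budget.
    Exhaustive search: only practical for a handful of layers.
    """
    names = list(layers)
    options = [sorted(layers[name]["delta_loss"]) for name in names]
    best = None
    for choice in product(*options):
        size = sum(layers[n]["params"] * b for n, b in zip(names, choice))
        if size > budget_bits:
            continue  # violates the compression constraint
        loss = sum(layers[n]["delta_loss"][b] for n, b in zip(names, choice))
        if best is None or loss < best[0]:
            best = (loss, size, dict(zip(names, choice)))
    return best


# Hypothetical statistics, as if measured on a small calibration set.
layers = {
    "conv1": {"params": 1_000, "delta_loss": {2: 0.90, 4: 0.20, 8: 0.01}},
    "conv2": {"params": 10_000, "delta_loss": {2: 0.30, 4: 0.05, 8: 0.00}},
    "fc": {"params": 50_000, "delta_loss": {2: 0.10, 4: 0.02, 8: 0.00}},
}

total_params = sum(layer["params"] for layer in layers.values())
# Assumed budget: 13% of the size of the same weights stored in 32 bits.
budget = 13 * 32 * total_params // 100

result = allocate_bits(layers, budget)
if result is not None:
    loss, size, bits = result
    print(bits, size, round(loss, 3))
```

Enumeration grows exponentially with the number of layers, so a real network would need an actual integer-programming or dynamic-programming solver; the objective and constraint keep the same shape.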

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.10518 [cs.LG]
	(or arXiv:2006.10518v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.10518

Submission history

From: Itay Hubara
[v1] Sun, 14 Jun 2020 16:07:55 UTC (151 KB)
[v2] Mon, 14 Dec 2020 15:55:05 UTC (343 KB)
