Memory Efficient Optimizers with 4-bit States

Li, Bingrui; Chen, Jianfei; Zhu, Jun

arXiv:2309.01507 (cs)

[Submitted on 4 Sep 2023 (v1), last revised 27 Oct 2023 (this version, v3)]

Title: Memory Efficient Optimizers with 4-bit States

Authors: Bingrui Li, Jianfei Chen, Jun Zhu

Abstract: Optimizer states are a major source of memory consumption when training neural networks, limiting the maximum model size that can be trained within a given memory budget. Compressing the optimizer states from 32-bit floating point to a lower bitwidth is a promising way to reduce the training memory footprint, but the lowest bitwidth achieved so far is 8-bit. In this work, we push the bitwidth of optimizer states down to 4-bit through a detailed empirical analysis of the first and second moments. Specifically, we find that the moments have complicated outlier patterns that current block-wise quantization cannot accurately approximate. We use a smaller block size and propose to utilize both row-wise and column-wise information for better quantization. We further identify a zero-point problem in quantizing the second moment, and solve it with a linear quantizer that excludes the zero point. Our 4-bit optimizers are evaluated on a wide variety of benchmarks, including natural language understanding, machine translation, image classification, and instruction tuning. On all tasks, our optimizers achieve accuracy comparable to their full-precision counterparts while enjoying better memory efficiency.
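Two of the ideas named in the abstract, block-wise normalization with a small block size and a second-moment quantizer that excludes zero, can be illustrated with a toy pure-Python sketch. This is not the authors' implementation: the absmax per-block scale, the 16-level linear codebooks, the `(i + 1) / 16` form assumed for the zero-free quantizer, and fp32 storage for each block scale are all assumptions, and every function and constant name is hypothetical. The row-wise and column-wise normalization is not shown.

```python
import math
from typing import List, Tuple

CODE_BITS = 4
LEVELS = 2 ** CODE_BITS  # 16 code points per 4-bit value

# Two hypothetical linear codebooks over normalized magnitudes in [0, 1].
# WITH_ZERO contains 0.0 as a code point; NO_ZERO (assumed form) uses (i + 1) / 16,
# so the smallest representable normalized value is 1/16 instead of 0.
WITH_ZERO = [i / (LEVELS - 1) for i in range(LEVELS)]
NO_ZERO = [(i + 1) / LEVELS for i in range(LEVELS)]


def quantize_blockwise(
    x: List[float], block_size: int, codebook: List[float]
) -> Tuple[List[int], List[float]]:
    """Block-wise absmax quantization of a non-negative vector such as a second moment."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        scale = max(block) or 1.0  # per-block normalizer; guards against an all-zero block
        scales.append(scale)
        for value in block:
            normalized = value / scale  # lies in [0, 1] for non-negative input
            # Round to the nearest code point (ties go to the lower index).
            codes.append(min(range(len(codebook)), key=lambda i: abs(codebook[i] - normalized)))
    return codes, scales


def dequantize_blockwise(
    codes: List[int], scales: List[float], block_size: int, codebook: List[float]
) -> List[float]:
    """Map each code back to a value using the scale of the block it belongs to."""
    return [codebook[c] * scales[k // block_size] for k, c in enumerate(codes)]


def adam_step_scale(v_hat: List[float], eps: float = 1e-8) -> List[float]:
    """Per-element factor 1 / (sqrt(v) + eps) that Adam applies to the first moment."""
    return [1.0 / (math.sqrt(v) + eps) for v in v_hat]


def bits_per_element(block_size: int, code_bits: int = CODE_BITS, scale_bits: int = 32) -> float:
    """Storage cost: one code per element plus one scale (assumed fp32) per block."""
    return code_bits + scale_bits / block_size


# Toy second moment: tiny entries sharing a block with one large outlier.
V = [1e-6, 2e-6, 1.0, 5e-6]

if __name__ == "__main__":
    for name, codebook in (("with zero", WITH_ZERO), ("no zero", NO_ZERO)):
        for block_size in (4, 2):
            codes, scales = quantize_blockwise(V, block_size, codebook)
            v_hat = dequantize_blockwise(codes, scales, block_size, codebook)
            print(f"{name}, block size {block_size}: v_hat = {v_hat}")
            print(f"    step scale = {adam_step_scale(v_hat)}")
    for block_size in (2048, 128):
        print(f"bits per element at block size {block_size}: {bits_per_element(block_size)}")
```

With the zero-inclusive codebook, the tiny entries that share a block with the outlier round to 0, so the factor 1 / (sqrt(v) + eps) jumps to roughly 1/eps; the zero-free codebook maps them to 1/16 of the block scale instead, which shrinks the step rather than blowing it up. With the zero-free codebook, halving the block size isolates the outlier and recovers the small entries in the first block, at the cost of more scale storage per element, which `bits_per_element` makes explicit.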

Comments:	v3: camera-ready revisions for NeurIPS 2023
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2309.01507 [cs.LG]
	(or arXiv:2309.01507v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2309.01507

Submission history

From: Bingrui Li
[v1] Mon, 4 Sep 2023 10:27:17 UTC (45,833 KB)
[v2] Wed, 6 Sep 2023 15:06:46 UTC (45,843 KB)
[v3] Fri, 27 Oct 2023 06:24:08 UTC (45,843 KB)
