On the Duality between Gradient Transformations and Adapters

Torroba-Hennigen, Lucas; Lang, Hunter; Guo, Han; Kim, Yoon

arXiv:2502.13811 (cs)

[Submitted on 19 Feb 2025]

Abstract: We study memory-efficient optimization of neural networks with linear gradient transformations, where the gradients are linearly mapped to a lower-dimensional space than the full parameter space, thus saving memory required for gradient accumulation and optimizer state persistence. The model parameters are updated by first performing an optimization step in the lower-dimensional space and then going back into the original parameter space via the linear map's transpose. We show that optimizing the model in this transformed space is equivalent to reparameterizing the original model through a linear adapter that additively modifies the model parameters, and then only optimizing the adapter's parameters. When the transformation is Kronecker-factored, this establishes an equivalence between GaLore and one-sided LoRA. We show that this duality between gradient transformations and adapter-based reparameterizations unifies existing approaches to memory-efficient training and suggests new techniques for improving training efficiency and memory use.
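
The equivalence stated in the abstract can be checked numerically on a toy problem. The sketch below is not taken from the paper: the quadratic loss, the random map S, the heavy-ball momentum optimizer, and all names (grad, run_gradient_transformation, run_adapter) are assumptions chosen for illustration. It runs the same optimizer once on projected gradients, mapping each step back through the transpose of S, and once on an adapter phi with theta = theta0 + S^T phi, then compares the resulting parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 3                      # full and reduced dimensions (illustrative)
steps, lr, beta = 5, 0.01, 0.9   # hypothetical optimizer settings

# Toy quadratic loss L(theta) = 0.5 * ||A @ theta - b||^2 (assumption, not from the paper).
A = rng.normal(size=(12, d))
b = rng.normal(size=12)

def grad(theta):
    """Gradient of the toy loss with respect to the full parameters."""
    return A.T @ (A @ theta - b)

S = rng.normal(size=(r, d)) / np.sqrt(d)  # fixed linear map from R^d to R^r
theta0 = rng.normal(size=d)               # initial (pretrained) parameters

def run_gradient_transformation():
    """Optimize theta directly, keeping optimizer state only in R^r."""
    theta, m = theta0.copy(), np.zeros(r)
    for _ in range(steps):
        g_low = S @ grad(theta)           # map the gradient to the low-dim space
        m = beta * m + g_low              # momentum buffer lives in R^r
        theta = theta - lr * (S.T @ m)    # map the step back via the transpose
    return theta

def run_adapter():
    """Freeze theta0 and optimize only the adapter phi, with theta = theta0 + S.T @ phi."""
    phi, m = np.zeros(r), np.zeros(r)
    for _ in range(steps):
        theta = theta0 + S.T @ phi
        g_phi = S @ grad(theta)           # chain rule: dL/dphi = S @ dL/dtheta
        m = beta * m + g_phi
        phi = phi - lr * m
    return theta0 + S.T @ phi

theta_gt = run_gradient_transformation()
theta_ad = run_adapter()
print("max abs difference:", np.max(np.abs(theta_gt - theta_ad)))
```

In this sketch the two runs agree up to floating-point rounding, because both accumulate the same low-dimensional momentum and differ only in whether the step is applied to theta or to phi. The adapter starts at zero so that both runs begin from theta0.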

Comments:	17 pages, 2 figures
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2502.13811 [cs.LG]
	(or arXiv:2502.13811v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.13811

Submission history

From: Lucas Torroba Hennigen
[v1] Wed, 19 Feb 2025 15:26:18 UTC (537 KB)
