Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition

George, Robert Joseph; Pitt, David; Zhao, Jiawei; Kossaifi, Jean; Luo, Cheng; Tian, Yuandong; Anandkumar, Anima

arXiv:2501.02379 (cs)

[Submitted on 4 Jan 2025]

Abstract: We present Tensor-GaLore, a novel method for efficient training of neural networks with higher-order tensor weights. Many models, particularly those used in scientific computing, employ tensor-parameterized layers to capture complex, multidimensional relationships. Scaling these models to high-resolution problems makes memory usage grow intractably, and matrix-based optimization methods lead to suboptimal performance and compression. We propose to work directly in the high-order complex tensor parameter space, using a tensor factorization of the gradients during optimization. We showcase the method's effectiveness on Fourier Neural Operators (FNOs), a class of models crucial for solving partial differential equations (PDEs), and provide a theoretical analysis of it. Across various PDE tasks, such as the Navier-Stokes and Darcy flow equations, Tensor-GaLore reduces optimizer memory usage by up to 75%. These savings on AI-for-science workloads demonstrate Tensor-GaLore's potential.
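To give a rough sense of why factorizing the gradient tensor can shrink optimizer memory, the following is a minimal back-of-the-envelope sketch. It assumes a Tucker-style factorization (the abstract does not say which factorization is used) and an Adam-like optimizer that keeps two moment tensors in the low-rank core space plus one projection matrix per mode. The tensor shape, the ranks, and the function names are hypothetical, and the resulting ratio is illustrative only; it is not meant to reproduce the paper's reported figures.

```python
from math import prod


def full_state_elems(shape):
    """Elements held by an Adam-like optimizer for a full tensor:
    two moment tensors, each the same size as the weight."""
    return 2 * prod(shape)


def tucker_state_elems(shape, ranks):
    """Elements held when the two moments live in a Tucker core of size
    `ranks`, plus one (dim x rank) projection matrix per tensor mode.
    Assumption: this storage layout is illustrative, not the paper's."""
    core = prod(ranks)
    factors = sum(d * r for d, r in zip(shape, ranks))
    return 2 * core + factors


# Hypothetical 4th-order weight tensor and per-mode ranks (half of each dim).
shape = (64, 64, 32, 32)
ranks = tuple(max(1, d // 2) for d in shape)

full = full_state_elems(shape)
low = tucker_state_elems(shape, ranks)
saving = 1 - low / full  # fraction of optimizer-state elements avoided

print(f"full state:     {full} elements")
print(f"low-rank state: {low} elements")
print(f"saving:         {saving:.1%}")
```

The counts are in elements rather than bytes, so the same ratio holds for real or complex weights as long as the core and the full tensor share a dtype. Halving the rank in every mode shrinks the core multiplicatively across modes, which is why a tensor factorization can compress more aggressively than flattening the weight into a matrix first.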

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2501.02379 [cs.LG]
	(or arXiv:2501.02379v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2501.02379

Submission history

From: Robert Joseph George
[v1] Sat, 4 Jan 2025 20:51:51 UTC (278 KB)
