DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models

Hu, Xiaolin; Cheng, Xiang; Liu, Peiyu; Liu, Wei; Luan, Jian; Wang, Bin; Liu, Yong

arXiv:2412.20891 (cs)

[Submitted on 30 Dec 2024]

Abstract: Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating weight updates with low-rank matrices. However, low-rank approximation in two-dimensional space fails to capture high-dimensional structures within the target matrix. Recently, tensor decomposition methods have been explored for fine-tuning LLMs, leveraging their ability to extract structured information. Yet these approaches rely primarily on random initialization, and the impact of initialization on tensor adaptation remains underexplored. In this paper, we show that random initialization yields a validation loss that diverges significantly from the one achieved by full fine-tuning. To address this, we propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights for effective initialization in fine-tuning LLMs. Additionally, we introduce QDoTA, a variant of DoTA designed for 4-bit quantization. Experiments on commonsense and arithmetic reasoning tasks show that DoTA outperforms randomly initialized methods while using fewer parameters. QDoTA further reduces memory consumption and achieves performance comparable to DoTA on commonsense reasoning tasks. We will release our code to support future research.
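To make the idea of an MPO decomposition of a pre-trained weight matrix concrete, the following is a minimal NumPy sketch of a generic MPO (tensor-train matrix) factorization computed by sequential truncated SVDs. It is not the authors' released code: the abstract does not specify the factor shapes, the rank schedule, or which cores are trained, so the function names `mpo_decompose` and `mpo_reconstruct`, the `max_rank` parameter, and the example dimensions are all illustrative assumptions.

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims, max_rank=None):
    """Factor W (prod(in_dims) x prod(out_dims)) into MPO cores.

    Core k has shape (r_{k-1}, in_dims[k], out_dims[k], r_k), with r_0 = r_n = 1.
    Generic sequential-SVD scheme; an assumption, not the paper's exact procedure.
    """
    n = len(in_dims)
    assert W.shape == (int(np.prod(in_dims)), int(np.prod(out_dims)))
    # Split rows and columns into factors, then interleave: (i1, j1, i2, j2, ...).
    T = W.reshape(*in_dims, *out_dims)
    T = T.transpose([p for k in range(n) for p in (k, n + k)])
    cores, r_prev = [], 1
    M = T.reshape(r_prev * in_dims[0] * out_dims[0], -1)
    for k in range(n - 1):
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = len(S) if max_rank is None else min(max_rank, len(S))
        # Left singular vectors become core k; the remainder is carried forward.
        cores.append(U[:, :r].reshape(r_prev, in_dims[k], out_dims[k], r))
        M = S[:r, None] * Vt[:r]
        r_prev = r
        M = M.reshape(r_prev * in_dims[k + 1] * out_dims[k + 1], -1)
    cores.append(M.reshape(r_prev, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_reconstruct(cores):
    """Contract MPO cores back into a dense matrix."""
    n = len(cores)
    T = cores[0]
    for core in cores[1:]:
        # Contract the shared bond index between neighbouring cores.
        T = np.tensordot(T, core, axes=([-1], [0]))
    T = T.reshape(T.shape[1:-1])            # drop the two boundary bonds of size 1
    in_dims, out_dims = T.shape[0::2], T.shape[1::2]
    # Undo the interleaving: (i1, j1, ..., in, jn) -> (i1, ..., in, j1, ..., jn).
    T = T.transpose(list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2)))
    return T.reshape(int(np.prod(in_dims)), int(np.prod(out_dims)))

# Illustrative usage on a random stand-in for a pre-trained weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
cores = mpo_decompose(W, in_dims=(4, 4, 4), out_dims=(4, 4, 4), max_rank=8)
n_params = sum(c.size for c in cores)
rel_err = np.linalg.norm(W - mpo_reconstruct(cores)) / np.linalg.norm(W)
print(f"MPO parameters: {n_params} vs dense: {W.size}, relative error: {rel_err:.3f}")
```

Without truncation (`max_rank=None`) the cores reproduce `W` up to floating-point error. With truncation they give a compressed approximation derived from the weights themselves, which is the kind of non-random starting point the abstract contrasts with random initialization.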

Comments:	12 pages, 6 figures
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2412.20891 [cs.CL]
	(or arXiv:2412.20891v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.20891

Submission history

From: Xiaolin Hu
[v1] Mon, 30 Dec 2024 12:00:47 UTC (247 KB)
