Sparse is Enough in Fine-tuning Pre-trained Large Language Model

Song, Weixi; Li, Zuchao; Zhang, Lefei; Zhao, Hai; Du, Bo

Computer Science > Machine Learning

arXiv:2312.11875v1 (cs)

[Submitted on 19 Dec 2023 (this version), latest version 8 Jun 2024 (v3)]

Abstract: With the prevalence of the pre-training-fine-tuning paradigm, how to efficiently adapt a pre-trained model to downstream tasks has become an intriguing question. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed for low-cost adaptation, including Adapters, Bias-only tuning, and the recently widely used Low-Rank Adaptation (LoRA). Although these methods have demonstrated their effectiveness to some extent and have been widely applied, their underlying principles are still unclear. In this paper, we reveal the transition of the loss landscape in the downstream domain from random initialization to pre-trained initialization, that is, from low-amplitude oscillation to high-amplitude oscillation. The parameter gradients exhibit a property akin to sparsity, in which a small fraction of components dominates the total gradient norm; for instance, 1% of the components account for 99% of the gradient. This property ensures that the pre-trained model can easily find a flat minimizer, which guarantees the model's ability to generalize even with a small number of trainable parameters. Based on this, we propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT), and validate its effectiveness on a range of tasks, including the GLUE Benchmark and instruction tuning. The code is accessible at this https URL.
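The quasi-sparsity property described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' SIFT implementation. The function names (`top_fraction_norm_share`, `top_k_mask`, `sparse_sgd_step`), the use of the squared L2 norm as the measure of "share", and the plain SGD update are all assumptions made for illustration. The sketch measures how much of a gradient's norm is carried by its largest-magnitude components, then updates only those components.

```python
import numpy as np


def _top_k_count(size, fraction):
    # Number of components to keep; always at least one (illustrative choice).
    return max(1, int(round(fraction * size)))


def top_fraction_norm_share(grad, fraction=0.01):
    """Share of the squared L2 norm carried by the largest-magnitude components.

    Assumption: "share of the gradient" is measured with the squared L2 norm.
    """
    flat = np.abs(np.ravel(grad))
    total = float(np.sum(flat ** 2))
    if total == 0.0:
        return 0.0
    k = _top_k_count(flat.size, fraction)
    # np.partition leaves the k largest values in the last k slots (unordered).
    top = np.partition(flat, flat.size - k)[flat.size - k:]
    return float(np.sum(top ** 2)) / total


def top_k_mask(grad, fraction=0.01):
    """Boolean mask selecting the largest-magnitude gradient components."""
    flat = np.abs(np.ravel(grad))
    k = _top_k_count(flat.size, fraction)
    idx = np.argpartition(flat, flat.size - k)[flat.size - k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(np.shape(grad))


def sparse_sgd_step(params, grad, mask, lr=0.1):
    """Plain SGD step applied only where mask is True (hypothetical update rule)."""
    return params - lr * np.where(mask, grad, 0.0)


# Toy gradient in which a single component dominates the norm.
grad = np.array([10.0, 0.1, -0.1, 0.0])
share = top_fraction_norm_share(grad, fraction=0.25)
mask = top_k_mask(grad, fraction=0.25)
new_params = sparse_sgd_step(np.zeros(4), grad, mask, lr=0.1)
```

In this toy case one of four components carries nearly all of the squared norm, so the masked step changes only that coordinate and leaves the rest of the parameters untouched. This is the intuition behind training a small subset of parameters.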

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2312.11875 [cs.LG]
	(or arXiv:2312.11875v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.11875

Submission history

From: Weixi Song
[v1] Tue, 19 Dec 2023 06:06:30 UTC (915 KB)
[v2] Thu, 2 May 2024 16:25:46 UTC (1,315 KB)
[v3] Sat, 8 Jun 2024 03:29:17 UTC (1,318 KB)
