LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

Zhang, Mingyang; Chen, Hao; Shen, Chunhua; Yang, Zhen; Ou, Linlin; Yu, Xinyi; Zhuang, Bohan

arXiv:2305.18403v3 (cs)

[Submitted on 28 May 2023 (v1), revised 3 Oct 2023 (this version, v3), latest version 7 Aug 2024 (v5)]

Abstract: Large pre-trained models (LPMs), such as LLaMA and GLM, have shown exceptional performance across various tasks through fine-tuning. Although low-rank adaptation (LoRA) has emerged as a cheap way to fine-tune these LPMs on downstream tasks, their deployment is still hindered by the vast model scale and computational costs. Neural network pruning offers a way to compress LPMs. However, current pruning methods designed for LPMs are not compatible with LoRA: they either apply unstructured pruning, which impedes the merging of LoRA weights, or depend on the gradients of pre-trained weights to guide pruning, which can impose significant memory overhead. To this end, we propose LoRAPrune, a new framework that delivers an accurate, compact model for efficient inference in a highly memory-efficient manner. Specifically, we first design a LoRA-guided pruning criterion, which uses the weights and gradients of LoRA, rather than the gradients of pre-trained weights, for importance estimation. We then propose a structured iterative pruning procedure to remove redundant channels and heads. Extensive experimental results demonstrate the superior performance of LoRAPrune over existing approaches on the LLaMA series models. For instance, at a 50% compression rate, LoRAPrune achieves a lower perplexity than LLM-Pruner by 8.0 on WikiText2 and 16.05 on PTB, while concurrently reducing memory usage by 52.6%. The code will be released after review.
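The abstract does not spell out the pruning criterion, so the following is only a minimal sketch of what a LoRA-guided, structured importance score could look like, not the authors' formula. It assumes a first-order (weight times gradient) importance, a merged weight W + BA, and a product-rule estimate of the weight gradient built from the LoRA gradients alone. All function names, the matrix shapes, and the choice of output rows as "channels" are hypothetical. Plain Python lists are used to keep the example dependency-free; a real implementation would use tensors.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_guided_importance(W, A, B, grad_A, grad_B):
    """Per-weight importance |(W + BA) * g_est| using only LoRA gradients.

    Shapes (assumed): W is d_out x d_in, B is d_out x r, A is r x d_in.
    g_est = grad_B @ A + B @ grad_A is a product-rule estimate standing in
    for the gradient of the pre-trained weight, which is never computed.
    """
    BA = matmul(B, A)
    g1 = matmul(grad_B, A)
    g2 = matmul(B, grad_A)
    return [
        [abs((w + ba) * (a + b)) for w, ba, a, b in zip(rw, rba, r1, r2)]
        for rw, rba, r1, r2 in zip(W, BA, g1, g2)
    ]


def channel_scores(importance):
    """Structured score: sum the per-weight importance over each output row."""
    return [sum(row) for row in importance]


def keep_mask(scores, prune_ratio):
    """Return True for channels to keep, pruning the lowest-scoring fraction."""
    n_prune = int(len(scores) * prune_ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(scores))]
```

In an iterative procedure of the kind the abstract mentions, a mask like this would presumably be recomputed over several fine-tuning steps with a gradually increasing `prune_ratio`, rather than applied once. Because whole rows are removed, the LoRA update can still be merged into the smaller weight matrix afterwards.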

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.18403 [cs.LG]
	(or arXiv:2305.18403v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2305.18403

Submission history

From: Mingyang Zhang
[v1] Sun, 28 May 2023 15:15:48 UTC (240 KB)
[v2] Wed, 31 May 2023 22:32:19 UTC (243 KB)
[v3] Tue, 3 Oct 2023 12:51:55 UTC (353 KB)
[v4] Thu, 20 Jun 2024 06:31:00 UTC (367 KB)
[v5] Wed, 7 Aug 2024 03:30:30 UTC (367 KB)
