Reassessing Layer Pruning in LLMs: New Insights and Methods

Yao Lu, Hao Cheng, Yujie Fang, Zeyu Wang, Jiaheng Wei, Dongwei Xu, Qi Xuan, Xiaoniu Yang, Zhaowei Zhu

arXiv:2411.15558 (cs)

[Submitted on 23 Nov 2024]

Abstract: Although large language models (LLMs) have achieved remarkable success across various domains, their considerable scale demands substantial computational resources, which makes deployment in resource-constrained environments challenging. Layer pruning, a simple yet effective compression method, directly removes layers from a model and thereby reduces computational overhead. However, what are the best practices for layer pruning in LLMs? Are sophisticated layer selection metrics truly effective? Does the LoRA (Low-Rank Adaptation) family, widely regarded as a leading approach for fine-tuning pruned models, truly meet expectations when applied after pruning? To answer these questions, we dedicate thousands of GPU hours to benchmarking layer pruning in LLMs and drawing insights across multiple dimensions. Our results demonstrate that a simple approach, i.e., pruning the final 25% of layers and then fine-tuning the lm_head and the last three remaining layers, yields remarkably strong performance. Following this guide, we prune Llama-3.1-8B-It and obtain a model that outperforms many popular LLMs of similar size, such as ChatGLM2-6B, Vicuna-7B-v1.5, Qwen1.5-7B, and Baichuan2-7B. We release the optimal model weights on Hugging Face, and the code is available on GitHub.
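The recipe summarized above has two parts: drop the final 25% of layers, then fine-tune only the lm_head and the last three remaining layers. Both parts can be expressed as a small layer-selection plan. What follows is a minimal sketch, not the authors' released code. It assumes a model with 32 decoder layers (as in Llama-3.1-8B) and Hugging Face-style Llama parameter names ("model.layers.{i}." and "lm_head."). The helper names tail_prune_plan and is_trainable are hypothetical.

```python
import re

# Assumption: Hugging Face-style Llama parameter names, e.g.
# "model.layers.23.mlp.down_proj.weight" and "lm_head.weight".
LAYER_NAME_RE = re.compile(r"^model\.layers\.(\d+)\.")


def tail_prune_plan(num_layers: int, prune_ratio: float = 0.25,
                    num_trainable: int = 3) -> tuple[list[int], list[int], list[int]]:
    """Hypothetical helper: choose which decoder layers to keep, drop, and fine-tune.

    Drops the final `prune_ratio` fraction of layers (rounded down) and marks
    the last `num_trainable` surviving layers as trainable.
    """
    num_pruned = int(num_layers * prune_ratio)
    num_kept = num_layers - num_pruned
    if not 0 <= num_trainable <= num_kept:
        raise ValueError("num_trainable must be between 0 and the number of kept layers")
    kept = list(range(num_kept))
    dropped = list(range(num_kept, num_layers))
    # Slice from an explicit start index so num_trainable == 0 yields [].
    trainable = kept[num_kept - num_trainable:]
    return kept, dropped, trainable


def is_trainable(param_name: str, trainable_layers: set[int]) -> bool:
    """Hypothetical helper: True for lm_head and the selected decoder layers.

    Everything else (embeddings, final norm, earlier layers) stays frozen
    in this sketch.
    """
    if param_name.startswith("lm_head."):
        return True
    match = LAYER_NAME_RE.match(param_name)
    return match is not None and int(match.group(1)) in trainable_layers


if __name__ == "__main__":
    kept, dropped, trainable = tail_prune_plan(32)  # assumes 32 decoder layers
    print(len(kept), dropped, trainable)
    # prints: 24 [24, 25, 26, 27, 28, 29, 30, 31] [21, 22, 23]
    names = [
        "model.embed_tokens.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.23.mlp.down_proj.weight",
        "model.norm.weight",
        "lm_head.weight",
    ]
    print([n for n in names if is_trainable(n, set(trainable))])
    # prints: ['model.layers.23.mlp.down_proj.weight', 'lm_head.weight']
```

Because only trailing layers are removed, the surviving layers keep their original indices and need no renumbering. In a PyTorch training loop, you could apply the predicate by setting each parameter's requires_grad while iterating over model.named_parameters().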

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.15558 [cs.LG]
	(or arXiv:2411.15558v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.15558

Submission history

From: Yao Lu
[v1] Sat, 23 Nov 2024 13:31:16 UTC (343 KB)