PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models

Meng, Yu; Li, Kaiyuan; Huang, Chenran; Gao, Chen; Chen, Xinlei; Li, Yong; Zhang, Xiaoping

arXiv:2502.14504 (cs)

[Submitted on 20 Feb 2025]

Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a range of multimodal tasks. However, their inference efficiency is constrained by the large number of visual tokens processed during decoding. To address this challenge, we propose Per-Layer Per-Head Vision Token Pruning (PLPHP), a two-level fine-grained pruning method comprising Layer-Level Retention Rate Allocation and Head-Level Vision Token Pruning. Motivated by the Vision Token Re-attention phenomenon across decoder layers, we dynamically adjust token retention rates layer by layer. Layers that exhibit stronger attention to visual information preserve more vision tokens, while layers with lower vision attention are aggressively pruned. Furthermore, PLPHP applies pruning at the attention head level, enabling different heads within the same layer to independently retain critical context. Experiments on multiple benchmarks demonstrate that PLPHP delivers an 18% faster decoding speed and reduces the Key-Value Cache (KV Cache) size by over 50%, at the cost of a 0.46% average performance drop, while also achieving notable performance improvements in multi-image tasks. These results highlight the effectiveness of fine-grained token pruning and contribute to advancing the efficiency and scalability of LVLMs. Our source code will be made publicly available.
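
The two levels described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring rule, the thresholds, or the retention rates, so the score definition (mean attention mass on vision tokens), the cut-offs `low`/`high`, the rates `r_min`/`r_mid`/`r_max`, and all function names below are hypothetical choices made only for illustration.

```python
import math
from typing import List

# attn[h][j]: attention weight that head h of one layer assigns to key
# position j for the current query token (each row sums to 1).

def vision_attention_score(attn: List[List[float]], vision_idx: List[int]) -> float:
    """Mean over heads of the attention mass placed on vision tokens (assumed metric)."""
    per_head = [sum(head[j] for j in vision_idx) for head in attn]
    return sum(per_head) / len(per_head)

def allocate_retention_rate(score: float, low: float = 0.1, high: float = 0.3,
                            r_min: float = 0.1, r_mid: float = 0.4,
                            r_max: float = 0.8) -> float:
    """Layer level: more vision attention -> keep more vision tokens.
    Thresholds and rates are illustrative assumptions, not values from the paper."""
    if score >= high:
        return r_max
    if score <= low:
        return r_min
    return r_mid

def prune_per_head(attn: List[List[float]], vision_idx: List[int],
                   rate: float) -> List[List[int]]:
    """Head level: each head independently keeps its top-k vision tokens."""
    k = max(1, math.ceil(rate * len(vision_idx)))
    kept = []
    for head in attn:
        ranked = sorted(vision_idx, key=lambda j: head[j], reverse=True)
        kept.append(sorted(ranked[:k]))
    return kept

# Toy layer: 2 heads, 4 vision tokens (positions 0-3) and 1 text token (position 4).
attn = [
    [0.05, 0.40, 0.10, 0.02, 0.43],
    [0.30, 0.04, 0.06, 0.50, 0.10],
]
vision_idx = [0, 1, 2, 3]

rate = allocate_retention_rate(vision_attention_score(attn, vision_idx))
kept_half = prune_per_head(attn, vision_idx, 0.5)  # heads keep different tokens
```

In this toy layer the two heads retain different vision tokens at the same retention rate, which is the behaviour the head-level step is meant to allow. In an actual decoder the retained indices would select which key-value cache entries each head keeps.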

Comments:	12 pages, 8 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.14504 [cs.CV]
	(or arXiv:2502.14504v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.14504

Submission history

From: Yu Meng
[v1] Thu, 20 Feb 2025 12:31:31 UTC (10,296 KB)
