ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition

Yoa, Seungdong; Lee, Seungjun; Cho, Hyeseung; Kim, Bumsoo; Lim, Woohyung


arXiv:2412.16491 (cs)

[Submitted on 21 Dec 2024]



Abstract: Vision Transformers (ViTs) have achieved remarkable success in various computer vision tasks. However, ViTs incur a huge computational cost due to their inherent reliance on multi-head self-attention (MHSA), prompting efforts to accelerate them for practical applications. To this end, recent works aim to reduce the number of tokens, mainly focusing on how to prune or merge them effectively. Nevertheless, since ViT tokens are generated from non-overlapping grid patches, they usually do not convey sufficient semantics, which makes them incompatible with efficient ViTs. To address this, we propose ImagePiece, a novel re-tokenization strategy for Vision Transformers. Following the MaxMatch strategy of NLP tokenization, ImagePiece groups semantically insufficient yet locally coherent tokens until they convey meaning. This simple re-tokenization is highly compatible with previous token reduction methods and can drastically narrow down the relevant tokens, enhancing the inference speed of DeiT-S by 54% (nearly 1.5$\times$ faster) while achieving a 0.39% improvement in ImageNet classification accuracy. For hyper-speed inference scenarios (with 251% acceleration), our approach surpasses other baselines by more than 8% in accuracy.
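For readers unfamiliar with the MaxMatch strategy the abstract refers to, the following is a minimal sketch of its usual NLP form: greedy longest-match-first segmentation of a string against a vocabulary. It illustrates only the text-tokenization analogy. It is not the ImagePiece procedure, since the abstract does not specify how visual tokens are scored or grouped. The function name `max_match`, the `[UNK]` marker, and the toy vocabulary are assumptions made for this illustration.

```python
def max_match(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first segmentation of `word` against `vocab`.

    Illustrative NLP-style MaxMatch only; names and vocabulary are hypothetical.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span from the right until it is a known piece.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No vocabulary entry matches at this position: give up on the word.
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces


# Toy vocabulary (an assumption for this example).
toy_vocab = {"image", "piece", "token", "re", "s"}

print(max_match("imagepiece", toy_vocab))
print(max_match("retokens", toy_vocab))
```

In this text setting, short fragments that carry little meaning on their own are absorbed into the longest unit the vocabulary recognizes. The abstract describes an analogous goal for image patches: grouping locally coherent tokens until the group conveys meaning.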

Comments:	Accepted to AAAI 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.16491 [cs.CV]
	(or arXiv:2412.16491v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.16491

Submission history

From: Seungdong Yoa
[v1] Sat, 21 Dec 2024 05:38:20 UTC (2,036 KB)
