APLA: A Simple Adaptation Method for Vision Transformers

Sorkhei, Moein; Konuk, Emir; Smith, Kevin; Matsoukas, Christos

arXiv:2503.11335 (cs)

[Submitted on 14 Mar 2025 (v1), last revised 24 Mar 2025 (this version, v2)]

Abstract: Existing adaptation techniques typically require architectural modifications or added parameters, leading to high computational costs and complexity. We introduce Attention Projection Layer Adaptation (APLA), a simple approach to adapt vision transformers (ViTs) without altering the architecture or adding parameters. Through a systematic analysis, we find that the layer immediately after the attention mechanism is crucial for adaptation. By updating only this projection layer, or even just a random subset of this layer's weights, APLA achieves state-of-the-art performance while reducing GPU memory usage by up to 52.63% and training time by up to 43.0%, with no extra cost at inference. Across 46 datasets covering a variety of tasks including scene classification, medical imaging, satellite imaging, and fine-grained classification, APLA consistently outperforms 17 other leading adaptation methods, including full fine-tuning, on classification, segmentation, and detection tasks. The code is available at this https URL.
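To make the idea in the abstract concrete, the following is a minimal, framework-free sketch of the two ingredients it describes: choosing only the attention output projection as trainable, and optionally restricting updates to a random subset of that layer's weights. It is not the authors' implementation. The parameter naming pattern (`attn.proj.`), the helper names, the element-wise granularity of the random subset, and the plain SGD update are all assumptions made for illustration.

```python
import random


def select_trainable(param_names, pattern="attn.proj."):
    """Return the parameter names that would stay trainable.

    Assumption: the attention output projection is identifiable by the
    substring `pattern`; real models may use a different naming scheme.
    """
    return [name for name in param_names if pattern in name]


def random_mask(rows, cols, fraction, seed=0):
    """Build a 0/1 mask with round(fraction * rows * cols) entries set to 1.

    Assumption: the random subset is drawn over individual weights.
    """
    rng = random.Random(seed)
    total = rows * cols
    keep = set(rng.sample(range(total), round(fraction * total)))
    return [[1 if r * cols + c in keep else 0 for c in range(cols)]
            for r in range(rows)]


def masked_sgd_step(weight, grad, mask, lr):
    """Apply an SGD update only where mask is 1; other entries stay frozen."""
    return [[w - lr * g * m for w, g, m in zip(w_row, g_row, m_row)]
            for w_row, g_row, m_row in zip(weight, grad, mask)]


# Hypothetical parameter names for a two-block ViT.
names = [
    "blocks.0.attn.qkv.weight",
    "blocks.0.attn.proj.weight",
    "blocks.0.attn.proj.bias",
    "blocks.0.mlp.fc1.weight",
    "blocks.1.attn.qkv.weight",
    "blocks.1.attn.proj.weight",
]
trainable = select_trainable(names)  # only the attention projections
mask = random_mask(rows=4, cols=4, fraction=0.25)  # update a quarter of the weights
```

Because the mask only gates the update, the model's architecture and parameter count are unchanged, which is consistent with the abstract's claim of no extra cost at inference. In a deep learning framework, the same effect would typically be obtained by disabling gradients for all other parameters and multiplying the projection layer's gradient by the mask.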

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.11335 [cs.CV]
	(or arXiv:2503.11335v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.11335

Submission history

From: Moein Sorkhei
[v1] Fri, 14 Mar 2025 12:03:29 UTC (21,763 KB)
[v2] Mon, 24 Mar 2025 10:10:38 UTC (21,763 KB)
