Leveraging Registers in Vision Transformers for Robust Adaptation

Yellapragada, Srikar; Thopalli, Kowshik; Narayanaswamy, Vivek; Sakla, Wesam; Liu, Yang; Mubarka, Yamen; Samaras, Dimitris; Thiagarajan, Jayaraman J.

arXiv:2501.04784 (cs)

[Submitted on 8 Jan 2025]

Abstract: Vision Transformers (ViTs) have shown success across a variety of tasks due to their ability to capture global image representations. Recent studies have identified high-norm tokens in ViTs, which can interfere with unsupervised object discovery. To address this, "registers" have been proposed: additional tokens that isolate high-norm patch tokens while capturing global image-level information. While registers have been studied extensively for object discovery, their generalization properties, particularly in out-of-distribution (OOD) scenarios, remain underexplored. In this paper, we examine the utility of register token embeddings in providing additional features for improving generalization and anomaly rejection. To that end, we propose a simple method that combines the special CLS token embedding commonly employed in ViTs with the average-pooled register embeddings to create feature representations, which are subsequently used for training a downstream classifier. We find that this enhances OOD generalization and anomaly rejection while maintaining in-distribution (ID) performance. Extensive experiments across multiple ViT backbones trained with and without registers reveal consistent improvements of 2-4% in top-1 OOD accuracy and a 2-3% reduction in false positive rates for anomaly detection. Importantly, these gains are achieved without additional computational overhead.
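
The feature construction described in the abstract can be illustrated with a minimal sketch. Several details here are assumptions rather than facts stated in the abstract: that "combines" means concatenation, that the backbone returns tokens in the order [CLS, registers, patches], and that four registers are used. The function name `cls_plus_registers` is hypothetical, and random arrays stand in for real ViT outputs.

```python
import numpy as np


def cls_plus_registers(tokens: np.ndarray, num_registers: int) -> np.ndarray:
    """Build one feature vector per image from ViT output tokens.

    tokens: array of shape (batch, 1 + num_registers + num_patches, dim).
            Assumed layout along axis 1: [CLS, registers..., patches...].
    Returns an array of shape (batch, 2 * dim).
    """
    if num_registers < 1:
        raise ValueError("need at least one register token")
    cls = tokens[:, 0, :]                          # CLS embedding, (batch, dim)
    registers = tokens[:, 1:1 + num_registers, :]  # (batch, num_registers, dim)
    pooled = registers.mean(axis=1)                # average-pool the registers
    # Assumption: the two embeddings are combined by concatenation.
    return np.concatenate([cls, pooled], axis=-1)


# Random tokens stand in for backbone outputs: 8 images, 4 registers,
# 256 patch tokens, embedding width 768.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 1 + 4 + 256, 768))
features = cls_plus_registers(tokens, num_registers=4)
print(features.shape)  # (8, 1536)
```

Under these assumptions, `features` would be passed to whatever downstream classifier is being trained, in place of the CLS embedding alone.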

Comments:	Accepted at ICASSP 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2501.04784 [cs.CV]
	(or arXiv:2501.04784v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.04784

Submission history

From: Srikar Yellapragada
[v1] Wed, 8 Jan 2025 19:02:32 UTC (485 KB)
