Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes

Rangrej, Samrudhdhi B.; Srinidhi, Chetan L.; Clark, James J.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2204.00656 (cs)

[Submitted on 1 Apr 2022]

Abstract: Most hard attention models initially observe a complete scene to locate and sense informative glimpses, and then predict the class label of the scene from those glimpses. However, in many applications (e.g., aerial imaging), observing an entire scene is not always feasible due to the limited time and resources available for acquisition. In this paper, we develop a Sequential Transformers Attention Model (STAM) that only partially observes a complete image and predicts informative glimpse locations solely based on past glimpses. We build our agent on DeiT-distilled and train it with a one-step actor-critic algorithm. Furthermore, to improve classification performance, we introduce a novel training objective that enforces consistency between the class distribution predicted by a teacher model from the complete image and the class distribution predicted by our agent from glimpses. When the agent senses only 4% of the total image area, including the proposed consistency loss in the training objective yields 3% and 8% higher accuracy on ImageNet and fMoW, respectively. Moreover, our agent outperforms the previous state of the art while observing nearly 27% and 42% fewer pixels in glimpses on ImageNet and fMoW, respectively.
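The abstract names two training ingredients: a one-step actor-critic update for choosing glimpse locations, and a consistency loss that pulls the agent's class distribution (from glimpses) toward a teacher's (from the complete image). The pure-Python sketch below illustrates both under stated assumptions: the consistency term is written as KL(p_teacher ‖ p_agent) over temperature-scaled softmax outputs, and the actor-critic uses a TD(0) error as its advantage. The function names, the temperature, the discount `gamma`, and the choice of KL divergence are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(teacher_logits: Sequence[float],
                     agent_logits: Sequence[float],
                     temperature: float = 1.0) -> float:
    """KL(p_teacher || p_agent) between two class distributions.

    ASSUMPTION: a temperature-scaled KL divergence is used for illustration;
    the paper's exact consistency objective may differ.
    """
    p = softmax(teacher_logits, temperature)  # teacher: complete image
    q = softmax(agent_logits, temperature)    # agent: glimpses only
    # Terms with p_i == 0 contribute nothing; clamp q_i to avoid log(0).
    return sum(pi * (math.log(pi) - math.log(max(qi, 1e-12)))
               for pi, qi in zip(p, q) if pi > 0)


def one_step_actor_critic_losses(log_prob_action: float,
                                 reward: float,
                                 value_t: float,
                                 value_next: float,
                                 gamma: float = 0.99,
                                 done: bool = False) -> Tuple[float, float, float]:
    """One-step (TD(0)) actor-critic losses for a single glimpse decision.

    Returns (actor_loss, critic_loss, td_error). In an autodiff framework,
    td_error would be treated as a constant (no gradient) in the actor loss.
    """
    bootstrap = 0.0 if done else gamma * value_next
    td_error = reward + bootstrap - value_t      # advantage estimate
    actor_loss = -log_prob_action * td_error     # policy-gradient term
    critic_loss = td_error ** 2                  # squared TD error for the critic
    return actor_loss, critic_loss, td_error


if __name__ == "__main__":
    print(consistency_loss([2.0, 0.0], [0.0, 2.0]))
    print(one_step_actor_critic_losses(math.log(0.5), 1.0, 0.5, 0.0, done=True))
```

Scalars are used here only to keep the sketch self-contained; a training loop would compute the same quantities on batched tensors and add the consistency term to the classification and policy losses with some weighting, which is also an assumption here.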

Comments:	Accepted to CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2204.00656 [cs.CV]
	(or arXiv:2204.00656v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2204.00656

Submission history

From: Samrudhdhi Bharatkumar Rangrej
[v1] Fri, 1 Apr 2022 18:51:55 UTC (9,362 KB)