DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

Truong, Thanh-Dat; Bui, Quoc-Huy; Duong, Chi Nhan; Seo, Han-Seok; Phung, Son Lam; Li, Xin; Luu, Khoa

arXiv:2203.10233 (cs)

[Submitted on 19 Mar 2022]

Abstract: Human action recognition has recently become one of the most popular research topics in the computer vision community. Various 3D-CNN based methods have been presented to tackle both the spatial and temporal dimensions in video action recognition, with competitive results. However, these methods suffer from some fundamental limitations, such as a lack of robustness and generalization; for example, how does the temporal ordering of video frames affect the recognition results? This work presents a novel end-to-end Transformer-based Directed Attention (DirecFormer) framework for robust action recognition. The method takes a simple but novel Transformer-based perspective on understanding the right order of sequence actions. The contributions of this work are three-fold. Firstly, we introduce the problem of ordered temporal learning to action recognition. Secondly, a new Directed Attention mechanism is introduced to understand and attend to human actions in the right order. Thirdly, we introduce conditional dependency in action sequence modeling that includes orders and classes. The proposed approach consistently achieves state-of-the-art (SOTA) results compared with recent action recognition methods on three standard large-scale benchmarks, i.e., Jester, Kinetics-400, and Something-Something-V2.
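
The question the abstract raises, how the temporal ordering of frames affects recognition, can be illustrated with a small probe. The sketch below is not the DirecFormer model or its Directed Attention. It assumes two hypothetical toy scorers (`mean_pool_score`, `position_weighted_score`) and a hypothetical helper `order_sensitivity`, and only shows how one might measure whether a scorer reacts to shuffled frames.

```python
import random
from typing import Callable, List, Sequence

Frame = Sequence[float]  # hypothetical per-frame feature vector
Clip = List[Frame]       # a clip is an ordered list of frames


def mean_pool_score(clip: Clip) -> float:
    """Order-invariant toy scorer: average of all feature values."""
    total = sum(sum(frame) for frame in clip)
    count = sum(len(frame) for frame in clip)
    return total / count


def position_weighted_score(clip: Clip) -> float:
    """Order-aware toy scorer: later frames receive larger weights."""
    weights = [t + 1 for t in range(len(clip))]
    weighted = sum(w * sum(frame) for w, frame in zip(weights, clip))
    return weighted / sum(weights)


def order_sensitivity(score: Callable[[Clip], float], clip: Clip,
                      trials: int = 20, seed: int = 0) -> float:
    """Largest absolute score change seen over random frame permutations."""
    rng = random.Random(seed)
    base = score(clip)
    worst = 0.0
    for _ in range(trials):
        shuffled = clip[:]        # copy so the original order is preserved
        rng.shuffle(shuffled)     # permute the frames in place
        worst = max(worst, abs(score(shuffled) - base))
    return worst


# Toy clip: 8 frames whose features grow over time (assumed data).
clip = [[float(t), float(t) * 0.5] for t in range(8)]
invariant_gap = order_sensitivity(mean_pool_score, clip)
aware_gap = order_sensitivity(position_weighted_score, clip)
print(f"mean pooling gap: {invariant_gap}, position-weighted gap: {aware_gap}")
```

A scorer that ignores order reports no gap under shuffling, while an order-aware one does. A probe of this kind is one way to frame the robustness concern that motivates the paper.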

Comments:	Accepted to CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.10233 [cs.CV]
	(or arXiv:2203.10233v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.10233

Submission history

From: Thanh-Dat Truong
[v1] Sat, 19 Mar 2022 03:41:48 UTC (6,344 KB)
