Learning Higher-order Object Interactions for Keypoint-based Video Understanding

Huang, Yi; Kadav, Asim; Lai, Farley; Patel, Deep; Graf, Hans Peter

arXiv:2305.09539 (cs)

[Submitted on 16 May 2023]

Abstract: Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning-based networks often require significant computation, and may capture scene context using various modalities that further increase compute costs. Efficient methods, such as those used for AR/VR, often use only human-keypoint information but suffer from a loss of scene context that hurts accuracy. In this paper, we describe an action-localization method, KeyNet, that uses only keypoint data for tracking and action recognition. Specifically, KeyNet introduces the use of object-based keypoint information to capture context in the scene. Our method illustrates how to build a structured intermediate representation that allows modeling higher-order interactions in the scene from object and human keypoints without using any RGB information. We find that KeyNet is able to track and classify human actions at just 5 FPS. More importantly, we demonstrate that object keypoints can be modeled to recover the context lost by relying on keypoint information alone, on the AVA action and Kinetics datasets.
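The idea of a keypoint-only representation can be illustrated with a small sketch. This is not the KeyNet architecture, which the abstract does not specify. The token layout `[x, y, is_object]`, the function names `build_keypoint_tokens` and `self_attention`, the sample coordinates, and the use of unlearned self-attention with identity projections are all assumptions made for illustration. The sketch only shows how human and object keypoints could share one token sequence so that each token's output depends on every other token, with no RGB input.

```python
import math


def build_keypoint_tokens(human_kps, object_kps):
    """Turn (x, y) keypoints into tokens [x, y, is_object]; no RGB is used."""
    tokens = [[x, y, 0.0] for x, y in human_kps]   # hypothetical human joints
    tokens += [[x, y, 1.0] for x, y in object_kps]  # hypothetical object keypoints
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections.

    A real model would learn query/key/value projections; they are omitted
    here so the example stays dependency-free.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted mix of all input tokens.
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


# Illustrative, made-up normalized coordinates for one frame.
human = [(0.40, 0.20), (0.42, 0.35), (0.45, 0.50)]
obj = [(0.60, 0.45), (0.70, 0.55)]

tokens = build_keypoint_tokens(human, obj)
mixed = self_attention(tokens)   # one layer: pairwise human/object mixing
twice = self_attention(mixed)    # stacking layers composes those interactions
```

After one layer, every token's `is_object` channel lies strictly between 0 and 1, which shows that human and object tokens have exchanged information. Stacking layers, as in `twice`, is one common way to let such pairwise mixing compose into higher-order interactions.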

Comments:	SRVU - ICCV 2021 workshop
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.09539 [cs.CV]
	(or arXiv:2305.09539v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.09539

Submission history

From: Deep Patel [view email]
[v1] Tue, 16 May 2023 15:30:33 UTC (1,909 KB)