Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition

Zhang, Mingfang; Huang, Yifei; Liu, Ruicong; Sato, Yoichi

arXiv:2407.06628 (cs)

[Submitted on 9 Jul 2024]

Abstract: Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being robust to lighting variation and occlusion. While these characteristics are intuitively valuable for egocentric action recognition, the potential of IMUs remains under-explored. In this work, we present a novel method for action recognition that integrates motion data from body-worn IMUs with egocentric video. Due to the scarcity of labeled multimodal data, we design an MAE-based self-supervised pretraining method that obtains strong multimodal representations by modeling the natural correlation between visual and motion signals. To model the complex relations among multiple IMU devices placed across the body, we exploit their collaborative dynamics and propose to embed the relative motion features of human joints into a graph structure. Experiments show that our method achieves state-of-the-art performance on multiple public datasets. The effectiveness of our MAE-based pretraining and graph-based IMU modeling is further validated by experiments in more challenging scenarios, including partially missing IMU devices and video quality corruption, promoting more flexible usage in the real world.
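The two ideas named in the abstract, MAE-style masking and a graph over relative IMU motion, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_mask` and `relative_motion_edges`, the fully connected graph, the use of a simple per-timestep difference as the "relative motion" feature, the 0.75 mask ratio, and the toy device names and readings are all assumptions made for illustration.

```python
import random
from itertools import combinations


def random_mask(num_tokens, mask_ratio, seed=0):
    """Split token indices into (visible, masked) lists, MAE-style.

    Assumption: uniform random masking; the paper's exact strategy
    is not stated in the abstract.
    """
    rng = random.Random(seed)
    indices = list(range(num_tokens))
    rng.shuffle(indices)
    num_masked = int(num_tokens * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def relative_motion_edges(imu_signals):
    """Build edge features for a fully connected graph over IMU devices.

    imu_signals maps a device name to a list of per-timestep readings
    (each reading a tuple of floats). The edge feature for a pair (a, b)
    is the per-timestep difference b - a, a hypothetical stand-in for
    the "relative motion features" mentioned in the abstract.
    """
    edges = {}
    for a, b in combinations(sorted(imu_signals), 2):
        edges[(a, b)] = [
            tuple(y - x for x, y in zip(reading_a, reading_b))
            for reading_a, reading_b in zip(imu_signals[a], imu_signals[b])
        ]
    return edges


# Hypothetical toy data: 3 devices, 2 timesteps, 3-axis acceleration.
signals = {
    "left_wrist": [(0.0, 1.0, 0.0), (0.5, 1.0, 0.0)],
    "right_wrist": [(1.0, 1.0, 0.0), (1.0, 2.0, 0.0)],
    "head": [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)],
}
edges = relative_motion_edges(signals)
visible, masked = random_mask(num_tokens=8, mask_ratio=0.75)
```

In an MAE-style setup, an encoder would see only the `visible` tokens and a decoder would be trained to reconstruct the `masked` ones. Dropping a device from `signals` removes its edges from the graph, which loosely mirrors the missing-IMU scenario the abstract mentions.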

Comments:	ECCV 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.06628 [cs.CV]
	(or arXiv:2407.06628v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.06628

Submission history

From: Mingfang Zhang
[v1] Tue, 9 Jul 2024 07:53:16 UTC (1,129 KB)
