A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis

Mascaro, Esteve Valls; Ahn, Hyemin; Lee, Dongheui


arXiv:2308.07301 (cs)

[Submitted on 14 Aug 2023 (v1), last revised 8 Apr 2024 (this version, v2)]


Abstract: The synthesis of human motion has traditionally been addressed through task-dependent models that focus on specific challenges, such as predicting future motions or filling in intermediate poses conditioned on known key-poses. In this paper, we present a novel task-independent model called UNIMASK-M, which can effectively address these challenges using a unified architecture. Our model obtains comparable or better performance than the state-of-the-art in each field. Inspired by Vision Transformers (ViTs), our UNIMASK-M model decomposes a human pose into body parts to leverage the spatio-temporal relationships existing in human motion. Moreover, we reformulate various pose-conditioned motion synthesis tasks as a reconstruction problem with different masking patterns given as input. By explicitly informing our model about the masked joints, our UNIMASK-M becomes more robust to occlusions. Experimental results show that our model successfully forecasts human motion on the Human3.6M dataset. Moreover, it achieves state-of-the-art results in motion inbetweening on the LaFAN1 dataset, particularly in long transition periods. More information can be found on the project website.
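
The masking idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy 6-joint skeleton, the `BODY_PARTS` grouping, and the choice to zero out masked frames are all assumptions made for illustration. It only shows how forecasting and inbetweening could be expressed as two masking patterns over the same sequence, and how a pose could be split into body-part patches.

```python
import numpy as np

# Hypothetical grouping of a toy 6-joint skeleton into body parts.
# The real skeletons and part definitions used by the paper are not given here.
BODY_PARTS = {"torso": [0, 1], "arms": [2, 3], "legs": [4, 5]}


def forecasting_mask(num_frames, num_observed):
    """Frame mask for motion forecasting: True = visible, False = to reconstruct."""
    mask = np.zeros(num_frames, dtype=bool)
    mask[:num_observed] = True  # only the past frames are visible
    return mask


def inbetweening_mask(num_frames, num_context, num_targets=1):
    """Frame mask for inbetweening: past context and final key-poses are visible."""
    mask = np.zeros(num_frames, dtype=bool)
    mask[:num_context] = True                # observed past frames
    mask[num_frames - num_targets:] = True   # known target key-poses at the end
    return mask


def mask_motion(motion, frame_mask):
    """Zero out masked frames of a (frames, joints, 3) motion array.

    The mask itself would also be passed to the model, so that it is
    explicitly told which entries are missing (an assumption in this sketch).
    """
    return np.where(frame_mask[:, None, None], motion, 0.0)


def patchify(motion, body_parts=BODY_PARTS):
    """Split a (frames, joints, 3) motion array into per-body-part patches."""
    return {name: motion[:, idx, :] for name, idx in body_parts.items()}


# Toy usage: 6 frames, 6 joints, 3D coordinates.
motion = np.random.default_rng(0).normal(size=(6, 6, 3))
masked_forecast = mask_motion(motion, forecasting_mask(6, num_observed=2))
masked_inbetween = mask_motion(motion, inbetweening_mask(6, num_context=2))
patches = patchify(masked_inbetween)
```

Under these assumptions, the two tasks differ only in the mask passed alongside the input, which is the sense in which a single reconstruction model could serve both.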

Comments:	Accepted to AAAI 2024.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
Cite as:	arXiv:2308.07301 [cs.CV]
	(or arXiv:2308.07301v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.07301

Submission history

From: Esteve Valls Mascaró
[v1] Mon, 14 Aug 2023 17:39:44 UTC (9,895 KB)
[v2] Mon, 8 Apr 2024 15:47:20 UTC (9,895 KB)
