OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer

Lin, Dixuan; Zhang, Yuxiang; Li, Mengcheng; Liu, Yebin; Jing, Wei; Yan, Qi; Wang, Qianying; Zhang, Hongwen

arXiv:2405.20330 (cs)

[Submitted on 30 May 2024 (v1), last revised 1 Oct 2024 (this version, v3)]

Abstract: We introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: the lack of a unified solution for handling various hand image inputs, and the neglect of the positional relationship between the two hands within an image. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies that can adapt to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method that embeds positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. Because this tokenization encodes the relative relationship of the two hands, it also supports more effective feature fusion. To this end, we further develop a 4D Interaction Reasoning (FIR) module that fuses hand tokens in 4D with attention and decodes them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. Results on in-the-wild videos and real-world scenarios demonstrate the superior performance of our approach for interactive hand reconstruction. More video results can be found on the project page: this https URL.
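
The abstract does not say how the positional relation is encoded, so the following is only a minimal sketch of the general idea of attaching relative-position information to per-hand tokens, not the authors' RAT method. Everything in it is an assumption made for illustration: the `(cx, cy, w, h)` box format, the choice of offset and log-scale features, the sinusoidal encoding, and the hypothetical helper names `relative_box_features`, `sinusoidal_encode`, and `tokenize_hands`.

```python
import math


def relative_box_features(box_a, box_b):
    """Describe box_b relative to box_a; boxes are (cx, cy, w, h) in pixels.

    Assumed feature set (not from the paper): normalized center offset
    and log size ratio.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return [
        (bx - ax) / aw,     # horizontal offset in units of box_a's width
        (by - ay) / ah,     # vertical offset in units of box_a's height
        math.log(bw / aw),  # log width ratio
        math.log(bh / ah),  # log height ratio
    ]


def sinusoidal_encode(values, num_freqs=2):
    """Map each scalar v to sin(2^k v), cos(2^k v) for k = 0..num_freqs-1."""
    encoded = []
    for v in values:
        for k in range(num_freqs):
            encoded.append(math.sin((2 ** k) * v))
            encoded.append(math.cos((2 ** k) * v))
    return encoded


def tokenize_hands(hands, num_freqs=2):
    """Build one token per hand from a list of (feature_vector, box) pairs.

    With two hands, each token is the hand's feature vector followed by an
    encoding of the other hand's box relative to its own. With one hand,
    the relation slot is filled with zeros so the token length is unchanged.
    """
    relation_dim = 4 * 2 * num_freqs
    tokens = []
    for i, (feat, box) in enumerate(hands):
        if len(hands) == 2:
            other_box = hands[1 - i][1]
            rel = sinusoidal_encode(relative_box_features(box, other_box), num_freqs)
        else:
            rel = [0.0] * relation_dim
        tokens.append(list(feat) + rel)
    return tokens


# Two hands of equal size, the second one box-width to the right of the first.
left = ([0.1, 0.2], (100.0, 100.0, 50.0, 50.0))
right = ([0.3, 0.4], (150.0, 100.0, 50.0, 50.0))
two_hand_tokens = tokenize_hands([left, right])
one_hand_tokens = tokenize_hands([left])
```

Because the single-hand case keeps the same token length, one downstream attention module could in principle consume either input, which is the property the abstract attributes to the tokenization.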

Comments:	An extended journal version of 4DHands, featuring a versatile module that can adapt to temporal and multi-view tasks. Additional detailed comparison experiments and result presentations have been added. More demo videos can be seen on our project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Cite as:	arXiv:2405.20330 [cs.CV]
	(or arXiv:2405.20330v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.20330

Submission history

From: Dixuan Lin
[v1] Thu, 30 May 2024 17:59:02 UTC (5,589 KB)
[v2] Fri, 31 May 2024 10:52:56 UTC (5,589 KB)
[v3] Tue, 1 Oct 2024 15:04:23 UTC (10,520 KB)
