Multi-view Hand Reconstruction with a Point-Embedded Transformer

Yang, Lixin; Zhong, Licheng; Zhu, Pengxiang; Zhan, Xinyu; Kong, Junxiao; Xu, Jian; Lu, Cewu

Computer Science > Computer Vision and Pattern Recognition

arXiv:2408.10581 (cs)

[Submitted on 20 Aug 2024]

Title: Multi-view Hand Reconstruction with a Point-Embedded Transformer

Authors: Lixin Yang, Licheng Zhong, Pengxiang Zhu, Xinyu Zhan, Junxiao Kong, Jian Xu, Cewu Lu


Abstract: This work introduces POEM, a novel and generalizable multi-view Hand Mesh Reconstruction (HMR) model designed for practical, real-world hand motion capture. POEM advances the state of the art in two main ways. First, in how the problem is modeled: we propose embedding static basis points within the multi-view stereo space. A point is a natural carrier of 3D information and an ideal medium for fusing features across views, because it projects to a different location in each view. Our method therefore builds on a simple yet effective idea: a complex 3D hand mesh can be represented by a set of 3D basis points that 1) are embedded in the multi-view stereo space, 2) carry features from the multi-view images, and 3) enclose the hand. The second advance lies in the training strategy. We train on a combination of five large-scale multi-view datasets and randomize the number, order, and poses of the cameras. Because it is trained on this volume of data and this diversity of camera configurations, the model generalizes notably well in real-world applications. As a result, POEM is a highly practical, plug-and-play solution that enables user-friendly, cost-effective multi-view motion capture for both left and right hands. The model and source code are available at this https URL.
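
The abstract's central idea (basis points that live in the shared 3D space, are projected into every camera view, and collect image features there) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' code: the cube-shaped grid layout, the hypothetical helper names (`make_basis_points`, `project`, `sample_nearest`, `fuse_point_features`, `randomize_views`), nearest-pixel feature lookup, and plain averaging across views. POEM is described as a point-embedded transformer; this sketch swaps that learned fusion for a simple mean so the geometry stays visible.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def make_basis_points(center: Vec3, half_extent: float, n: int) -> List[Vec3]:
    """Hypothetical layout: an n x n x n grid filling a cube around `center`."""
    if n < 2:
        raise ValueError("need at least 2 points per axis")
    step = 2.0 * half_extent / (n - 1)
    cx, cy, cz = center
    return [(cx - half_extent + i * step,
             cy - half_extent + j * step,
             cz - half_extent + k * step)
            for i in range(n) for j in range(n) for k in range(n)]


def matvec(m: Mat3, v: Sequence[float]) -> List[float]:
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def project(p: Vec3, K: Mat3, R: Mat3, t: Vec3) -> Optional[Tuple[float, float]]:
    """Pinhole projection (u, v, w) = K (R p + t); None if behind the camera."""
    cam = [a + b for a, b in zip(matvec(R, p), t)]
    if cam[2] <= 0:
        return None
    u, v, w = matvec(K, cam)
    return (u / w, v / w)


def sample_nearest(fmap, uv):
    """Nearest-pixel lookup in an H x W grid of feature vectors; None if outside."""
    x, y = int(round(uv[0])), int(round(uv[1]))
    if 0 <= y < len(fmap) and 0 <= x < len(fmap[0]):
        return fmap[y][x]
    return None


def fuse_point_features(points, cameras, fmaps):
    """For each basis point, average the features of every view it lands in.

    `cameras` is a list of (K, R, t) tuples aligned with `fmaps`.
    A mean stands in for the learned (transformer) fusion.
    """
    fused = []
    for p in points:
        samples = []
        for (K, R, t), fmap in zip(cameras, fmaps):
            uv = project(p, K, R, t)
            if uv is not None:
                f = sample_nearest(fmap, uv)
                if f is not None:
                    samples.append(f)
        if samples:
            fused.append([sum(col) / len(samples) for col in zip(*samples)])
        else:
            fused.append(None)  # point is not visible in any view
    return fused


def randomize_views(views, rng: random.Random, min_views: int = 1):
    """Training-time augmentation sketch: random view count and random order.

    Camera pose randomization, also mentioned in the abstract, is not shown.
    """
    k = rng.randint(min_views, len(views))
    return rng.sample(list(views), k)  # k distinct views, shuffled
```

Because each basis point is a fixed 3D location, the same point gathers features from any number of cameras in any order, which is what makes the view-count and view-order randomization in `randomize_views` possible without changing the per-point representation.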

Comments:	Generalizable multi-view Hand Mesh Reconstruction (HMR) model. Extension of the original CVPR 2023 work.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.10581 [cs.CV]
	(or arXiv:2408.10581v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.10581

Submission history

From: Lixin Yang
[v1] Tue, 20 Aug 2024 06:42:17 UTC (28,122 KB)
