Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset

Zhang, Yuhong; Lin, Jing; Zeng, Ailing; Wu, Guanlin; Lu, Shunlin; Fu, Yurong; Cai, Yuanhao; Zhang, Ruimao; Wang, Haoqian; Zhang, Lei

arXiv:2501.05098 (cs)

[Submitted on 9 Jan 2025]

Abstract: In this paper, we introduce Motion-X++, a large-scale multimodal 3D expressive whole-body human motion dataset. Existing motion datasets predominantly capture body-only poses, lacking facial expressions, hand gestures, and fine-grained pose descriptions, and are typically limited to lab settings with manually labeled text descriptions, which restricts their scalability. To address this issue, we develop a scalable annotation pipeline that can automatically capture 3D whole-body human motion and comprehensive textual labels from RGB videos, and we use it to build the Motion-X dataset comprising 81.1K text-motion pairs. Furthermore, we extend Motion-X into Motion-X++ by improving the annotation pipeline, introducing more data modalities, and scaling up the data quantity. Motion-X++ provides 19.5M 3D whole-body pose annotations covering 120.5K motion sequences from massive scenes, 80.8K RGB videos, 45.3K audios, 19.5M frame-level whole-body pose descriptions, and 120.5K sequence-level semantic labels. Comprehensive experiments validate the accuracy of our annotation pipeline and highlight Motion-X++'s significant benefits for generating expressive, precise, and natural motion with paired multimodal labels, supporting several downstream tasks, including text-driven whole-body motion generation, audio-driven motion generation, 3D whole-body human mesh recovery, and 2D whole-body keypoint estimation.

Comments:	17 pages, 14 figures. This work extends and enhances the research published in the NeurIPS 2023 paper, "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset". arXiv admin note: substantial text overlap with arXiv:2307.00818
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.05098 [cs.CV]
	(or arXiv:2501.05098v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.05098

Submission history

From: Yuhong Zhang
[v1] Thu, 9 Jan 2025 09:37:27 UTC (35,284 KB)
