VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception

Wan, Zhaoliang; Ling, Yonggen; Yi, Senlin; Qi, Lu; Lee, Wangwei; Lu, Minglei; Yang, Sicheng; Teng, Xiao; Lu, Peng; Yang, Xu; Yang, Ming-Hsuan; Cheng, Hui

arXiv:2501.00510 (cs)

[Submitted on 31 Dec 2024 (v1), last revised 6 Jan 2025 (this version, v2)]

Abstract: This paper addresses the scarcity of large-scale datasets for accurate object-in-hand pose estimation, which is crucial for robotic in-hand manipulation within the "Perception-Planning-Control" paradigm. Specifically, we introduce VinT-6D, the first extensive multi-modal dataset integrating vision, touch, and proprioception, to enhance robotic manipulation. VinT-6D comprises a 2 million VinT-Sim split and a 0.1 million VinT-Real split, collected via simulation in MuJoCo and Blender and on a custom-designed real-world platform, respectively. The dataset is tailored for robotic hands, offering models with whole-hand tactile perception and high-quality, well-aligned data. To the best of our knowledge, VinT-Real is the largest such dataset, given the difficulty of collecting data in real-world environments, and it helps bridge the simulation-to-real gap relative to previous work. Built upon VinT-6D, we present a benchmark method that shows significant performance improvements from fusing multi-modal information. The project is available at this https URL.

Subjects:	Robotics (cs.RO)
Cite as:	arXiv:2501.00510 [cs.RO]
	(or arXiv:2501.00510v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2501.00510

Submission history

From: Zhaoliang Wan
[v1] Tue, 31 Dec 2024 15:45:09 UTC (29,226 KB)
[v2] Mon, 6 Jan 2025 16:04:53 UTC (29,226 KB)
