ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models

Li, Peiming; Wang, Ziyi; Liu, Mengyuan; Liu, Hong; Chen, Chen

doi:10.1145/3664647.3680597

arXiv:2407.19370 (cs)

[Submitted on 28 Jul 2024]

Abstract: Grasp generation aims to create complex hand-object interactions with a specified object. While traditional approaches to hand generation have focused primarily on visibility and diversity under scene constraints, they tend to overlook fine-grained hand-object interactions such as contacts, resulting in inaccurate and undesired grasps. To address these challenges, we propose a controllable grasp generation task and introduce ClickDiff, a controllable conditional generation model that leverages a fine-grained Semantic Contact Map (SCM). In particular, when synthesizing interactive grasps, the method enables precise control of grasp synthesis through either a user-specified or an algorithmically predicted SCM. To make the best use of contact supervision constraints and to accurately model the complex physical structure of hands, we propose a Dual Generation Framework. Within this framework, the Semantic Conditional Module generates reasonable contact maps based on fine-grained contact information, while the Contact Conditional Module uses contact maps alongside object point clouds to generate realistic grasps. We also assess the evaluation criteria applicable to controllable grasp generation. Both unimanual and bimanual generation experiments on the GRAB and ARCTIC datasets verify the validity of the proposed method, demonstrating the efficacy and robustness of ClickDiff, even with previously unseen objects. Our code is available at this https URL.
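
The two-stage data flow described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the function names, the per-point finger-label encoding of the SCM, the `radius` parameter, and the placeholder logic inside each stage are hypothetical and are not taken from the ClickDiff code. In particular, the two stage functions are trivial stand-ins for the learned diffusion modules, shown only to make the inputs and outputs of each stage concrete.

```python
import math

def click_to_semantic_contact_map(points, clicks, radius=0.02):
    """Hypothetical user-specified SCM: each object point within `radius`
    of a click receives that click's finger label; other points get None."""
    scm = []
    for p in points:
        label, best = None, radius
        for finger, click in clicks.items():
            d = math.dist(p, click)
            if d <= best:
                label, best = finger, d
        scm.append(label)
    return scm

def semantic_stage(scm):
    """Placeholder for the first stage (semantic information -> contact map).
    Assumption: a binary per-point contact value stands in for the output."""
    return [1.0 if label is not None else 0.0 for label in scm]

def contact_stage(points, contact_map):
    """Placeholder for the second stage (contact map + point cloud -> grasp).
    Assumption: the centroid of contacted points stands in for a grasp."""
    touched = [p for p, c in zip(points, contact_map) if c > 0.5]
    if not touched:
        return None
    n = len(touched)
    return tuple(sum(p[i] for p in touched) / n for i in range(3))

# Toy object point cloud (metres) and a single "thumb" click.
points = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.5, 0.0, 0.0)]
clicks = {"thumb": (0.0, 0.0, 0.0)}

scm = click_to_semantic_contact_map(points, clicks)
contact_map = semantic_stage(scm)
grasp_anchor = contact_stage(points, contact_map)
```

Splitting the pipeline this way mirrors the stated design: the intermediate contact map is an explicit, inspectable quantity, so it can come either from a user's clicks or from a prediction before the grasp itself is generated.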

Comments:	ACM Multimedia 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.19370 [cs.CV]
	(or arXiv:2407.19370v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.19370
Related DOI:	https://doi.org/10.1145/3664647.3680597

Submission history

From: Li Peiming
[v1] Sun, 28 Jul 2024 02:42:29 UTC (4,330 KB)
