SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Liang, Zhixuan; Mu, Yao; Ma, Hengbo; Tomizuka, Masayoshi; Ding, Mingyu; Luo, Ping

arXiv:2312.11598 (cs)

[Submitted on 18 Dec 2023 (v1), last revised 28 Mar 2024 (this version, v3)]

Abstract: Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks that require multiple sequential skills. We propose SkillDiffuser, an end-to-end hierarchical planning framework that integrates interpretable skill learning with conditional diffusion planning to address this problem. At the higher level, the skill abstraction module learns discrete, human-understandable skill representations from visual observations and language instructions. These learned skill embeddings are then used to condition the diffusion model, which generates customized latent trajectories aligned with the skills. This makes it possible to generate diverse state trajectories that adhere to the learnable skills. By integrating skill learning with conditional trajectory generation, SkillDiffuser produces coherent behavior that follows abstract instructions across diverse tasks. Experiments on multi-task robotic manipulation benchmarks such as Meta-World and LOReL demonstrate state-of-the-art performance and human-interpretable skill representations from SkillDiffuser. More visualization results and information can be found on our website.
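
The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the nearest-codebook lookup used to pick a discrete skill, the hand-written `toy_denoiser` standing in for a learned conditional noise predictor, and all names, dimensions, and step sizes are assumptions made purely for illustration. The abstract does not specify how skills are discretized or how the diffusion model is parameterized.

```python
import random

def nearest_skill(embedding, codebook):
    """Return the index of the codebook vector closest to `embedding` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(embedding, codebook[k]))

def toy_denoiser(trajectory, skill_vec, step):
    """Hypothetical stand-in for a learned, skill-conditioned noise predictor.

    It reports each state's offset from the skill vector as "noise" and
    ignores `step`; a real model would be a trained network.
    """
    return [[s - g for s, g in zip(state, skill_vec)] for state in trajectory]

def sample_trajectory(skill_vec, horizon=8, steps=20, step_size=0.2, seed=0):
    """Start from Gaussian noise and iteratively refine, conditioned on the skill."""
    rng = random.Random(seed)
    dim = len(skill_vec)
    traj = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(horizon)]
    for t in reversed(range(steps)):
        eps = toy_denoiser(traj, skill_vec, t)
        # Remove a fraction of the predicted noise from every state.
        traj = [
            [s - step_size * e for s, e in zip(state, e_state)]
            for state, e_state in zip(traj, eps)
        ]
    return traj

# Higher level: map a (hypothetical) fused vision-language feature to a discrete skill.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
fused = [0.9, 0.2]
k = nearest_skill(fused, codebook)

# Lower level: generate a state trajectory conditioned on the selected skill embedding.
plan = sample_trajectory(codebook[k])
```

Because the selected skill is a discrete index `k`, it can be inspected directly, which is the sense in which such a representation is interpretable; the trajectory generator only ever sees the embedding `codebook[k]`.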

Comments:	Accepted by CVPR 2024. Camera-ready version.
Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2312.11598 [cs.RO]
	(or arXiv:2312.11598v3 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2312.11598

Submission history

From: Zhixuan Liang
[v1] Mon, 18 Dec 2023 18:16:52 UTC (3,239 KB)
[v2] Wed, 13 Mar 2024 16:29:50 UTC (3,893 KB)
[v3] Thu, 28 Mar 2024 16:49:40 UTC (3,724 KB)
