Vision-Language Interpreter for Robot Task Planning

Shirai, Keisuke; Beltran-Hernandez, Cristian C.; Hamaya, Masashi; Hashimoto, Atsushi; Tanaka, Shohei; Kawaharazuka, Kento; Tanaka, Kazutoshi; Ushiku, Yoshitaka; Mori, Shinsuke

Computer Science > Robotics

arXiv:2311.00967 (cs)

[Submitted on 2 Nov 2023 (v1), last revised 20 Feb 2024 (this version, v2)]

Abstract: Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file that symbolic planners use to find a plan. By generating PDs from a language instruction and a scene observation, we can drive symbolic planners within a language-guided framework. We propose the Vision-Language Interpreter (ViLaIn), a new framework that generates PDs using state-of-the-art LLMs and vision-language models. ViLaIn can refine generated PDs using error messages fed back from the symbolic planner. We seek to answer the question: how accurately can ViLaIn and the symbolic planner together generate valid robot plans? To evaluate ViLaIn, we introduce a novel dataset called the problem description generation (ProDG) dataset. The framework is evaluated with four new evaluation metrics. Experimental results show that ViLaIn can generate syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy. Our code and dataset are available at this https URL.
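
The refinement step described in the abstract (generate a PD, check it, and regenerate it with the resulting error message as feedback) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the ViLaIn implementation: the PD is assumed to be a PDDL-style problem file, and `refine_pd`, `toy_generate`, and `toy_check` are hypothetical names. `toy_generate` stands in for the LLM and vision-language models, and `toy_check` is a crude syntax check standing in for the symbolic planner's error reporting.

```python
from typing import Callable, Optional

# Hypothetical interfaces (not from the paper):
#   generator: (instruction, scene objects, feedback or None) -> PD text
#   checker:   PD text -> error message, or None if the PD is accepted
Generator = Callable[[str, list[str], Optional[str]], str]
Checker = Callable[[str], Optional[str]]


def refine_pd(instruction: str, objects: list[str], generate: Generator,
              check: Checker, max_rounds: int = 3) -> tuple[str, int, bool]:
    """Regenerate the PD with the checker's error message as feedback
    until it is accepted or max_rounds is reached."""
    feedback: Optional[str] = None
    pd = ""
    for round_idx in range(1, max_rounds + 1):
        pd = generate(instruction, objects, feedback)
        feedback = check(pd)
        if feedback is None:
            return pd, round_idx, True
    return pd, max_rounds, False


def toy_generate(instruction: str, objects: list[str],
                 feedback: Optional[str]) -> str:
    """Hypothetical stub for the LLM/VLM generator. It ignores the instruction,
    omits the goal on the first attempt, and adds it once it gets feedback."""
    init = " ".join(f"(clear {o})" for o in objects)
    goal = ""
    if feedback is not None:
        goal = f"\n  (:goal (and (on {objects[0]} {objects[1]})))"
    return ("(define (problem demo) (:domain blocks)\n"
            f"  (:objects {' '.join(objects)})\n"
            f"  (:init {init}){goal})")


def toy_check(pd: str) -> Optional[str]:
    """Hypothetical stand-in for planner error messages: checks balanced
    parentheses and the presence of the three problem sections."""
    depth = 0
    for ch in pd:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unbalanced parentheses"
    if depth != 0:
        return "unbalanced parentheses"
    for section in ("(:objects", "(:init", "(:goal"):
        if section not in pd:
            return f"missing {section} section"
    return None


pd, rounds, ok = refine_pd("put a on b", ["a", "b"], toy_generate, toy_check)
print(ok, rounds)  # True 2
print(pd)
```

With these stubs, the first draft lacks a goal section, the checker reports "missing (:goal section", and the second round is accepted. In the framework described above, the feedback would instead be the symbolic planner's own error messages, and the generator would condition on both the instruction and the scene observation.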

Comments:	ICRA 2024
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2311.00967 [cs.RO]
	(or arXiv:2311.00967v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2311.00967

Submission history

From: Keisuke Shirai
[v1] Thu, 2 Nov 2023 03:32:30 UTC (3,955 KB)
[v2] Tue, 20 Feb 2024 03:13:30 UTC (3,955 KB)