Domain-Conditioned Scene Graphs for State-Grounded Task Planning

Herzog, Jonas; Liu, Jiangpin; Wang, Yue

Abstract: Recent robotic task planning frameworks have integrated large multimodal models (LMMs) such as GPT-4V. To address the grounding issues of such models, it has been suggested to split the pipeline into perceptual state grounding followed by state-based planning. As we show in this work, the state grounding ability of LMM-based approaches is still limited by weaknesses in granular, structured, domain-specific scene understanding. To address this shortcoming, we develop a more structured state grounding framework that uses a domain-conditioned scene graph as its scene representation. We show that this representation is actionable by nature, because it maps directly to a symbolic state in classical planning languages such as PDDL. We provide an instantiation of our state grounding framework in which domain-conditioned scene graph generation is implemented with a lightweight vision-language approach that classifies domain-specific predicates on top of domain-relevant object detections. Evaluated across three domains, our approach achieves significantly higher state estimation accuracy and task planning success rates than previous LMM-based approaches.
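The mapping from a scene graph to a PDDL symbolic state can be illustrated with a minimal sketch. This is not the authors' implementation: the `SceneGraph` class, the `to_pddl_problem` function, and the blocksworld-style objects, predicates, and goal are all hypothetical names chosen for illustration. The sketch assumes that node attributes correspond to unary predicates and edges to binary predicates of the planning domain, and that the domain uses typed objects.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical scene graph restricted to one planning domain's vocabulary."""
    objects: dict = field(default_factory=dict)  # object name -> domain type
    unary: set = field(default_factory=set)      # (predicate, object)
    binary: set = field(default_factory=set)     # (predicate, subject, object)


def to_pddl_problem(graph, problem, domain, goal):
    """Serialize the graph as a PDDL problem: nodes become objects, relations become :init facts."""
    objs = " ".join(f"{name} - {typ}" for name, typ in sorted(graph.objects.items()))
    facts = [f"({p} {o})" for p, o in sorted(graph.unary)]
    facts += [f"({p} {s} {o})" for p, s, o in sorted(graph.binary)]
    init = " ".join(facts)
    return (
        f"(define (problem {problem}) (:domain {domain})\n"
        f"  (:objects {objs})\n"
        f"  (:init {init})\n"
        f"  (:goal {goal}))"
    )


# Illustrative scene: block_a sits on block_b, which sits on the table.
graph = SceneGraph(
    objects={"block_a": "block", "block_b": "block", "table": "surface"},
    unary={("clear", "block_a")},
    binary={("on", "block_a", "block_b"), ("on", "block_b", "table")},
)
problem_text = to_pddl_problem(graph, "stack", "blocksworld", "(on block_b block_a)")
print(problem_text)
```

Because every node and edge label is drawn from the domain's own types and predicates, the conversion is a plain serialization step with no further interpretation, which is one way to read the claim that the representation is directly mappable to a symbolic state.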

Subjects:	Robotics (cs.RO)
Cite as:	arXiv:2504.06661 [cs.RO]
	(or arXiv:2504.06661v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2504.06661

