Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception

Yang, Yunhao; Neary, Cyrus; Topcu, Ufuk

doi:10.5555/3635637.3663065

arXiv:2308.05295v2 (cs)

[Submitted on 10 Aug 2023 (v1), last revised 17 Jun 2024 (this version, v2)]

Abstract: Recently developed pretrained models can encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be integrated into algorithms to solve sequential decision-making tasks. We develop an algorithm that utilizes the knowledge from pretrained models to construct and verify controllers for sequential decision-making tasks, and to ground these controllers to task environments through visual observations with formal guarantees. In particular, the algorithm queries a pretrained model with a user-provided, text-based task description and uses the model's output to construct an automaton-based controller that encodes the model's task-relevant knowledge. It allows formal verification of whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. Next, the algorithm leverages the vision and language capabilities of pretrained models to link the observations from the task environment to the text-based control logic from the controller (e.g., actions and conditions that trigger the actions). We propose a mechanism to provide probabilistic guarantees on whether the controller satisfies the user-provided specifications under perceptual uncertainties. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks, including daily life and robot manipulation tasks.
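
To make the idea of verifying an automaton-based controller against a specification more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the road-crossing task, the state and condition names, the environment model, and the `find_violation` helper are all hypothetical, and the check is a plain reachability search over the product of controller and environment rather than the verification procedure used in the paper.

```python
from collections import deque

# Hypothetical controller: (state, observed condition) -> (action, next state).
# In the paper's setting this structure would be built from a pretrained
# model's text output; here it is written by hand for illustration.
CONTROLLER = {
    ("wait_at_curb", "light_red"): ("stay", "wait_at_curb"),
    ("wait_at_curb", "light_green"): ("cross", "done"),
    ("done", "light_red"): ("stay", "done"),
    ("done", "light_green"): ("stay", "done"),
}

# Hypothetical abstract environment model: which condition may follow which.
ENVIRONMENT = {
    "light_red": {"light_red", "light_green"},
    "light_green": {"light_green", "light_red"},
}


def find_violation(controller, environment, init_state, init_conditions, forbidden):
    """Search the controller x environment product breadth-first.

    Returns the first reachable (condition, action) pair that appears in
    `forbidden`, or None if no such pair is reachable.
    """
    frontier = deque((init_state, c) for c in init_conditions)
    seen = set(frontier)
    while frontier:
        state, condition = frontier.popleft()
        if (state, condition) not in controller:
            continue  # controller has no rule here; nothing to check
        action, next_state = controller[(state, condition)]
        if (condition, action) in forbidden:
            return (condition, action)
        # The environment chooses the next condition nondeterministically.
        for next_condition in environment[condition]:
            node = (next_state, next_condition)
            if node not in seen:
                seen.add(node)
                frontier.append(node)
    return None
```

Calling `find_violation(CONTROLLER, ENVIRONMENT, "wait_at_curb", ["light_red", "light_green"], {("light_red", "cross")})` checks the safety requirement "never cross on a red light" over every reachable combination of controller state and environment condition. A controller that fails the check yields a concrete offending pair, which is the kind of feedback a verification step can hand back for revising the controller.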

Comments:	Accepted as full paper in AAMAS 2024
Subjects:	Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
Cite as:	arXiv:2308.05295 [cs.AI]
	(or arXiv:2308.05295v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2308.05295
Related DOI:	https://doi.org/10.5555/3635637.3663065

Submission history

From: Yunhao Yang
[v1] Thu, 10 Aug 2023 02:29:11 UTC (10,785 KB)
[v2] Mon, 17 Jun 2024 19:09:17 UTC (8,147 KB)
