Generative World Explorer

Lu, Taiming; Shu, Tianmin; Yuille, Alan; Khashabi, Daniel; Chen, Jieneng

arXiv:2411.11844 (cs)

[Submitted on 18 Nov 2024 (v1), last revised 19 Nov 2024 (this version, v2)]

Abstract: Planning with partial observation is a central challenge in embodied AI. Most prior work tackles this challenge by developing agents that physically explore their environment to update their beliefs about the world state. In contrast, humans can $\textit{imagine}$ unseen parts of the world through mental exploration and $\textit{revise}$ their beliefs with imagined observations. Such updated beliefs allow them to make more informed decisions without having to physically explore the world at all times. To achieve this human-like ability, we introduce the $\textit{Generative World Explorer (Genex)}$, an egocentric world exploration framework that allows an agent to mentally explore a large-scale 3D world (e.g., urban scenes) and acquire imagined observations to update its belief. This updated belief then helps the agent make a more informed decision at the current step. To train $\textit{Genex}$, we create a synthetic urban scene dataset, Genex-DB. Our experimental results demonstrate that (1) $\textit{Genex}$ can generate high-quality and consistent observations during long-horizon exploration of a large virtual physical world and (2) the beliefs updated with the generated observations can inform an existing decision-making model (e.g., an LLM agent) to make better plans.

Comments:	Website: this http URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2411.11844 [cs.CV]
	(or arXiv:2411.11844v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.11844

Submission history

From: TaiMing Lu
[v1] Mon, 18 Nov 2024 18:59:31 UTC (27,920 KB)
[v2] Tue, 19 Nov 2024 18:59:42 UTC (27,555 KB)
