CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation

Liang, Xiwen; Ma, Liang; Guo, Shanshan; Han, Jianhua; Xu, Hang; Ma, Shikui; Liang, Xiaodan

Computer Science > Computer Vision and Pattern Recognition

arXiv:2306.10322 (cs)

[Submitted on 17 Jun 2023 (v1), last revised 14 Mar 2024 (this version, v3)]

Title:CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation

Authors:Xiwen Liang, Liang Ma, Shanshan Guo, Jianhua Han, Hang Xu, Shikui Ma, Xiaodan Liang

View PDF

Abstract:Understanding and following natural language instructions while navigating through complex, real-world environments poses a significant challenge for general-purpose robots. These environments often include obstacles and pedestrians, making it essential for autonomous agents to possess the capability of self-corrected planning to adjust their actions based on feedback from the surroundings. However, the majority of existing vision-and-language navigation (VLN) methods primarily operate in less realistic simulator settings and do not incorporate environmental feedback into their decision-making processes. To address this gap, we introduce a novel zero-shot framework called CorNav, utilizing a large language model for decision-making and comprising two key components: 1) incorporating environmental feedback for refining future plans and adjusting its actions, and 2) multiple domain experts for parsing instructions, scene understanding, and refining predicted actions. In addition to the framework, we develop a 3D simulator that renders realistic scenarios using Unreal Engine 5. To evaluate the effectiveness and generalization of navigation agents in a zero-shot multi-task setting, we create a benchmark called NavBench. Extensive experiments demonstrate that CorNav consistently outperforms all baselines by a significant margin across all tasks. On average, CorNav achieves a success rate of 28.1\%, surpassing the best baseline's performance of 20.5\%.

Comments:	22 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2306.10322 [cs.CV]
	(or arXiv:2306.10322v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.10322

Submission history

From: Xiwen Liang [view email]
[v1] Sat, 17 Jun 2023 11:44:04 UTC (5,371 KB)
[v2] Tue, 26 Sep 2023 05:18:49 UTC (7,275 KB)
[v3] Thu, 14 Mar 2024 14:33:51 UTC (9,351 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators