NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM

Wang, Zihan; Zhu, Yaohui; Lee, Gim Hee; Fan, Yachun

arXiv:2502.11142 (cs)

[Submitted on 16 Feb 2025]

Abstract: Vision-and-Language Navigation (VLN) is an essential skill for embodied agents, allowing them to navigate in 3D environments following natural language instructions. High-performance navigation models require a large amount of training data, and the high cost of manually annotating data has seriously hindered progress in this field. Some previous methods therefore translate trajectory videos into step-by-step instructions to expand the available data, but such instructions do not match users' communication styles well, since users tend to briefly describe destinations or state specific needs. Moreover, local navigation trajectories overlook global context and high-level task planning. To address these issues, we propose NavRAG, a retrieval-augmented generation (RAG) framework that generates user demand instructions for VLN. NavRAG leverages an LLM to build a hierarchical scene description tree for 3D scene understanding, from global layout to local details. It then simulates various user roles with specific demands to retrieve from the scene tree, and generates diverse instructions with the LLM. We annotate over 2 million navigation instructions across 861 scenes and evaluate the data quality and navigation performance of trained models.
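The retrieve-then-generate flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `SceneNode` class, the three levels ("scene", "zone", "object"), the word-overlap scoring (a stand-in for whatever retrieval NavRAG actually uses), and the prompt wording are all hypothetical, and no LLM is called. The sketch only shows how a path through a scene tree can carry global-to-local context into an instruction-generation prompt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneNode:
    """One node of a hypothetical scene description tree."""
    name: str
    level: str          # illustrative levels: "scene", "zone", "object"
    description: str
    children: List["SceneNode"] = field(default_factory=list)


def tokenize(text):
    """Lowercase the text and split it into a set of words without punctuation."""
    return {w.strip(".,").lower() for w in text.split()}


def retrieve(root, demand):
    """Return the root-to-leaf path whose leaf description shares the most
    words with the demand (toy scoring, assumed for illustration)."""
    query = tokenize(demand)
    best_path, best_score = [], -1
    stack = [[root]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if not node.children:  # leaves are the candidate destinations
            score = len(query & tokenize(node.description))
            if score > best_score:
                best_path, best_score = path, score
        for child in node.children:
            stack.append(path + [child])
    return best_path


def build_prompt(role, demand, path):
    """Assemble the text that would be sent to an LLM (hypothetical wording)."""
    context = "\n".join(
        f"- {n.level} '{n.name}': {n.description}" for n in path
    )
    return (
        f"You are {role}. Your need: {demand}\n"
        f"Scene context (global to local):\n{context}\n"
        "Write one short instruction asking a robot to go to the destination."
    )


house = SceneNode("house", "scene", "A two-floor house with a kitchen and a bedroom.", [
    SceneNode("kitchen", "zone", "Kitchen for cooking, next to the dining area.", [
        SceneNode("coffee machine", "object", "Coffee machine on the counter for making a hot drink."),
        SceneNode("fridge", "object", "Fridge that stores cold food and milk."),
    ]),
    SceneNode("bedroom", "zone", "Quiet bedroom upstairs for sleeping.", [
        SceneNode("bed", "object", "Double bed with pillows for rest and sleep."),
    ]),
])

demand = "I am tired and need some rest and sleep"
path = retrieve(house, demand)
print(build_prompt("a night-shift nurse", demand, path))
```

In this toy scene the demand retrieves the path house, bedroom, bed, so the prompt contains the scene-level, zone-level, and object-level descriptions in that order. A real system would presumably replace the word-overlap score with a stronger retriever and pass the prompt to an LLM.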

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.11142 [cs.AI]
	(or arXiv:2502.11142v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2502.11142

Submission history

From: Zihan Wang [view email]
[v1] Sun, 16 Feb 2025 14:17:36 UTC (2,635 KB)
