CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

Lee, Jungdae; Miyanishi, Taiki; Kurita, Shuhei; Sakamoto, Koya; Azuma, Daichi; Matsuo, Yutaka; Inoue, Nakamasa

arXiv:2406.14240v1 (cs)

[Submitted on 20 Jun 2024 (this version), latest version 5 Oct 2024 (v2)]

Abstract: Vision-and-language navigation (VLN) aims to guide autonomous agents through real-world environments by integrating visual and linguistic cues. While substantial progress has been made in understanding these interactive modalities in ground-level navigation, aerial navigation remains largely underexplored. This is primarily due to the scarcity of resources suitable for real-world, city-scale aerial navigation studies. To bridge this gap, we introduce CityNav, a new dataset for language-goal aerial navigation using a 3D point cloud representation from real-world cities. CityNav includes 32,637 natural language descriptions paired with human demonstration trajectories, collected from participants via a new web-based 3D simulator developed for this research. Each description specifies a navigation goal, leveraging the names and locations of landmarks within real-world cities. We also provide baseline models of navigation agents that incorporate an internal 2D spatial map representing landmarks referenced in the descriptions. We benchmark the latest aerial navigation baselines and our proposed model on the CityNav dataset. The results using this dataset reveal the following key findings: (i) Our aerial agent models trained on human demonstration trajectories outperform those trained on shortest path trajectories, highlighting the importance of human-driven navigation strategies; (ii) The integration of a 2D spatial map significantly enhances navigation efficiency at city scale. Our dataset and code are available at this https URL

Comments:	The first two authors contributed equally
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.14240 [cs.CV]
	(or arXiv:2406.14240v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.14240

Submission history

From: Taiki Miyanishi
[v1] Thu, 20 Jun 2024 12:08:27 UTC (23,381 KB)
[v2] Sat, 5 Oct 2024 16:53:09 UTC (33,816 KB)
