Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering

Jiang, Kaixuan; Liu, Yang; Chen, Weixing; Luo, Jingzhou; Chen, Ziliang; Pan, Ling; Li, Guanbin; Lin, Liang

arXiv:2503.11117 (cs)

[Submitted on 14 Mar 2025 (v1), last revised 19 Mar 2025 (this version, v2)]

Abstract: Embodied Question Answering (EQA) is a challenging task in embodied intelligence that requires agents to dynamically explore 3D environments, actively gather visual information, and perform multi-step reasoning to answer questions. However, current EQA approaches suffer from critical limitations in exploration efficiency, dataset design, and evaluation metrics. Moreover, existing datasets often introduce biases or prior knowledge, leading to disembodied reasoning, while frontier-based exploration strategies struggle in cluttered environments and fail to ensure fine-grained exploration of task-relevant areas. To address these challenges, we construct the EXPloration-awaRe Embodied queStion anSwering Benchmark (EXPRESS-Bench), the largest dataset designed specifically to evaluate both exploration and reasoning capabilities. EXPRESS-Bench consists of 777 exploration trajectories and 2,044 question-trajectory pairs. To improve exploration efficiency, we propose Fine-EQA, a hybrid exploration model that integrates frontier-based and goal-oriented navigation to guide agents toward task-relevant regions more effectively. Additionally, we introduce a novel evaluation metric, Exploration-Answer Consistency (EAC), which ensures faithful assessment by measuring the alignment between answer grounding and exploration reliability. Extensive experimental comparisons with state-of-the-art EQA models demonstrate the effectiveness of our EXPRESS-Bench in advancing embodied exploration and question reasoning.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.11117 [cs.CV]
	(or arXiv:2503.11117v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.11117

Submission history

From: Kaixuan Jiang
[v1] Fri, 14 Mar 2025 06:29:47 UTC (4,819 KB)
[v2] Wed, 19 Mar 2025 06:56:19 UTC (4,819 KB)
