OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following

Shi, Haochen; Sun, Zhiyuan; Yuan, Xingdi; Côté, Marc-Alexandre; Liu, Bang

Abstract: Embodied Instruction Following (EIF) is a crucial task in embodied learning, requiring agents to interact with their environment through egocentric observations to fulfill natural language instructions. Recent work has increasingly employed large language models (LLMs) within a framework-centric approach to improve performance on embodied learning tasks, including EIF. Despite these efforts, there is no unified understanding of how the various components, ranging from visual perception to action execution, affect task performance. To address this gap, we introduce OPEx, a comprehensive framework that delineates the core components essential for solving embodied learning tasks: Observer, Planner, and Executor. Through extensive evaluations, we provide an in-depth analysis of how each component influences EIF task performance. Furthermore, we deploy a multi-agent dialogue strategy on a TextWorld counterpart, further enhancing task performance. Our findings reveal that LLM-centric design markedly improves EIF outcomes, identify visual perception and low-level action execution as critical bottlenecks, and demonstrate that augmenting LLMs with a multi-agent framework further elevates performance.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2403.03017 [cs.AI] (or arXiv:2403.03017v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2403.03017
