EscapeBench: Pushing Language Models to Think Outside the Box

Qian, Cheng; Han, Peixuan; Luo, Qinyu; He, Bingxiang; Chen, Xiusi; Zhang, Yuji; Du, Hongyi; Yao, Jiarui; Yang, Xiaocheng; Zhang, Denghui; Li, Yunzhu; Ji, Heng

arXiv:2412.13549 (cs)

[Submitted on 18 Dec 2024]

Abstract: Language model agents excel at long-session planning and reasoning, but existing benchmarks primarily focus on goal-oriented tasks with explicit objectives, neglecting creative adaptation in unfamiliar environments. To address this, we introduce EscapeBench, a benchmark suite of room escape game environments designed to challenge agents with creative reasoning, unconventional tool use, and iterative problem-solving to uncover implicit goals. Our results show that current language models, despite employing working memory and Chain-of-Thought reasoning, achieve only 15% average progress without hints, highlighting their limitations in creativity. To bridge this gap, we propose EscapeAgent, a framework designed to enhance creative reasoning through Foresight (innovative tool use) and Reflection (identifying unsolved tasks). Experiments show that EscapeAgent can execute action chains of over 1,000 steps while maintaining logical coherence. It navigates and completes games with up to 40% fewer steps and hints, performs robustly across varying difficulty levels, and achieves higher action success rates with more efficient and innovative puzzle-solving strategies. All data and code are released.

Comments:	23 pages, 15 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2412.13549 [cs.CL]
	(or arXiv:2412.13549v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.13549

Submission history

From: Cheng Qian
[v1] Wed, 18 Dec 2024 06:50:39 UTC (20,639 KB)
