MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments

Yin, Cai; Zhouhong, Gu; Zhaohan, Du; Zheyu, Ye; Shaosheng, Cao; Yiqian, Xu; Hongwei, Feng; Ping, Chen

Computer Science > Computation and Language

arXiv:2501.01652 (cs)

[Submitted on 3 Jan 2025]

Abstract: Large Language Models (LLMs) have shown remarkable capabilities in environmental perception, reasoning-based decision-making, and simulating complex human behaviors, particularly in interactive role-playing contexts. This paper introduces the Multiverse Interactive Role-play Ability General Evaluation (MIRAGE), a comprehensive framework designed to assess LLMs' proficiency in portraying advanced human behaviors through murder mystery games. MIRAGE features eight intricately crafted scripts encompassing diverse themes and styles, providing a rich simulation environment. To evaluate LLMs' performance, MIRAGE employs four distinct methods: the Trust Inclination Index (TII), which measures the dynamics of trust and suspicion; the Clue Investigation Capability (CIC), which measures LLMs' ability to investigate clues and gather information; the Interactivity Capability Index (ICI), which assesses role-playing capabilities; and the Script Compliance Index (SCI), which assesses LLMs' ability to understand and follow instructions. Our experiments indicate that even popular models like GPT-4 face significant challenges in navigating the complexities presented by MIRAGE. The datasets and simulation code are available on GitHub.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2501.01652 [cs.CL]
	(or arXiv:2501.01652v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.01652

Submission history

From: Yin Cai
[v1] Fri, 3 Jan 2025 06:07:48 UTC (1,170 KB)
