AgentSims: An Open-Source Sandbox for Large Language Model Evaluation

Lin, Jiaju; Zhao, Haoran; Zhang, Aochi; Wu, Yiting; Ping, Huqiuyue; Chen, Qin

arXiv:2308.04026 (cs)

[Submitted on 8 Aug 2023]

Abstract: With ChatGPT-like large language models (LLMs) prevailing in the community, how to evaluate the ability of LLMs remains an open question. Existing evaluation methods suffer from the following shortcomings: (1) constrained evaluation abilities, (2) vulnerable benchmarks, (3) unobjective metrics. We suggest that task-based evaluation, where LLM agents complete tasks in a simulated environment, is a one-for-all solution to the above problems. We present AgentSims, an easy-to-use infrastructure for researchers from all disciplines to test the specific capacities they are interested in. Researchers can build their evaluation tasks by adding agents and buildings on an interactive GUI, or deploy and test new support mechanisms, i.e. memory, planning and tool-use systems, with a few lines of code. Our demo is available at this https URL.

Comments:	Submitted to the EMNLP 2023 demo track
Subjects:	Artificial Intelligence (cs.AI)
MSC classes:	14J60 (Primary) 14F05, 14J26 (Secondary) MSC-class: 14J60 (Primary) 14F05, 14J26 (Secondary) 68T42
Cite as:	arXiv:2308.04026 [cs.AI]
	(or arXiv:2308.04026v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2308.04026

Submission history

From: Jiaju Lin
[v1] Tue, 8 Aug 2023 03:59:28 UTC (1,351 KB)
