Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Chiang, Wei-Lin; Zheng, Lianmin; Sheng, Ying; Angelopoulos, Anastasios Nikolas; Li, Tianle; Li, Dacheng; Zhang, Hao; Zhu, Banghua; Jordan, Michael; Gonzalez, Joseph E.; Stoica, Ion

arXiv:2403.04132 (cs)

[Submitted on 7 Mar 2024]

Abstract: Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing. The platform has been operational for several months, amassing over 240K votes. This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using for efficient and accurate evaluation and ranking of models. We confirm that the crowdsourced questions are sufficiently diverse and discriminating and that the crowdsourced human votes are in good agreement with those of expert raters. These analyses collectively establish a robust foundation for the credibility of Chatbot Arena. Because of its unique value and openness, Chatbot Arena has emerged as one of the most referenced LLM leaderboards, widely cited by leading LLM developers and companies. Our demo is publicly available at \url{this https URL}.
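
The abstract does not name the statistical methods it uses, so the following is only an illustrative sketch of how pairwise votes can be turned into a ranking. It assumes a Bradley-Terry model fitted with a simple minorization-maximization update, which is one common choice for this kind of data and not necessarily what the paper does. The function name `fit_bradley_terry`, the `(winner, loser)` vote format, and the toy model names are all hypothetical.

```python
from collections import defaultdict


def fit_bradley_terry(battles, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Assumption: ties are ignored and every model has at least one win,
    otherwise its estimated strength collapses to zero.
    Returns a dict mapping model name to a strength; strengths sum to 1.
    """
    models = sorted({m for pair in battles for m in pair})
    wins = defaultdict(float)   # total wins per model
    games = defaultdict(float)  # number of battles per unordered pair
    for winner, loser in battles:
        wins[winner] += 1.0
        games[frozenset((winner, loser))] += 1.0

    # Start from equal strengths and iterate the MM update.
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                n_ij = games.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {m: s / total for m, s in updated.items()}
    return strength


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred model, other model).
    votes = (
        [("model_a", "model_b")] * 4 + [("model_b", "model_a")]
        + [("model_b", "model_c")] * 4 + [("model_c", "model_b")]
        + [("model_a", "model_c")] * 4 + [("model_c", "model_a")]
    )
    ranking = sorted(fit_bradley_terry(votes).items(), key=lambda kv: -kv[1])
    for name, score in ranking:
        print(f"{name}: {score:.3f}")
```

Under this model, the probability that model i is preferred over model j is strength[i] / (strength[i] + strength[j]), so sorting by strength gives a leaderboard. A production system would also need to handle ties and report uncertainty, which this sketch leaves out.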

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2403.04132 [cs.AI]
	(or arXiv:2403.04132v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2403.04132

Submission history

From: Wei-Lin Chiang
[v1] Thu, 7 Mar 2024 01:22:38 UTC (1,652 KB)
