Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat

Daynauth, Roland; Clarke, Christopher; Flautner, Krisztian; Tang, Lingjia; Mars, Jason

arXiv:2411.14483 (cs)

[Submitted on 19 Nov 2024 (v1), last revised 17 Feb 2025 (this version, v2)]

Abstract: Deciding which large language model (LLM) to use is a complex challenge. Pairwise ranking has emerged as a new method for evaluating human preferences for LLMs. This approach entails humans evaluating pairs of model outputs based on a predefined criterion. By collecting these comparisons, a ranking can be constructed using methods such as Elo. However, applying these algorithms as constructed in the context of LLM evaluation introduces several challenges. In this paper, we explore the effectiveness of ranking systems for head-to-head comparisons of LLMs. We formally define a set of fundamental principles for effective ranking and conduct a series of extensive evaluations on the robustness of several ranking algorithms in the context of LLMs. Our analysis uncovers key insights into the factors that affect ranking accuracy and efficiency, offering guidelines for selecting the most appropriate methods based on specific evaluation contexts and resource constraints.
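
As an illustration of how pairwise comparisons can be turned into a ranking with Elo, the following is a minimal sketch of the standard Elo update, not the authors' implementation. The K-factor of 32, the initial rating of 1000, the function names, and the model names are all assumptions chosen for the example; the abstract does not specify any of them.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_rank(comparisons, k=32.0, initial=1000.0):
    """Build a ranking from pairwise outcomes.

    comparisons: iterable of (model_a, model_b, outcome), where outcome is
    1.0 if A was preferred, 0.0 if B was preferred, and 0.5 for a tie.
    k and initial are assumed values, not taken from the paper.
    """
    ratings = {}
    for model_a, model_b, outcome in comparisons:
        rating_a = ratings.setdefault(model_a, initial)
        rating_b = ratings.setdefault(model_b, initial)
        expected_a = expected_score(rating_a, rating_b)
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] = rating_a + k * (outcome - expected_a)
        ratings[model_b] = rating_b - k * (outcome - expected_a)
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical human judgments over three placeholder models.
judgments = [
    ("model_a", "model_b", 1.0),
    ("model_b", "model_c", 1.0),
    ("model_a", "model_c", 1.0),
]
ranking = elo_rank(judgments)
```

Because ratings are updated one comparison at a time, the result of this sequential form of Elo can depend on the order in which the judgments are processed.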

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.14483 [cs.CL]
	(or arXiv:2411.14483v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.14483

Submission history

From: Roland Daynauth
[v1] Tue, 19 Nov 2024 20:16:26 UTC (721 KB)
[v2] Mon, 17 Feb 2025 16:21:10 UTC (737 KB)
