Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation

Balog, Krisztian; Metzler, Donald; Qin, Zhen

Computer Science > Information Retrieval

arXiv:2503.19092 (cs)

[Submitted on 24 Mar 2025]

Title:Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation

Authors:Krisztian Balog, Donald Metzler, Zhen Qin

View PDF HTML (experimental)

Abstract:Large language models (LLMs) are increasingly integral to information retrieval (IR), powering ranking, evaluation, and AI-assisted content creation. This widespread adoption necessitates a critical examination of potential biases arising from the interplay between these LLM-based components. This paper synthesizes existing research and presents novel experiment designs that explore how LLM-based rankers and assistants influence LLM-based judges. We provide the first empirical evidence of LLM judges exhibiting significant bias towards LLM-based rankers. Furthermore, we observe limitations in LLM judges' ability to discern subtle system performance differences. Contrary to some previous findings, our preliminary study does not find evidence of bias against AI-generated content. These results highlight the need for a more holistic view of the LLM-driven information ecosystem. To this end, we offer initial guidelines and a research agenda to ensure the reliable use of LLMs in IR evaluation.

Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2503.19092 [cs.IR]
	(or arXiv:2503.19092v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2503.19092

Submission history

From: Krisztian Balog [view email]
[v1] Mon, 24 Mar 2025 19:24:40 UTC (157 KB)

Computer Science > Information Retrieval

Title:Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Information Retrieval

Title:Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators