From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Li, Dawei; Jiang, Bohan; Huang, Liangjie; Beigi, Alimohammad; Zhao, Chengshuai; Tan, Zhen; Bhattacharjee, Amrita; Jiang, Yuxuan; Chen, Canyu; Wu, Tianhao; Shu, Kai; Cheng, Lu; Liu, Huan

arXiv:2411.16594 (cs)

[Submitted on 25 Nov 2024]

Abstract: Assessment and evaluation have long been critical challenges in artificial intelligence (AI) and natural language processing (NLP). However, traditional methods, whether matching-based or embedding-based, often fall short of judging subtle attributes and delivering satisfactory results. Recent advancements in Large Language Models (LLMs) inspire the "LLM-as-a-judge" paradigm, where LLMs are leveraged to perform scoring, ranking, or selection across various tasks and applications. This paper provides a comprehensive survey of LLM-based judgment and assessment, offering an in-depth overview to advance this emerging field. We begin by giving detailed definitions from both input and output perspectives. Then we introduce a comprehensive taxonomy to explore LLM-as-a-judge from three dimensions: what to judge, how to judge and where to judge. Finally, we compile benchmarks for evaluating LLM-as-a-judge and highlight key challenges and promising directions, aiming to provide valuable insights and inspire future research in this promising research area. Paper list and more resources about LLM-as-a-judge can be found at \url{this https URL} and \url{this https URL}.

Comments:	32 pages, 5 figures
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2411.16594 [cs.AI]
	(or arXiv:2411.16594v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2411.16594

Submission history

From: Dawei Li
[v1] Mon, 25 Nov 2024 17:28:44 UTC (1,298 KB)
