Do LLMs Agree on the Creativity Evaluation of Alternative Uses?

Rabeyah, Abdullah Al; Góes, Fabrício; Volpe, Marco; Medeiros, Talles

arXiv:2411.15560 (cs)

[Submitted on 23 Nov 2024 (v1), last revised 26 Nov 2024 (this version, v2)]

Title: Do LLMs Agree on the Creativity Evaluation of Alternative Uses?

Authors: Abdullah Al Rabeyah, Fabrício Góes, Marco Volpe, Talles Medeiros

Abstract: This paper investigates whether large language models (LLMs) agree when assessing the creativity of responses to the Alternative Uses Test (AUT). While LLMs are increasingly used to evaluate creative content, previous studies have primarily focused on a single model assessing responses generated by the same model or by humans. This paper explores whether LLMs can impartially and accurately evaluate creativity in outputs generated by both themselves and other models. Using an oracle benchmark set of AUT responses, categorized by creativity level (common, creative, and highly creative), we experiment with four state-of-the-art LLMs evaluating these outputs. We test both scoring and ranking methods and employ two evaluation settings (comprehensive and segmented) to examine whether LLMs agree on the creativity evaluation of alternative uses. Results reveal high inter-model agreement, with Spearman correlations averaging above 0.7 across models and reaching over 0.77 with respect to the oracle, which supports the reliability of LLMs in creativity assessment of alternative uses. Notably, models do not favour their own responses; instead, they provide similar creativity assessment scores or rankings for alternative uses generated by other models. These findings suggest that LLMs exhibit impartiality and high alignment in creativity evaluation, offering promising implications for their use in automated creativity assessment.
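The agreement measure named in the abstract, Spearman correlation between evaluators, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`average_ranks`, `spearman`, `pairwise_agreement`), the evaluator labels, and the scores are all hypothetical, and the paper's actual prompts, models, and data are not reproduced here. The sketch assumes each evaluator assigns a numeric creativity score to the same list of alternative uses.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def pairwise_agreement(scores):
    """Map each unordered pair of evaluators to their Spearman correlation."""
    return {
        (a, b): spearman(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


# Hypothetical creativity scores (1-5) for the same six alternative uses.
scores = {
    "evaluator_a": [1, 2, 3, 4, 5, 2],
    "evaluator_b": [1, 3, 2, 5, 4, 2],
    "oracle": [1, 1, 3, 3, 5, 5],
}

agreement = pairwise_agreement(scores)
for pair, rho in agreement.items():
    print(pair, round(rho, 3))
```

Averaging the values in `agreement` over the model-to-model pairs would give a single inter-model figure, and the pairs that include `"oracle"` would give agreement with the benchmark labels. Whether the paper aggregates in exactly this way is an assumption.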

Comments:	19 pages, 7 figures, 15 tables
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2411.15560 [cs.AI]
	(or arXiv:2411.15560v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2411.15560

Submission history

From: Fabrício Góes
[v1] Sat, 23 Nov 2024 13:34:50 UTC (1,355 KB)
[v2] Tue, 26 Nov 2024 09:25:22 UTC (1,355 KB)
