Toward Automatic Relevance Judgment using Vision--Language Models for Image--Text Retrieval Evaluation

Yang, Jheng-Hong; Lin, Jimmy

Computer Science > Information Retrieval

arXiv:2408.01363 (cs)

[Submitted on 2 Aug 2024]

Abstract: Vision-Language Models (VLMs) have demonstrated success across diverse applications, yet their potential to assist in relevance judgments remains uncertain. This paper assesses, in a zero-shot setting, the relevance estimation capabilities of VLMs, including CLIP, LLaVA, and GPT-4V, on a large-scale ad hoc retrieval task tailored for multimedia content creation. Preliminary experiments reveal the following: (1) LLaVA and GPT-4V, visual-instruction-tuned Large Language Models (LLMs) representing the open-source and closed-source families respectively, both achieve a notable Kendall's $\tau \sim 0.4$ when compared to human relevance judgments, surpassing the CLIPScore metric. (2) While CLIPScore is strongly preferred, LLMs are less biased towards CLIP-based retrieval systems. (3) GPT-4V's score distribution aligns more closely with human judgments than those of the other models, achieving a Cohen's $\kappa$ of around 0.08, compared with approximately -0.096 for CLIPScore. These findings underscore the potential of LLM-powered VLMs for enhancing relevance judgments.
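As a rough illustration of the two agreement statistics cited above, the sketch below computes Kendall's tau (the tau-b variant, which corrects for ties) and Cohen's kappa in plain Python. It is not the authors' evaluation code. The abstract does not say whether tau is computed over per-item scores or over system rankings, nor how continuous scores such as CLIPScore are mapped to discrete grades for kappa; the function names and the toy labels are hypothetical.

```python
"""Sketch: agreement statistics between automatic and human relevance judgments.

Assumption: both judges produce one score per (query, image) item; the toy
data below is hypothetical, not taken from the paper.
"""
from collections import Counter
from itertools import combinations
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (O(n^2), tie-aware)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from every count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # Pairs untied in x = C + D + tied_y_only; pairs untied in y = C + D + tied_x_only.
    denom = math.sqrt((concordant + discordant + tied_y_only)
                      * (concordant + discordant + tied_x_only))
    return (concordant - discordant) / denom if denom else float("nan")


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(a)
    observed = sum(p == q for p, q in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each judge's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / n**2
    if expected == 1:
        return float("nan")  # both judges used one identical label; kappa is undefined
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical graded labels (0 = not relevant, 1 = partially, 2 = highly relevant).
    human = [2, 0, 1, 2, 0, 1, 2, 0]
    model = [2, 0, 2, 1, 0, 1, 2, 1]
    print(f"Kendall's tau-b: {kendall_tau_b(human, model):.3f}")
    print(f"Cohen's kappa:   {cohens_kappa(human, model):.3f}")
```

Tau-b is used here because graded relevance labels produce many ties; `scipy.stats.kendalltau` computes the same variant by default and could replace the hand-written loop.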

Comments:	Accepted by ACM SIGIR 2024 LLM4Eval Workshop
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2408.01363 [cs.IR]
	(or arXiv:2408.01363v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2408.01363

Submission history

From: Jheng-Hong Yang
[v1] Fri, 2 Aug 2024 16:15:25 UTC (100 KB)
