Flickr30K-CFQ: A Compact and Fragmented Query Dataset for Text-image Retrieval

Liu, Haoyu; Song, Yaoxian; Wang, Xuwu; Xiangru, Zhu; Li, Zhixu; Song, Wei; Li, Tiefeng

arXiv:2403.13317 (cs)

[Submitted on 20 Mar 2024 (v1), last revised 1 Apr 2024 (this version, v2)]

Abstract: With the explosive growth of multi-modal information on the Internet, unimodal search can no longer satisfy the requirements of Internet applications. Text-image retrieval research is needed to enable high-quality, efficient retrieval across modalities. Existing text-image retrieval research is mostly based on general vision-language datasets (e.g., MS-COCO, Flickr30K), in which the query utterances are rigid and unnatural (i.e., verbose and formal). To overcome this shortcoming, we construct a new Compact and Fragmented Query challenge dataset (named Flickr30K-CFQ) that models the text-image retrieval task under multiple query contents and styles, including a compact and fine-grained entity-relation corpus. We also propose a novel query-enhanced text-image retrieval method using LLM-based prompt engineering. Experiments show that Flickr30K-CFQ reveals the insufficiency of existing vision-language datasets for realistic text-image tasks. Our LLM-based query-enhanced method, applied to different existing text-image retrieval models, improves query understanding performance on both the public dataset and our challenge set Flickr30K-CFQ, by over 0.9% and 2.4%, respectively. Our project is available anonymously at this https URL.

Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2403.13317 [cs.IR]
	(or arXiv:2403.13317v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2403.13317

Submission history

From: Haoyu Liu
[v1] Wed, 20 Mar 2024 05:38:50 UTC (22,779 KB)
[v2] Mon, 1 Apr 2024 05:24:43 UTC (22,475 KB)
