CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora

Long, Zijun; Ge, Xuri; Mccreadie, Richard; Jose, Joemon

arXiv:2402.15276 (cs)

[Submitted on 23 Feb 2024 (v1), last revised 2 Apr 2024 (this version, v3)]

Abstract: Text-to-image retrieval aims to find relevant images for a text query, a task that is important in many use cases, such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) demonstrate state-of-the-art performance, they are limited in handling large-scale, diverse, and ambiguous real-world retrieval needs, owing to their computational cost and the injective embeddings they produce. This paper presents a two-stage Coarse-to-Fine Index-shared Retrieval (CFIR) framework, designed for fast and effective large-scale long-text to image retrieval. The first stage, Entity-based Ranking (ER), adapts to the ambiguity of long-text queries by employing a multiple-queries-to-multiple-targets paradigm, filtering candidates for the next stage. The second stage, Summary-based Re-ranking (SR), refines these rankings using summarized queries. We also propose a specialized Decoupling-BEiT-3 encoder, optimized for handling ambiguous user needs and for both stages, which also improves computational efficiency through vector-based similarity inference. Evaluation on the AToMiC dataset shows that CFIR outperforms existing MLLMs by up to 11.06% in Recall@1000, while reducing training and retrieval times by 68.75% and 99.79%, respectively. We will release our code to facilitate future research at this https URL.
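
The abstract does not say how entity-level scores are aggregated or how images are stored, so the following is only a minimal sketch of a generic coarse-to-fine, index-shared retrieval loop under stated assumptions, not the authors' implementation. It assumes that queries and images are already embedded in one shared vector space (the Decoupling-BEiT-3 encoder is not reproduced here), that each image may be stored as several vectors, that stage 1 scores an image by its best cosine match over all (entity query, image vector) pairs, and that stage 2 re-scores only the surviving candidates against a single summary-query vector using the same stored index. The function names (cosine, entity_rank, summary_rerank, coarse_to_fine) and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def entity_rank(entity_queries: List[Vector],
                index: Dict[str, List[Vector]],
                k: int) -> List[str]:
    """Stage 1 (coarse): multiple entity queries vs. multiple vectors per image.

    Assumption: an image's score is its best cosine match over all
    (entity query, image vector) pairs; the paper's aggregation may differ.
    Returns the ids of the top-k images.
    """
    scores = {
        image_id: max(cosine(q, v) for q in entity_queries for v in vectors)
        for image_id, vectors in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def summary_rerank(summary_query: Vector,
                   candidates: List[str],
                   index: Dict[str, List[Vector]]) -> List[str]:
    """Stage 2 (fine): re-rank only the stage-1 candidates with a summary query.

    The same image index is reused (an assumed reading of "index-shared"),
    so no image is re-encoded between stages.
    """
    def score(image_id: str) -> float:
        return max(cosine(summary_query, v) for v in index[image_id])

    return sorted(candidates, key=score, reverse=True)


def coarse_to_fine(entity_queries: List[Vector],
                   summary_query: Vector,
                   index: Dict[str, List[Vector]],
                   k: int) -> List[str]:
    """Filter with entity queries, then re-rank the survivors with the summary."""
    candidates = entity_rank(entity_queries, index, k)
    return summary_rerank(summary_query, candidates, index)


# Hypothetical toy index: 2-D vectors, one image stored as two vectors.
index = {
    "img_a": [[1.0, 0.0]],
    "img_b": [[0.0, 1.0]],
    "img_c": [[0.7, 0.7], [1.0, 0.1]],
}
entity_queries = [[1.0, 0.0], [0.9, 0.1]]
summary_query = [0.6, 0.8]

print(coarse_to_fine(entity_queries, summary_query, index, k=2))  # → ['img_c', 'img_a']
```

In this toy run, img_b would score 0.8 against the summary query, higher than img_a, but it never reaches stage 2 because the entity queries filter it out; only the k survivors pay the re-ranking cost, which is where a coarse-to-fine design saves time. Read as relative reductions, the reported figures mean retrieval runs in 0.21% of the original time (about a 476x speedup) and training in 31.25% of the original time (about a 3.2x speedup).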

Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2402.15276 [cs.IR]
	(or arXiv:2402.15276v3 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2402.15276

Submission history

From: Zijun Long
[v1] Fri, 23 Feb 2024 11:47:16 UTC (3,555 KB)
[v2] Wed, 28 Feb 2024 16:49:13 UTC (3,555 KB)
[v3] Tue, 2 Apr 2024 20:54:46 UTC (1,791 KB)
