Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query

Cai, Guanyu; Zhang, Jun; Jiang, Xinyang; Gong, Yifei; He, Lianghua; Yu, Fufu; Peng, Pai; Guo, Xiaowei; Huang, Feiyue; Sun, Xing

arXiv:2103.01654 (cs)

[Submitted on 2 Mar 2021 (v1), last revised 11 Aug 2021 (this version, v2)]

Abstract: Text-based image retrieval has seen considerable progress in recent years. However, the performance of existing methods suffers in real-life use, since the user is likely to provide an incomplete description of an image, which often leads to results filled with false positives that fit the incomplete description. In this work, we introduce the partial-query problem and extensively analyze its influence on text-based image retrieval. Previous interactive methods tackle the problem by passively receiving users' feedback to supplement the incomplete query iteratively, which is time-consuming and requires heavy user effort. Instead, we propose a novel retrieval framework that conducts the interactive process in an Ask-and-Confirm fashion, where the AI actively searches for discriminative details missing from the current query, and users only need to confirm the AI's proposal. Specifically, we propose an object-based interaction to make interactive retrieval more user-friendly and present a reinforcement-learning-based policy to search for discriminative objects. Furthermore, since fully supervised training is often infeasible due to the difficulty of obtaining human-machine dialog data, we present a weakly supervised training strategy that needs no human-annotated dialogs, only a text-image dataset. Experiments show that our framework significantly improves the performance of text-based image retrieval. Code is available at this https URL.
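The Ask-and-Confirm interaction described in the abstract can be illustrated with a small toy loop. This is only a sketch under strong assumptions and is not the authors' implementation: images are reduced to sets of object labels, the learned reinforcement-learning policy is replaced by a simple greedy heuristic (propose the object that splits the remaining candidates most evenly), the user is simulated by looking up the target image, and candidates are hard-filtered rather than re-ranked by a retrieval model. The names `GALLERY`, `propose_object`, and `ask_and_confirm` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy gallery: image id -> set of object labels detected in it.
GALLERY: Dict[str, Set[str]] = {
    "img1": {"dog", "grass", "ball"},
    "img2": {"dog", "grass", "frisbee"},
    "img3": {"dog", "sofa"},
    "img4": {"cat", "sofa"},
}


def propose_object(candidates: Dict[str, Set[str]], asked: Set[str]) -> Optional[str]:
    """Greedy stand-in for the learned policy: pick the not-yet-asked object
    whose presence splits the remaining candidates most evenly."""
    counts: Dict[str, int] = {}
    for objects in candidates.values():
        for obj in objects - asked:
            counts[obj] = counts.get(obj, 0) + 1
    n = len(candidates)
    # Objects present in every candidate cannot discriminate between them.
    informative = {obj: c for obj, c in counts.items() if c < n}
    if not informative:
        return None
    # Ties are broken alphabetically so the loop is deterministic.
    return min(informative, key=lambda obj: (abs(informative[obj] - n / 2), obj))


def ask_and_confirm(
    gallery: Dict[str, Set[str]],
    partial_query: Set[str],
    target_id: str,
    max_rounds: int = 5,
) -> Tuple[List[str], List[Tuple[str, bool]]]:
    """Enrich a partial query by proposing objects and recording yes/no answers."""
    # Start from every image consistent with the incomplete description.
    candidates = {i: objs for i, objs in gallery.items() if partial_query <= objs}
    asked = set(partial_query)
    dialog: List[Tuple[str, bool]] = []
    for _ in range(max_rounds):
        if len(candidates) <= 1:
            break
        obj = propose_object(candidates, asked)
        if obj is None:
            break
        # Simulated user: confirms whether the object appears in the target image.
        confirmed = obj in gallery[target_id]
        dialog.append((obj, confirmed))
        asked.add(obj)
        # Keep only candidates that agree with the user's answer.
        candidates = {
            i: objs for i, objs in candidates.items() if (obj in objs) == confirmed
        }
    return sorted(candidates), dialog
```

Calling `ask_and_confirm(GALLERY, {"dog"}, "img2")` starts from the three images containing a dog and narrows them down through yes/no confirmations. The point of the sketch is the division of labor: the system chooses what to ask, and the user only confirms or rejects.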

Comments:	Accepted by ICCV2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.01654 [cs.CV]
	(or arXiv:2103.01654v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.01654

Submission history

From: Guanyu Cai
[v1] Tue, 2 Mar 2021 11:27:05 UTC (3,325 KB)
[v2] Wed, 11 Aug 2021 07:35:53 UTC (3,379 KB)
