Word4Per: Zero-shot Composed Person Retrieval

Liu, Delong; Li, Haiwen; Zhao, Zhicheng; Su, Fei; Dong, Yuan

arXiv:2311.16515 (cs)

[Submitted on 25 Nov 2023 (v1), last revised 25 Mar 2024 (this version, v2)]

Abstract: Searching for a specific person has great social benefits and security value, and it often involves a combination of visual and textual information. Conventional person retrieval methods, whether image-based or text-based, usually fall short of effectively harnessing both types of information, leading to a loss of accuracy. In this paper, a new task called Composed Person Retrieval (CPR) is proposed to jointly utilize both image and text information for target person retrieval. However, supervised CPR requires costly manually annotated datasets, and no such resources are currently available. To mitigate this issue, we first introduce Zero-shot Composed Person Retrieval (ZS-CPR), which leverages existing domain-related data to solve the CPR problem without expensive annotations. Second, to learn a ZS-CPR model, we propose a two-stage learning framework, Word4Per, in which a lightweight Textual Inversion Network (TINet) and a text-based person retrieval model based on a fine-tuned Contrastive Language-Image Pre-training (CLIP) network are learned without using any CPR data. Third, a finely annotated Image-Text Composed Person Retrieval (ITCPR) dataset is built as a benchmark to assess the performance of the proposed Word4Per framework. Extensive experiments under both Rank-1 and mAP demonstrate the effectiveness of Word4Per for the ZS-CPR task, surpassing the compared methods by over 10%. The code and ITCPR dataset will be publicly available at this https URL.
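
The abstract reports results under Rank-1 and mAP without spelling the metrics out. As a rough illustration only, the sketch below computes both using their common retrieval definitions; it is not taken from the paper, and the exact evaluation protocol used for ITCPR is an assumption here. The function names and the toy relevance lists are hypothetical. Each inner list marks, in ranked order over the full gallery, whether a retrieved item shows the target person.

```python
from typing import List, Sequence


def rank1(ranked_relevance: Sequence[Sequence[bool]]) -> float:
    """Fraction of queries whose top-ranked gallery item is a correct match."""
    hits = sum(1 for rel in ranked_relevance if rel and rel[0])
    return hits / len(ranked_relevance)


def average_precision(rel: Sequence[bool]) -> float:
    """AP for one query: mean of precision@k taken at each correct match.

    Assumes `rel` covers the whole ranked gallery, so every correct
    match for the query appears somewhere in the list.
    """
    hits = 0
    precisions: List[float] = []
    for k, is_match in enumerate(rel, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_relevance: Sequence[Sequence[bool]]) -> float:
    """mAP: average of the per-query AP values."""
    aps = [average_precision(rel) for rel in ranked_relevance]
    return sum(aps) / len(aps)


# Hypothetical toy data: two queries, three gallery items each.
toy = [
    [True, False, True],   # AP = (1/1 + 2/3) / 2 = 5/6
    [False, True, False],  # AP = 1/2
]
print(rank1(toy))                   # 0.5
print(mean_average_precision(toy))  # (5/6 + 1/2) / 2 = 2/3
```

In a composed-retrieval setting the relevance lists would come from sorting the gallery by similarity to each combined image-and-text query; that ranking step is outside this sketch.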

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2311.16515 [cs.CV]
	(or arXiv:2311.16515v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.16515

Submission history

From: Delong Liu
[v1] Sat, 25 Nov 2023 14:24:49 UTC (1,520 KB)
[v2] Mon, 25 Mar 2024 12:01:59 UTC (1,477 KB)
