TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval

Lyu, Shuai; Tian, Zijing; Ou, Zhonghong; Zhu, Yifan; Zhang, Xiao; Ha, Qiankun; Luo, Haoran; Song, Meina

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.10935 (cs)

[Submitted on 19 Jan 2025]

Abstract: Cross-modal retrieval matches data across modalities according to their semantic relevance. Existing approaches implicitly assume that data pairs are well aligned and ignore widespread annotation noise, i.e., noisy correspondence (NC), which inevitably degrades performance. Some methods adopt the co-teaching paradigm, in which networks with identical architectures provide distinct perspectives on the data; however, the differences between these networks stem primarily from random initialization. As training proceeds, the models therefore become increasingly homogeneous, which severely limits the additional information this paradigm provides. To address this problem, we propose Tripartite learning with Semantic Variation Consistency (TSVC) for robust image-text retrieval. We design a tripartite cooperative learning mechanism comprising a Coordinator, a Master, and an Assistant model: the Coordinator distributes data, and the Assistant supports the Master's noisy label prediction with diverse data. Moreover, we introduce a soft label estimation method based on mutual information variation, which quantifies the noise in new samples and assigns corresponding soft labels. We also present a new loss function that enhances robustness and improves training effectiveness. Extensive experiments on three widely used datasets demonstrate that, even as the noise ratio increases, TSVC retains a significant advantage in retrieval accuracy and trains stably.
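The argument against co-teaching can be made concrete with a small example. Below is a minimal sketch, in plain Python, of the standard co-teaching exchange (each peer network keeps its smallest-loss pairs as presumed-clean and passes them to the other for training), plus a Jaccard overlap between the two selections as one rough way to watch the peers become homogeneous. The function names, the `keep_ratio` parameter, and the overlap measure are illustrative assumptions; this is not the TSVC Coordinator/Master/Assistant mechanism or its mutual-information soft labels, which the abstract does not specify.

```python
from typing import Sequence, Set, Tuple


def small_loss_selection(losses: Sequence[float], keep_ratio: float) -> Set[int]:
    """Return indices of the keep_ratio fraction of pairs with the smallest loss.

    Small-loss pairs are treated as likely clean correspondences.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, int(keep_ratio * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return set(order[:k])


def co_teaching_exchange(
    losses_a: Sequence[float], losses_b: Sequence[float], keep_ratio: float
) -> Tuple[Set[int], Set[int]]:
    """Cross-update step: each network trains on the pairs its peer judges clean.

    Returns (pairs network A trains on, pairs network B trains on).
    """
    clean_for_b = small_loss_selection(losses_a, keep_ratio)  # chosen by A
    clean_for_a = small_loss_selection(losses_b, keep_ratio)  # chosen by B
    return clean_for_a, clean_for_b


def selection_overlap(sel_a: Set[int], sel_b: Set[int]) -> float:
    """Jaccard overlap of two selections; 1.0 means the peers agree completely."""
    union = sel_a | sel_b
    return len(sel_a & sel_b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Hypothetical per-pair losses from two peer networks on six image-text pairs.
    losses_a = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
    losses_b = [0.2, 0.1, 0.9, 0.8, 0.3, 0.7]
    clean_for_a, clean_for_b = co_teaching_exchange(losses_a, losses_b, 0.5)
    print(selection_overlap(clean_for_a, clean_for_b))  # prints 0.5
```

If this overlap were tracked per epoch and drifted toward 1.0, the two peers would be selecting nearly the same pairs, which is the loss of diversity the abstract attributes to identically structured co-teaching networks.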

Comments:	This paper has been accepted to the Main Track of AAAI 2025. It contains 9 pages and 7 figures, and is relevant to cross-modal retrieval and machine learning. The work presents a novel approach to robust image-text retrieval using a tripartite learning framework.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.10935 [cs.CV]
	(or arXiv:2501.10935v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.10935

Submission history

From: Shuai Lyu
[v1] Sun, 19 Jan 2025 04:05:08 UTC (1,447 KB)