AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion

Gong, Ziyu; Mai, Chengcheng; Huang, Yihua


arXiv:2405.10029 (cs)

[Submitted on 16 May 2024 (v1), last revised 17 May 2024 (this version, v2)]


Abstract: The image-text retrieval task aims to retrieve relevant information given an image or a text query. The main challenge is to unify multimodal representations while distinguishing fine-grained differences across modalities, so that similar content is found and irrelevant content is filtered out. However, existing methods mainly focus on unified semantic representation and concept alignment across modalities, while fine-grained cross-modal differences have rarely been studied, which makes the information asymmetry problem difficult to solve. In this paper, we propose a novel asymmetry-sensitive contrastive learning method. By generating corresponding positive and negative samples for different asymmetry types, our method simultaneously ensures fine-grained semantic differentiation and unified semantic representation across modalities. Additionally, we propose a hierarchical cross-modal fusion method that integrates global and local-level features through a multimodal attention mechanism to achieve concept alignment. Extensive experiments on MSCOCO and Flickr30K demonstrate the effectiveness and superiority of the proposed method.
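The abstract does not give the loss formulation. As a point of reference only, the sketch below shows a generic image-to-text InfoNCE contrastive loss in which each image is scored against its matched caption, the other in-batch captions, and a list of additional negative captions. The `extra_negatives` input stands in, hypothetically, for the asymmetry-specific negative samples the abstract describes. The function names, the `temperature` default of 0.07, and the use of cosine similarity are assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def info_nce_with_extra_negatives(
    images: Sequence[Vector],
    texts: Sequence[Vector],
    extra_negatives: Sequence[Sequence[Vector]],
    temperature: float = 0.07,  # assumed value, not from the paper
) -> float:
    """Generic image-to-text InfoNCE loss, averaged over the batch.

    images[i] and texts[i] form a matched pair; every other text in the
    batch acts as an in-batch negative. extra_negatives[i] holds additional
    negative text embeddings for image i (hypothetically, generated
    asymmetry-type negatives). This is NOT the authors' exact loss.
    """
    total = 0.0
    for i, img in enumerate(images):
        # Candidate set: all batch captions plus this image's extra negatives.
        candidates = list(texts) + list(extra_negatives[i])
        logits = [cosine(img, t) / temperature for t in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the matched caption at index i.
        total += log_denominator - logits[i]
    return total / len(images)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    extra = [[[0.7, 0.3]], [[0.3, 0.7]]]
    print(info_nce_with_extra_negatives(images, texts, extra))
```

Appending extra negatives can only enlarge the softmax denominator, so for a fixed batch the loss rises when a generated negative scores close to its anchor image; this is the usual motivation for adding hard negatives to a contrastive objective.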

Comments:	This work has been accepted (strong accept) as an oral paper at the IEEE International Conference on Multimedia & Expo (ICME) 2024
Subjects:	Multimedia (cs.MM)
Cite as:	arXiv:2405.10029 [cs.MM]
	(or arXiv:2405.10029v2 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2405.10029

Submission history

From: Ziyu Gong
[v1] Thu, 16 May 2024 12:11:59 UTC (1,201 KB)
[v2] Fri, 17 May 2024 05:51:24 UTC (1,201 KB)
