I0T: Embedding Standardization Method Towards Zero Modality Gap

An, Na Min; Kim, Eunki; Thorne, James; Shim, Hyunjung

arXiv:2412.14384 (cs)

[Submitted on 18 Dec 2024]

Abstract: Contrastive Language-Image Pretraining (CLIP) enables zero-shot inference in downstream tasks such as image-text retrieval and classification. However, recent works extending CLIP suffer from the modality gap, which arises when image and text embeddings are projected onto disparate manifolds, deviating from the intended objective of image-text contrastive learning. We discover that this phenomenon is linked to the modality-specific characteristics that each image and text encoder independently possesses, and we propose two methods to address the modality gap: (1) $\text{I0T}_{\text{post}}$, a post-hoc embedding standardization method that reduces the modality gap to approximately zero, and (2) $\text{I0T}_{\text{async}}$, a trainable method that alleviates the modality gap by adding two normalization layers for each encoder. Our I0T framework can significantly reduce the modality gap while preserving the original embedding representations of trained models with their parameters locked. In practice, $\text{I0T}_{\text{post}}$ can serve as an explainable automatic evaluation metric alternative to the widely used CLIPScore (CLIP-S).
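The abstract describes $\text{I0T}_{\text{post}}$ only at a high level, so the following is a minimal sketch, not the authors' implementation. It assumes that "embedding standardization" means a per-modality, per-dimension z-score (using only that modality's own mean and standard deviation) followed by L2 re-normalization, and it assumes the modality gap is measured as the Euclidean distance between the centroids of the unit-normalized image and text embeddings. The function names (`standardize_modality`, `modality_gap`) and the toy vectors are hypothetical. `clip_score` follows the CLIP-S definition of Hessel et al. (2021), w * max(cos(image, text), 0) with w = 2.5.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Project a vector onto the unit sphere (CLIP compares embeddings by cosine)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def centroid(vectors: List[Vector]) -> Vector:
    """Per-dimension mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def standardize_modality(vectors: List[Vector], eps: float = 1e-8) -> List[Vector]:
    """Hypothetical stand-in for I0T_post (assumption, not the paper's code):
    z-score each dimension with statistics from this modality only,
    then re-normalize every vector to unit length."""
    mu = centroid(vectors)
    n = len(vectors)
    std = [math.sqrt(sum((v[d] - mu[d]) ** 2 for v in vectors) / n) for d in range(len(mu))]
    return [
        l2_normalize([(v[d] - mu[d]) / (std[d] + eps) for d in range(len(mu))])
        for v in vectors
    ]


def modality_gap(image_embs: List[Vector], text_embs: List[Vector]) -> float:
    """Assumed gap measure: distance between the centroids of the
    L2-normalized image embeddings and L2-normalized text embeddings."""
    c_img = centroid([l2_normalize(v) for v in image_embs])
    c_txt = centroid([l2_normalize(v) for v in text_embs])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_img, c_txt)))


def clip_score(image_emb: Vector, text_emb: Vector, w: float = 2.5) -> float:
    """CLIP-S (Hessel et al., 2021): w * max(cos(image, text), 0)."""
    cos = sum(a * b for a, b in zip(l2_normalize(image_emb), l2_normalize(text_emb)))
    return w * max(cos, 0.0)


# Made-up 2-D embeddings: each modality occupies its own region of the space.
IMAGE_EMBS = [[3.0, 1.0], [3.0, -1.0], [4.0, 0.0], [2.0, 0.0]]
TEXT_EMBS = [[-1.0, 3.0], [1.0, 3.0], [0.0, 4.0], [0.0, 2.0]]

gap_before = modality_gap(IMAGE_EMBS, TEXT_EMBS)
gap_after = modality_gap(standardize_modality(IMAGE_EMBS), standardize_modality(TEXT_EMBS))
print(f"gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

On these toy vectors the centroid gap drops from about 1.38 to 0. The toy sets are symmetric by construction, so the gap vanishes exactly here; on real CLIP embeddings a per-modality standardization of this kind would only be expected to bring the gap close to zero, which is consistent with the "approximately zero" wording above. Because only each modality's own statistics are used, the encoders stay untouched, matching the post-hoc, locked-parameter setting the abstract describes.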

Comments:	16 pages, 8 figures, 7 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.14384 [cs.LG]
	(or arXiv:2412.14384v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.14384

Submission history

From: Na Min An
[v1] Wed, 18 Dec 2024 22:35:01 UTC (3,273 KB)