Enhancing CLIP Conceptual Embedding through Knowledge Distillation

Kao, Kuei-Chun

Computer Science > Artificial Intelligence

arXiv:2412.03513 (cs)

[Submitted on 4 Dec 2024 (v1), last revised 7 Dec 2024 (this version, v2)]

Abstract: Recently, CLIP has become an important model for aligning images and text in multimodal contexts. However, researchers have identified limitations in the ability of CLIP's text and image encoders to extract detailed knowledge from caption-image pairs. In response, this paper presents Knowledge-CLIP, an approach designed to improve CLIP's performance by integrating a new knowledge distillation (KD) method based on Llama 2. Our approach focuses on three key objectives: Text Embedding Distillation, Concept Learning, and Contrastive Learning. First, Text Embedding Distillation trains the Knowledge-CLIP text encoder to mirror the text embeddings produced by the teacher model, Llama 2. Next, Concept Learning assigns a soft concept label to each caption-image pair by applying offline K-means clustering to text data from Llama 2, so that Knowledge-CLIP can learn from these soft concept labels. Lastly, Contrastive Learning aligns the text and image embeddings. Our experimental findings show that the proposed model improves the performance of both the text and image encoders.
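
The following pure-Python sketch illustrates how the three objectives named in the abstract might be combined into a single training loss. It computes each loss on toy vectors. The abstract names the objectives but not their exact formulas, so everything else here is an assumption:

- mean squared error on normalized embeddings for distillation;
- a softmax over negative distances to K-means centroids for the soft concept labels;
- cross-entropy for concept learning;
- CLIP-style symmetric InfoNCE for contrastive learning;
- equal loss weights.

The sketch also assumes that student and teacher embeddings share one dimension. In practice, Llama 2 and CLIP embeddings differ in size, so a projection would be needed. All function names are hypothetical. The code only evaluates loss values; actual training would compute gradients with an autodiff framework.

```python
import math
import random

# Hypothetical helpers; embeddings are plain lists of floats.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0  # avoid dividing by zero for a zero vector
    return [x / n for x in v]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

# 1) Text Embedding Distillation (assumed loss: MSE between L2-normalized
#    student and teacher text embeddings of the same dimension).
def distill_loss(student, teacher):
    return sum(sq_dist(normalize(s), normalize(t))
               for s, t in zip(student, teacher)) / len(student)

# 2a) Offline K-means over the teacher's text embeddings, run once before training.
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# 2b) Soft concept label (assumed form): softmax over negative squared
#     distances to the centroids, with temperature tau.
def concept_logits(v, centroids, tau=1.0):
    return [-sq_dist(v, c) / tau for c in centroids]

def soft_labels(points, centroids, tau=1.0):
    return [softmax(concept_logits(p, centroids, tau)) for p in points]

# 2c) Concept Learning (assumed loss): cross-entropy between each soft label
#     and the student's concept distribution over the same centroids.
def concept_loss(student, labels, centroids, tau=1.0):
    total = 0.0
    for s, y in zip(student, labels):
        log_q = log_softmax(concept_logits(s, centroids, tau))
        total -= sum(yi * lq for yi, lq in zip(y, log_q))
    return total / len(student)

# 3) Contrastive Learning: CLIP-style symmetric InfoNCE over a batch where
#    text[i] and image[i] form the matching pair.
def contrastive_loss(text, image, temperature=0.07):
    t = [normalize(v) for v in text]
    im = [normalize(v) for v in image]
    n = len(t)
    logits = [[dot(t[i], im[j]) / temperature for j in range(n)] for i in range(n)]
    t2i = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    i2t = -sum(log_softmax([logits[i][j] for i in range(n)])[j] for j in range(n)) / n
    return (t2i + i2t) / 2

# Combined objective; the equal default weights are an assumption.
def total_loss(student, teacher, image, labels, centroids,
               w_distill=1.0, w_concept=1.0, w_contrast=1.0):
    return (w_distill * distill_loss(student, teacher)
            + w_concept * concept_loss(student, labels, centroids)
            + w_contrast * contrastive_loss(student, image))

# Toy usage with made-up 2-D vectors (not data from the paper).
teacher = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
student = [[0.8, 0.2], [1.0, 0.0], [0.2, 0.8], [0.0, 1.0]]
image = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
centroids = kmeans(teacher, k=2)
labels = soft_labels(teacher, centroids)
loss = total_loss(student, teacher, image, labels, centroids)
```

In this sketch, the centroids and soft labels are computed once from the fixed teacher embeddings and then reused. This is one way to read "offline" clustering: the concept labels do not change as the student is trained.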

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2412.03513 [cs.AI]
	(or arXiv:2412.03513v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2412.03513

Submission history

From: Kuei-Chun Kao
[v1] Wed, 4 Dec 2024 17:56:49 UTC (19,278 KB)
[v2] Sat, 7 Dec 2024 13:01:13 UTC (19,278 KB)