Dual-Level Cross-Modal Contrastive Clustering

Zhang, Haixin; Li, Yongjun; Huang, Dong

Computer Science > Computer Vision and Pattern Recognition

arXiv:2409.04561 (cs)

This paper has been withdrawn by Haixin Zhang

[Submitted on 6 Sep 2024 (v1), last revised 20 Sep 2024 (this version, v2)]

Abstract:Image clustering, which groups images into clusters without labels, is a key task in unsupervised learning. Although previous deep clustering methods have achieved remarkable results, they explore only the intrinsic information of the images themselves and overlook external supervisory knowledge that could improve the semantic understanding of images. Recently, visual-language models pre-trained on large-scale datasets have been used in various downstream tasks and have achieved strong results. However, there is a gap between visual representation learning and textual semantic learning, and how to properly exploit the representations of two different modalities for clustering remains a major challenge. To tackle this challenge, we propose a novel image clustering framework, named Dual-level Cross-Modal Contrastive Clustering (DXMC). First, external textual information is introduced to construct a semantic space, which is used to generate image-text pairs. Second, the images and texts of these pairs are fed into pre-trained image and text encoders, respectively, to obtain image and text embeddings, which are subsequently fed into four well-designed networks. Third, dual-level cross-modal contrastive learning is conducted between discriminative representations of different modalities and at different levels. Extensive experimental results on five benchmark datasets demonstrate the superiority of our proposed method.
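The abstract does not give DXMC's networks or loss formulas, so the block below is not the authors' method. It is a minimal sketch of two generic ingredients the abstract names, under loudly stated assumptions: (1) generating an image-text pair by matching an image embedding to its most cosine-similar entry in a text "semantic space", and (2) cross-modal contrastive learning at two levels, assumed here to be an instance level (rows are samples) and a cluster level (rows are clusters, taken from soft cluster-assignment matrices), each scored with a symmetric InfoNCE loss. The function names `nearest_text`, `info_nce` and `dual_level_loss`, and the temperatures `tau_inst` and `tau_clu`, are hypothetical.

```python
import math
from typing import List

# Hypothetical type aliases: an embedding is a list of floats, a batch is a list of embeddings.
Vector = List[float]
Matrix = List[Vector]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def nearest_text(image_emb: Vector, text_space: Matrix) -> int:
    """Assumption: pair an image with the most cosine-similar text in the semantic space."""
    img = l2_normalize(image_emb)
    sims = [dot(img, l2_normalize(t)) for t in text_space]
    return max(range(len(sims)), key=sims.__getitem__)


def info_nce(a_rows: Matrix, b_rows: Matrix, tau: float) -> float:
    """Symmetric InfoNCE: row i of a and row i of b form a positive pair,
    and every other row of the other modality acts as a negative."""
    a = [l2_normalize(r) for r in a_rows]
    b = [l2_normalize(r) for r in b_rows]
    n = len(a)
    logits = [[dot(a[i], b[j]) / tau for j in range(n)] for i in range(n)]

    def cross_entropy(rows: Matrix) -> float:
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / n

    # The a -> b direction scores the rows; the b -> a direction scores the columns.
    return 0.5 * (cross_entropy(logits) + cross_entropy(transpose(logits)))


def dual_level_loss(img_feats: Matrix, txt_feats: Matrix,
                    img_probs: Matrix, txt_probs: Matrix,
                    tau_inst: float = 0.5, tau_clu: float = 1.0) -> float:
    """Hypothetical two-level cross-modal objective.
    Instance level: rows are samples (N x D embeddings of each modality).
    Cluster level: rows are clusters, i.e. columns of the N x K soft assignments."""
    instance = info_nce(img_feats, txt_feats, tau_inst)
    cluster = info_nce(transpose(img_probs), transpose(txt_probs), tau_clu)
    return instance + cluster
```

With aligned pairs, `info_nce` returns a smaller loss than with shuffled pairs, which is the behaviour a contrastive objective should have. Cluster-level contrastive clustering methods often add an entropy term over cluster assignment probabilities to keep all samples from collapsing into one cluster; it is omitted here for brevity.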

Comments:	We have found that our paper has many imperfections, as well as incorrect formulas and derivations, and we are withdrawing the manuscript to avoid misleading readers.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2409.04561 [cs.CV]
	(or arXiv:2409.04561v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.04561

Submission history

From: Haixin Zhang
[v1] Fri, 6 Sep 2024 18:49:45 UTC (25,945 KB)
[v2] Fri, 20 Sep 2024 11:07:53 UTC (1 KB) (withdrawn)
