Keep It Light! Simplifying Image Clustering Via Text-Free Adapters

Li, Yicen; Borde, Haitz Sáez de Ocáriz; Kratsios, Anastasis; McNicholas, Paul D.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2502.04226 (cs)

[Submitted on 6 Feb 2025]



Abstract: Many competitive clustering pipelines have a multi-modal design, leveraging large language models (LLMs) or other text encoders, and text-image pairs, which are often unavailable in real-world downstream applications. Additionally, such frameworks are generally complicated to train and require substantial computational resources, making widespread adoption challenging. In this work, we show that in deep clustering, performance competitive with more complex state-of-the-art methods can be achieved using a text-free and highly simplified training pipeline. In particular, our approach, Simple Clustering via Pre-trained models (SCP), trains only a small cluster head while leveraging pre-trained vision model feature representations and positive data pairs. Experiments on benchmark datasets including CIFAR-10, CIFAR-20, CIFAR-100, STL-10, ImageNet-10, and ImageNet-Dogs demonstrate that SCP achieves highly competitive performance. Furthermore, we provide a theoretical result explaining why, at least under ideal conditions, additional text-based embeddings may not be necessary to achieve strong clustering performance in vision.
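
The idea of training only a small cluster head on frozen features with positive pairs can be sketched roughly as follows. This is a minimal illustration, not the SCP implementation: the abstract does not state the training objective, so the loss here (agreement between the two views of a pair, minus the entropy of the batch-average assignment to discourage collapse) is an assumed generic choice. The names `cluster_head` and `pair_loss` are hypothetical, and the random vectors merely stand in for pre-trained vision features.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_head(feature, weights, bias):
    """Hypothetical linear head: frozen feature vector -> soft cluster assignment."""
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def pair_loss(probs_a, probs_b, eps=1e-12):
    """Assumed illustrative objective, NOT the loss from the paper.

    probs_a[i] and probs_b[i] are the soft assignments of the two views
    of the i-th positive pair.
    """
    n = len(probs_a)
    k = len(probs_a[0])
    # Agreement term: small when both views of a pair get the same cluster.
    agreement = -sum(
        math.log(sum(p * q for p, q in zip(pa, pb)) + eps)
        for pa, pb in zip(probs_a, probs_b)
    ) / n
    # Entropy of the batch-average assignment: large when clusters are balanced.
    marginal = [sum(pa[j] for pa in probs_a) / n for j in range(k)]
    entropy = -sum(m * math.log(m + eps) for m in marginal)
    return agreement - entropy


if __name__ == "__main__":
    random.seed(0)
    dim, num_clusters, batch = 8, 3, 16
    # Stand-ins for frozen backbone features and their positive partners.
    feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
    partners = [[x + random.gauss(0, 0.05) for x in f] for f in feats]
    # The head's parameters are the only quantities that would be trained.
    weights = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_clusters)]
    bias = [0.0] * num_clusters
    pa = [cluster_head(f, weights, bias) for f in feats]
    pb = [cluster_head(f, weights, bias) for f in partners]
    print("loss at initialization:", pair_loss(pa, pb))
```

In a real pipeline the head's weights would be updated by gradient descent on such a loss while the backbone stays fixed; the sketch only evaluates the objective once to show the data flow.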

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Computation (stat.CO); Machine Learning (stat.ML)
Cite as:	arXiv:2502.04226 [cs.CV]
	(or arXiv:2502.04226v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.04226

Submission history

From: Yicen Li
[v1] Thu, 6 Feb 2025 17:12:07 UTC (31,835 KB)
