Multi-Modal Deep Clustering: Unsupervised Partitioning of Images

Shiran, Guy; Weinshall, Daphna

arXiv:1912.02678 (cs)

[Submitted on 5 Dec 2019 (v1), last revised 15 Dec 2020 (this version, v3)]

Abstract: The clustering of unlabeled raw images is a daunting task, which has recently been approached with some success by deep learning methods. Here we propose an unsupervised clustering framework, which learns a deep neural network in an end-to-end fashion, providing direct cluster assignments of images without additional processing. Multi-Modal Deep Clustering (MMDC) trains a deep network to align its image embeddings with target points sampled from a Gaussian Mixture Model distribution. The cluster assignments are then determined by the mixture component with which each image embedding is associated. Simultaneously, the same deep network is trained to solve an additional self-supervised task of predicting image rotations. This pushes the network to learn more meaningful image representations that facilitate better clustering. Experimental results show that MMDC achieves or exceeds state-of-the-art performance on six challenging benchmarks. On natural image datasets we improve on previous results by significant margins of up to 20% absolute accuracy points, yielding an accuracy of 82% on CIFAR-10, 45% on CIFAR-100 and 69% on STL-10.
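The two ingredients named in the abstract, Gaussian-mixture targets with assignment by mixture component and a rotation-prediction task, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the isotropic equal-weight mixture, and the nearest-mean assignment rule are all assumptions made for illustration, and the network training that aligns embeddings with targets is omitted entirely. Under the stated mixture assumption, choosing the nearest component mean is the same as choosing the component with the highest posterior probability.

```python
import numpy as np


def sample_gmm_targets(means, sigma, n_per_component, rng):
    """Draw target points from an isotropic, equal-weight Gaussian mixture.

    Assumption: every component shares the same standard deviation `sigma`.
    Returns the targets and the index of the component each one came from.
    """
    k, d = means.shape
    labels = np.repeat(np.arange(k), n_per_component)
    noise = sigma * rng.standard_normal((labels.size, d))
    return means[labels] + noise, labels


def assign_clusters(embeddings, means):
    """Assign each embedding to the component with the nearest mean."""
    # Squared Euclidean distance between every embedding and every mean.
    diffs = embeddings[:, None, :] - means[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=-1)
    return sq_dist.argmin(axis=1)


def make_rotation_batch(images):
    """Build the inputs and labels for a rotation-prediction task.

    `images` has shape (N, H, W, C) with H == W (assumed square so that
    all rotated copies share one shape). Label r means a rotation of
    r * 90 degrees.
    """
    rotated = np.concatenate(
        [np.rot90(images, k=r, axes=(1, 2)) for r in range(4)]
    )
    labels = np.repeat(np.arange(4), len(images))
    return rotated, labels
```

In a full pipeline, `embeddings` would be the network outputs for a batch of images and the rotation labels would feed a separate classification loss; both of those pieces are outside the scope of this sketch.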

Comments:	Accepted to ICPR 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1912.02678 [cs.CV]
	(or arXiv:1912.02678v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1912.02678

Submission history

From: Guy Shiran
[v1] Thu, 5 Dec 2019 16:03:43 UTC (1,485 KB)
[v2] Sun, 18 Oct 2020 18:14:37 UTC (3,632 KB)
[v3] Tue, 15 Dec 2020 10:16:11 UTC (4,619 KB)
