Zero-Shot Object-Centric Representation Learning

Didolkar, Aniket; Zadaianchuk, Andrii; Goyal, Anirudh; Mozer, Mike; Bengio, Yoshua; Martius, Georg; Seitzer, Maximilian

arXiv:2408.09162 (cs)

[Submitted on 17 Aug 2024]

Abstract: The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2408.09162 [cs.CV]
	(or arXiv:2408.09162v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.09162

Submission history

From: Maximilian Seitzer
[v1] Sat, 17 Aug 2024 10:37:07 UTC (9,564 KB)
