Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

Akan, Adil Kaan; Yemez, Yucel

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.15878 (cs)

[Submitted on 27 Jan 2025 (v1), last revised 28 Jan 2025 (this version, v2)]

Abstract: We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion models while avoiding their text-centric conditioning bias. We also incorporate an additional guidance loss into our architecture to align cross-attention from adapter layers with slot attention. This enhances the alignment of our model with the objects in the input image without using external supervision. Experimental results show that our method outperforms state-of-the-art techniques in object discovery and image generation tasks across multiple datasets, including those with real images. Furthermore, we demonstrate through experiments that our method performs remarkably well on complex real-world images for compositional generation, in contrast to other slot-based generative methods in the literature. The project page can be found at this https URL.
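The abstract names three interacting pieces: slot attention that groups image features into slots, adapter cross-attention that conditions the diffusion model on those slots, and a guidance loss that aligns the adapter's cross-attention with the slot-attention masks. The NumPy sketch below illustrates how these could fit together. It is not the paper's implementation, and it rests on several assumptions: slot attention is simplified (no learned projections, GRU, or MLP update); `adapter_cross_attention` is a hypothetical single-head layer in which image tokens attend to slots; and `guidance_loss` assumes a per-token cross-entropy with the slot masks treated as fixed targets. The abstract does not specify the actual loss or adapter design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(features, slots, n_iters=3, eps=1e-8):
    """Simplified slot attention (after Locatello et al., 2020), without GRU/MLP updates.

    features: (N, D) flattened image features; slots: (K, D) initial slots.
    Returns updated slots (K, D) and per-token slot masks (N, K).
    """
    d = features.shape[-1]
    attn = None
    for _ in range(n_iters):
        logits = features @ slots.T / np.sqrt(d)                    # (N, K)
        attn = softmax(logits, axis=-1)                             # slots compete for each token
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)    # normalize per slot
        slots = weights.T @ features                                # (K, D) weighted mean of features
    return slots, attn


def adapter_cross_attention(queries, slots):
    """Hypothetical adapter layer: image tokens (queries) attend to slots (keys/values).

    Returns the attended output (N, D) and attention over slots per token (N, K).
    """
    d = queries.shape[-1]
    attn = softmax(queries @ slots.T / np.sqrt(d), axis=-1)
    return attn @ slots, attn


def guidance_loss(adapter_attn, slot_masks, eps=1e-8):
    """Assumed guidance loss: per-token cross-entropy pushing adapter attention toward slot masks."""
    return float(-(slot_masks * np.log(adapter_attn + eps)).sum(axis=-1).mean())


# Usage with random stand-ins for encoder features and U-Net hidden states.
rng = np.random.default_rng(0)
N, K, D = 64, 4, 16
features = rng.normal(size=(N, D))
init_slots = rng.normal(size=(K, D))
slots, masks = slot_attention(features, init_slots)
queries = rng.normal(size=(N, D))
out, attn = adapter_cross_attention(queries, slots)
loss = guidance_loss(attn, masks)
print(out.shape, masks.shape, round(loss, 3))
```

Both attention maps take the softmax over slots, so each token gets a distribution over objects. This is what makes a per-token comparison between the adapter's cross-attention and the slot masks meaningful. In a trained model, minimizing such a loss would encourage each adapter layer to route a token's conditioning through the slot that claims it.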

Comments:	Accepted to ICLR 2025. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2501.15878 [cs.CV]
	(or arXiv:2501.15878v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.15878

Submission history

From: Adil Kaan Akan [view email]
[v1] Mon, 27 Jan 2025 09:03:34 UTC (13,688 KB)
[v2] Tue, 28 Jan 2025 08:33:41 UTC (13,688 KB)