Enhancing Image Layout Control with Loss-Guided Diffusion Models

Patel, Zakaria; Serkh, Kirill

arXiv:2405.14101 (cs)

[Submitted on 23 May 2024 (v1), last revised 16 Sep 2024 (this version, v2)]

Abstract: Diffusion models are a powerful class of generative models capable of producing high-quality images from pure noise using a simple text prompt. While most methods that introduce additional spatial constraints into the generated images (e.g., bounding boxes) require fine-tuning, a smaller and more recent subset of these methods takes advantage of the models' attention mechanism and is training-free. These methods generally fall into one of two categories. The first modifies the cross-attention maps of specific tokens directly to enhance the signal in certain regions of the image. The second defines a loss function over the cross-attention maps and uses the gradient of this loss to guide the latent. While previous work explores these as alternative strategies, we provide an interpretation of these methods that highlights their complementary features, and demonstrate that it is possible to obtain superior performance when both methods are used in concert.
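The two training-free strategies named in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a one-dimensional "image" of eight positions, uses the softmax of the latent as a stand-in for a token's cross-attention map, and uses a simple squared loss on the attention mass inside a bounding box. All function names (`inject`, `loss_and_grad`, `guide`), the loss, and the step size are hypothetical choices made for illustration. In a real diffusion model the attention maps come from the denoiser's cross-attention layers and the gradient would be obtained by automatic differentiation.

```python
import math

def softmax(z):
    # Numerically stable softmax; stands in for a token's cross-attention map.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def box_mass(attn, mask):
    # Fraction of attention that falls inside the bounding box.
    return sum(a * m for a, m in zip(attn, mask))

def inject(attn, mask, boost=1.0):
    # Strategy 1 (toy): amplify attention inside the box, then renormalize.
    scaled = [a * (1.0 + boost * m) for a, m in zip(attn, mask)]
    total = sum(scaled)
    return [s / total for s in scaled]

def loss_and_grad(z, mask):
    # Strategy 2 (toy): loss = (1 - s)^2, where s is the in-box attention mass.
    # For attn = softmax(z), ds/dz_j = attn_j * (mask_j - s).
    attn = softmax(z)
    s = box_mass(attn, mask)
    loss = (1.0 - s) ** 2
    grad = [-2.0 * (1.0 - s) * a * (m - s) for a, m in zip(attn, mask)]
    return loss, grad

def guide(z, mask, step=5.0, iters=20):
    # Gradient descent on the latent, guided by the attention loss.
    for _ in range(iters):
        _, grad = loss_and_grad(z, mask)
        z = [v - step * g for v, g in zip(z, grad)]
    return z

z0 = [0.0] * 8                      # uniform toy latent
mask = [0, 0, 1, 1, 0, 0, 0, 0]     # bounding box covers positions 2 and 3
z1 = guide(z0, mask)

before = box_mass(softmax(z0), mask)
after_guidance = box_mass(softmax(z1), mask)
after_injection = box_mass(inject(softmax(z0), mask), mask)
```

In this toy setting both routes raise the attention mass inside the box. Injection edits the map itself and leaves the latent unchanged, whereas guidance changes the latent that produces the map, which gives one way to picture why the two could be used together.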

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Cite as:	arXiv:2405.14101 [cs.CV]
	(or arXiv:2405.14101v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.14101

Submission history

From: Zakaria Patel
[v1] Thu, 23 May 2024 02:08:44 UTC (8,840 KB)
[v2] Mon, 16 Sep 2024 20:20:30 UTC (10,208 KB)
