Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion

Chen, Hongyu; Gao, Yiqi; Zhou, Min; Wang, Peng; Li, Xubin; Ge, Tiezheng; Zheng, Bo

arXiv:2404.14768 (cs)

[Submitted on 23 Apr 2024]

Abstract: Recently, integrating visual controls into text-to-image (T2I) models, as in the ControlNet method, has received significant attention for its finer control capabilities. While various training-free methods attempt to enhance prompt following in T2I models, prompt following under visual control remains rarely studied, especially when the visual controls are misaligned with the text prompts. In this paper, we address the challenge of "Prompt Following With Visual Control" and propose a training-free approach named Mask-guided Prompt Following (MGPF). Object masks are introduced to distinguish the aligned and misaligned parts of visual controls and prompts. Meanwhile, a network, dubbed Masked ControlNet, is designed to use these object masks for object generation in the misaligned visual control region. Further, to improve attribute matching, a simple yet efficient loss is designed to align the attention maps of attributes with the object regions constrained by ControlNet and the object masks. The efficacy and superiority of MGPF are validated through comprehensive quantitative and qualitative experiments.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.14768 [cs.CV]
	(or arXiv:2404.14768v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.14768

Submission history

From: Min Zhou
[v1] Tue, 23 Apr 2024 06:10:43 UTC (17,722 KB)
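
Illustrative sketch (not from the paper): the abstract describes a loss that aligns the attention maps of attribute tokens with object regions, but it does not give the formula. The snippet below shows one plausible form of such a loss, under the assumption that it penalizes attention mass falling outside a binary object mask. The function name `mask_attention_loss`, the `eps` parameter, and the normalization are all hypothetical choices made for illustration.

```python
from typing import List

Grid = List[List[float]]


def mask_attention_loss(attn: Grid, mask: Grid, eps: float = 1e-8) -> float:
    """Return 1 minus the fraction of attention mass inside the mask.

    attn: non-negative attention weights for one attribute token (H x W).
    mask: binary object mask of the same shape (1 inside the object region).
    NOTE: an assumed formulation for illustration, not the loss used by MGPF.
    """
    same_rows = len(attn) == len(mask)
    if not same_rows or any(len(a) != len(m) for a, m in zip(attn, mask)):
        raise ValueError("attn and mask must have the same shape")

    # Total attention mass over the whole map.
    total = sum(sum(row) for row in attn)
    # Attention mass that lands inside the object region.
    inside = sum(
        a * m
        for a_row, m_row in zip(attn, mask)
        for a, m in zip(a_row, m_row)
    )
    # eps guards against division by zero for an all-zero attention map.
    return 1.0 - inside / (total + eps)


if __name__ == "__main__":
    attention = [[0.5, 0.0], [0.0, 0.5]]
    object_mask = [[1.0, 0.0], [0.0, 0.0]]
    print(mask_attention_loss(attention, object_mask))
```

In a training-free setting, a loss of this kind would presumably be differentiated with respect to the latent during denoising rather than used to update model weights; that usage is likewise an assumption here, since the abstract states only that the approach is training-free.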