MultiBooth: Towards Generating All Your Concepts in an Image from Text

Zhu, Chenyang; Li, Kai; Ma, Yue; He, Chunming; Li, Xiu

arXiv:2404.14239 (cs)

[Submitted on 22 Apr 2024 (v1), last revised 17 Dec 2024 (this version, v2)]

Abstract: This paper introduces MultiBooth, a novel and efficient technique for multi-concept customization in image generation from text. Despite the significant advancements in customized generation methods, particularly with the success of diffusion models, existing methods often struggle with multi-concept scenarios due to low concept fidelity and high inference cost. MultiBooth addresses these issues by dividing the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase. During the single-concept learning phase, we employ a multi-modal image encoder and an efficient concept encoding technique to learn a concise and discriminative representation for each concept. In the multi-concept integration phase, we use bounding boxes to define the generation area for each concept within the cross-attention map. This method enables the creation of individual concepts within their specified regions, thereby facilitating the formation of multi-concept images. This strategy not only improves concept fidelity but also reduces additional inference cost. MultiBooth surpasses various baselines in both qualitative and quantitative evaluations, showcasing its superior performance and computational efficiency. Project Page: this https URL
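The abstract states only that bounding boxes define each concept's generation area within the cross-attention map. The following is a minimal NumPy sketch of one way such region-constrained cross-attention can work, not the authors' implementation. It assumes a single prompt whose token indices are partitioned per concept, boxes given in normalized (x0, y0, x1, y1) latent-grid coordinates, and masking by setting disallowed attention logits to -inf. The function names `box_mask` and `regional_cross_attention`, the token layout, and the random inputs are all hypothetical.

```python
import numpy as np

def box_mask(box, height, width):
    """Hypothetical helper: boolean (height*width,) mask, True for latent
    pixels whose centers fall inside a normalized (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    ys = (np.arange(height) + 0.5) / height  # row-center coordinates in [0, 1]
    xs = (np.arange(width) + 0.5) / width    # column-center coordinates in [0, 1]
    inside = ((ys[:, None] >= y0) & (ys[:, None] < y1) &
              (xs[None, :] >= x0) & (xs[None, :] < x1))
    return inside.reshape(-1)

def regional_cross_attention(q, k, v, concept_tokens, boxes, height, width):
    """Assumed scheme: each concept's text tokens are visible only to image
    queries inside that concept's box; tokens not listed in concept_tokens
    are shared and visible everywhere.

    q: (height*width, d) image queries; k, v: (num_tokens, d) text keys/values.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)  # scaled dot-product scores, (HW, T)
    allowed = np.ones(logits.shape, dtype=bool)
    for tokens, box in zip(concept_tokens, boxes):
        inside = box_mask(box, height, width)
        # Hide this concept's tokens from every query outside its box.
        allowed[np.ix_(~inside, tokens)] = False
    if not allowed.any(axis=-1).all():
        raise ValueError("every query needs at least one visible token")
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)  # masked entries become exactly 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy usage with an assumed layout: token 0 is shared, tokens 1-3 belong to
# concept A (left half of the image), tokens 4-6 to concept B (right half).
rng = np.random.default_rng(0)
H = W = 8
d, T = 16, 7
q = rng.standard_normal((H * W, d))
k = rng.standard_normal((T, d))
v = rng.standard_normal((T, d))
out, w = regional_cross_attention(
    q, k, v,
    concept_tokens=[np.arange(1, 4), np.arange(4, 7)],
    boxes=[(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)],
    height=H, width=W,
)
```

Masking logits inside a single attention call keeps the cost of one cross-attention pass regardless of how many concepts are placed, which is one plausible reading of the abstract's efficiency claim. Keeping at least one shared token per query prevents rows that are fully masked and would otherwise produce NaN after the softmax.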

Comments:	To be published in AAAI 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.14239 [cs.CV]
	(or arXiv:2404.14239v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.14239

Submission history

From: Chenyang Zhu
[v1] Mon, 22 Apr 2024 14:47:54 UTC (25,521 KB)
[v2] Tue, 17 Dec 2024 04:47:44 UTC (24,259 KB)