ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation

Hao, Shaozhe; Han, Kai; Zhao, Shihao; Wong, Kwan-Yee K.

arXiv:2306.00971v1 (cs)

[Submitted on 1 Jun 2023 (this version), latest version 7 Dec 2023 (v2)]

Abstract: Personalized text-to-image generation using diffusion models has recently been proposed and has attracted considerable attention. Given a handful of images containing a novel concept (e.g., a unique toy), we aim to tune the generative model to capture fine visual details of the novel concept and generate photorealistic images following a text condition. We present a plug-in method, named ViCo, for fast and lightweight personalized generation. Specifically, we propose an image attention module to condition the diffusion process on patch-wise visual semantics. We introduce an attention-based object mask that comes at almost no cost from the attention module. In addition, we design a simple regularization based on the intrinsic properties of text-image attention maps to alleviate the common overfitting degradation. Unlike many existing models, our method does not finetune any parameters of the original diffusion model, which allows more flexible and transferable model deployment. With only light parameter training (~6% of the diffusion U-Net), our method achieves comparable or even better performance than all state-of-the-art models both qualitatively and quantitatively.
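The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the general idea of conditioning on patch-wise visual features through cross-attention, with an object mask derived from an attention map. Everything here is an assumption made for illustration: the function names (`image_cross_attention`, `attention_mask`), the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and in particular the mean-threshold used to binarize the attention map, which stands in for whatever binarization the paper actually uses.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_mask(token_attn):
    # Hypothetical binarization: keep reference patches whose attention
    # weight is at least the mean weight (an assumption, not the paper's rule).
    return token_attn >= token_attn.mean()

def image_cross_attention(latent_tokens, ref_patches, w_q, w_k, w_v, ref_mask=None):
    """Attend from latent tokens (queries) to reference-image patches (keys/values).

    latent_tokens: (n, d) array, ref_patches: (m, d) array,
    ref_mask: optional (m,) boolean array marking object patches.
    Only w_q, w_k, w_v would be trained; the base model stays frozen.
    """
    q = latent_tokens @ w_q
    k = ref_patches @ w_k
    v = ref_patches @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if ref_mask is not None:
        # Background patches get zero attention weight after the softmax.
        scores = np.where(ref_mask[None, :], scores, -np.inf)
    attn = softmax(scores, axis=-1)
    # Residual connection (assumed) so the module acts as a plug-in.
    return latent_tokens + attn @ v, attn

# Toy usage with random features standing in for real U-Net activations.
rng = np.random.default_rng(0)
n, m, d = 4, 6, 8
latents = rng.normal(size=(n, d))
patches = rng.normal(size=(m, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
token_attn = softmax(rng.normal(size=m))  # stand-in for a text-token attention map
mask = attention_mask(token_attn)
out, attn = image_cross_attention(latents, patches, w_q, w_k, w_v, mask)
```

In this sketch the mask costs only one comparison on an attention map that is already computed, which is one plausible reading of "almost no cost"; the abstract itself does not specify the mechanism.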

Comments:	Under review
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.00971 [cs.CV]
	(or arXiv:2306.00971v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.00971

Submission history

From: Shaozhe Hao
[v1] Thu, 1 Jun 2023 17:58:44 UTC (47,345 KB)
[v2] Thu, 7 Dec 2023 17:49:30 UTC (43,385 KB)
