Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Wei, Fanyue; Zeng, Wei; Li, Zhenyang; Yin, Dawei; Duan, Lixin; Li, Wen

arXiv:2407.06642 (cs)

[Submitted on 9 Jul 2024 (v1), last revised 18 Jul 2024 (this version, v2)]

Abstract: Personalized text-to-image models allow users to generate images of an object (specified with a set of reference images) in varied styles (specified with a sentence). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To address this, we design a novel reinforcement learning framework that uses the deterministic policy gradient method for personalized text-to-image generation, with which various objectives, differentiable or even non-differentiable, can be easily incorporated to supervise the diffusion models and improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text alignment. Our code is available at: this https URL.
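
To make the idea of supervising a deterministic generator with a non-differentiable objective more concrete, here is a minimal toy sketch of a deterministic policy gradient update. It is not the authors' implementation and involves no diffusion model: the scalar policy, the staircase `reward`, the `TARGET` value, and the helper `fit_local_critic` are all hypothetical names chosen for illustration. The point it tries to show is that a critic fitted by regression on perturbed actions can supply a usable gradient even when the reward itself has zero gradient almost everywhere.

```python
import math
import random

TARGET = 3.0  # hypothetical "ideal" output, used only by the toy reward


def reward(action: float) -> float:
    """Staircase reward: piecewise constant, so its gradient is zero almost everywhere."""
    return -math.floor(abs(action - TARGET) / 0.25) * 0.25


def fit_local_critic(action, rng, n_samples=64, sigma=0.5):
    """Fit Q(a) ~ b + g * (a - action) by least squares on perturbed actions.

    Returns the slope g, which serves as the critic's estimate of dQ/da.
    """
    eps = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    rewards = [reward(action + e) for e in eps]
    eps_mean = sum(eps) / n_samples
    r_mean = sum(rewards) / n_samples
    cov = sum((e - eps_mean) * (r - r_mean) for e, r in zip(eps, rewards))
    var = sum((e - eps_mean) ** 2 for e in eps)
    return cov / var


def train(steps=200, lr=0.1, seed=0):
    """Run the toy actor update: theta += lr * dQ/da * da/dtheta."""
    rng = random.Random(seed)
    theta = 0.1  # deterministic policy: action = theta (no sampling in the actor)
    for _ in range(steps):
        action = theta
        dq_da = fit_local_critic(action, rng)  # critic gradient w.r.t. the action
        da_dtheta = 1.0  # derivative of the policy output w.r.t. its parameter
        theta += lr * dq_da * da_dtheta  # chain rule through the critic
    return theta


if __name__ == "__main__":
    print("learned theta:", train())
```

In this sketch the actor never differentiates the reward directly; it only follows the slope of the fitted critic, which is why a piecewise-constant objective can still drive `theta` toward `TARGET`. A real system would replace the scalar policy with a generative model and the local linear fit with a learned critic network.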

Comments:	Accepted by ECCV 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.06642 [cs.CV]
	(or arXiv:2407.06642v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.06642

Submission history

From: Fanyue Wei
[v1] Tue, 9 Jul 2024 08:11:53 UTC (35,924 KB)
[v2] Thu, 18 Jul 2024 15:34:04 UTC (35,924 KB)