Aligning Text-to-Image Diffusion Models with Reward Backpropagation

Prabhudesai, Mihir; Goyal, Anirudh; Pathak, Deepak; Fragkiadaki, Katerina

arXiv:2310.03739 (cs)

This paper has been withdrawn by Mihir Prabhudesai

[Submitted on 5 Oct 2023 (v1), last revised 7 Nov 2024 (this version, v5)]

Abstract: Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Because their training is unsupervised, it is difficult to control their behavior in downstream tasks such as maximizing human-perceived image quality, image-text alignment, or ethical image generation. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, which is notorious for the high variance of its gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While a naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility, and controllability of the number of objects present, as well as their combinations. We show that AlignProp achieves higher rewards in fewer training steps than alternatives while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest. Code and visualization results are available at this https URL.
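The core idea in the abstract, backpropagating a differentiable reward through the whole denoising chain, can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not the authors' code: the "denoiser" is a scalar linear map with two parameters `a` and `b`, the sampler is deterministic, the reward is a squared distance to a hypothetical `TARGET`, and the gradient is computed by hand-written reverse-mode differentiation. Low-rank adapters and gradient checkpointing, which the abstract names as the memory-saving ingredients, are omitted here.

```python
import random

ETA = 0.1      # step size of the toy sampler (assumed)
STEPS = 10     # number of denoising steps (assumed)
TARGET = 2.0   # the toy reward prefers samples near this value (assumed)


def sample(theta, x_T):
    """Run the deterministic toy sampler; return all states from x_T to x_0."""
    a, b = theta
    xs = [x_T]
    for _ in range(STEPS):
        x = xs[-1]
        xs.append(x - ETA * (a * x + b))  # one toy "denoising" update
    return xs


def reward(x_0):
    """Differentiable toy reward: largest when x_0 equals TARGET."""
    return -(x_0 - TARGET) ** 2


def reward_and_grad(theta, x_T):
    """Return the reward and its gradient w.r.t. (a, b) via reverse-mode backprop."""
    a, b = theta
    xs = sample(theta, x_T)
    g = -2.0 * (xs[-1] - TARGET)   # d reward / d x_0
    grad_a = grad_b = 0.0
    for x in reversed(xs[:-1]):    # walk from the last sampling step back to the first
        grad_a += g * (-ETA * x)   # direct effect of a at this step
        grad_b += g * (-ETA)       # direct effect of b at this step
        g *= 1.0 - ETA * a         # propagate the gradient to the previous state
    return reward(xs[-1]), (grad_a, grad_b)


def finetune(theta, noises, lr=0.02, iters=300):
    """Gradient ascent on the mean reward over a fixed batch of initial noises."""
    a, b = theta
    for _ in range(iters):
        ga = gb = 0.0
        for x_T in noises:
            _, (da, db) = reward_and_grad((a, b), x_T)
            ga += da / len(noises)
            gb += db / len(noises)
        a, b = a + lr * ga, b + lr * gb
    return a, b


def mean_reward(theta, noises):
    """Average reward of the final samples over a batch of initial noises."""
    return sum(reward(sample(theta, x)[-1]) for x in noises) / len(noises)


rng = random.Random(0)
noises = [rng.gauss(0.0, 1.0) for _ in range(16)]
theta0 = (0.5, 0.0)
theta1 = finetune(theta0, noises)
print("mean reward before:", mean_reward(theta0, noises))
print("mean reward after: ", mean_reward(theta1, noises))
```

The sketch stores every intermediate state in `xs`, which is exactly the memory cost the abstract describes as prohibitive at full scale; a checkpointed variant would keep only some states and recompute the rest during the backward pass.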

Comments:	This paper is subsumed by a later paper of ours: arXiv:2407.08737
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2310.03739 [cs.CV]
	(or arXiv:2310.03739v5 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.03739

Submission history

From: Mihir Prabhudesai
[v1] Thu, 5 Oct 2023 17:59:18 UTC (18,643 KB)
[v2] Sat, 22 Jun 2024 07:26:18 UTC (33,501 KB)
[v3] Mon, 28 Oct 2024 16:25:10 UTC (1 KB) (withdrawn)
[v4] Tue, 29 Oct 2024 03:53:33 UTC (1 KB) (withdrawn)
[v5] Thu, 7 Nov 2024 03:54:22 UTC (1 KB) (withdrawn)
