Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Zhang, Tao; Da, Cheng; Ding, Kun; Jin, Kun; Li, Yan; Gao, Tingting; Zhang, Di; Xiang, Shiming; Pan, Chunhong

arXiv:2502.01051 (cs)

[Submitted on 3 Feb 2025]

Abstract: Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models struggle with noisy images at different timesteps and require complex transformations into pixel space. In this work, we demonstrate that diffusion models are inherently well-suited for step-level reward modeling in the latent space, as they can naturally extract features from noisy latent images. Accordingly, we propose the Latent Reward Model (LRM), which repurposes components of diffusion models to predict preferences of latent images at various timesteps. Building on LRM, we introduce Latent Preference Optimization (LPO), a method designed for step-level preference optimization directly in the latent space. Experimental results indicate that LPO not only significantly improves the alignment of diffusion models with general, aesthetic, and text-image alignment preferences, but also achieves a 2.5-28$\times$ training speedup compared to existing preference optimization methods. Our code will be available at this https URL.
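As a rough illustration of what "predicting preferences of latent images at various timesteps" could look like, the sketch below scores two noisy latents with a timestep-aware reward and turns the score gap into a pairwise preference loss. Everything in it is an assumption made for illustration: the abstract does not specify a loss, so a Bradley-Terry pairwise form is swapped in here, and `toy_reward`, its linear noise weighting, and the flat-list latents are hypothetical stand-ins, not the LRM architecture.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that item A is preferred over item B.

    Assumption: a pairwise Bradley-Terry form; the abstract does not
    state which preference model is actually used.
    """
    diff = reward_a - reward_b
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def preference_loss(reward_win: float, reward_lose: float) -> float:
    """Negative log-likelihood of the observed preference (win over lose)."""
    diff = reward_win - reward_lose
    # Stable softplus(-diff) = log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def toy_reward(latent, text_embedding, timestep, num_timesteps=1000):
    """Hypothetical timestep-aware score for a noisy latent.

    A dot product with the text embedding, down-weighted linearly as the
    noise level (timestep) grows. This is only a placeholder for a learned
    reward model operating in latent space.
    """
    signal_weight = 1.0 - timestep / num_timesteps
    return signal_weight * sum(x * y for x, y in zip(latent, text_embedding))


if __name__ == "__main__":
    text_embedding = [1.0, 0.0, -1.0]
    preferred_latent = [0.9, 0.1, -0.8]
    rejected_latent = [-0.2, 0.5, 0.4]
    timestep = 500  # both latents share the same noise level

    r_win = toy_reward(preferred_latent, text_embedding, timestep)
    r_lose = toy_reward(rejected_latent, text_embedding, timestep)
    print("P(preferred beats rejected):", preference_probability(r_win, r_lose))
    print("pairwise loss:", preference_loss(r_win, r_lose))
```

Because the toy reward takes the latent and the timestep directly, no decoding into pixel space is needed before scoring, which is the property the abstract attributes to working in the latent space.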

Comments:	20 pages, 14 tables, 15 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.01051 [cs.CV]
	(or arXiv:2502.01051v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.01051

Submission history

From: Tao Zhang
[v1] Mon, 3 Feb 2025 04:51:28 UTC (42,062 KB)
