Image-Editing Specialists: An RLAIF Approach for Diffusion Models

Benarous, Elior; Du, Yilun; Yang, Heng


arXiv:2504.12833 (cs)

[Submitted on 17 Apr 2025]



Abstract: We present a novel approach to training specialized instruction-based image-editing diffusion models, addressing two key challenges: preserving the structure of the input image and aligning the edit semantically with the user prompt. We introduce an online reinforcement learning framework that aligns the diffusion model with human preferences without relying on extensive human annotations or a large curated dataset. Our method significantly improves realism and alignment with instructions in two ways. First, the proposed models achieve precise and structurally coherent modifications in complex scenes while maintaining high fidelity in instruction-irrelevant areas. Second, they capture fine nuances in the desired edit by leveraging a visual prompt, enabling detailed control over visual edits without lengthy textual prompts. This reduces the effort users need to achieve highly specific edits: training requires only 5 reference images depicting a given concept. Experimental results demonstrate that our models can perform intricate edits in complex scenes after just 10 training steps. Finally, we showcase the versatility of our method by applying it to robotics, where enhancing the visual realism of simulated environments through targeted sim-to-real image edits improves their utility as proxies for real-world settings.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2504.12833 [cs.CV]
	(or arXiv:2504.12833v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.12833

Submission history

From: Elior Benarous
[v1] Thu, 17 Apr 2025 10:46:39 UTC (12,005 KB)
