Text-to-Image Alignment in Denoising-Based Models through Step Selection

Grimal, Paul; Le Borgne, Hervé; Ferret, Olivier

arXiv:2504.17525 (cs)

[Submitted on 24 Apr 2025]

Abstract: Visual generative AI models often struggle with text-image alignment and show reasoning limitations. This paper presents a novel method that selectively enhances the signal at critical denoising steps, optimizing image generation based on the semantics of the input. Our approach addresses the shortcomings of early-stage signal modifications and demonstrates that adjustments made at later stages yield superior results. We conduct extensive experiments to validate the effectiveness of our method in producing semantically aligned images with Diffusion and Flow Matching models, achieving state-of-the-art performance. Our results highlight the importance of a judicious choice of sampling stage for improving performance and overall image alignment.
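
The abstract does not specify how the signal is enhanced or how the steps are chosen, so the following is only a toy sketch of the general idea of restricting a modification to a chosen window of sampling steps. All names (`select_steps`, `sample`, `update`, `signal`, `boost`) are hypothetical, and the multiplicative boost and the window boundaries are assumptions made for illustration, not the authors' method.

```python
from typing import Callable, Iterable, List, Set

Vector = List[float]


def select_steps(num_steps: int, start_frac: float, end_frac: float) -> Set[int]:
    """Return step indices whose relative position i / num_steps is in [start_frac, end_frac)."""
    return {i for i in range(num_steps) if start_frac <= i / num_steps < end_frac}


def sample(
    x: Vector,
    num_steps: int,
    boosted_steps: Iterable[int],
    update: Callable[[Vector, int], Vector],
    signal: Callable[[Vector, int], Vector],
    boost: float = 2.0,
) -> Vector:
    """Toy sampling loop: the signal term is scaled by `boost` only on the selected steps."""
    boosted = set(boosted_steps)
    for i in range(num_steps):
        base = update(x, i)   # stand-in for the model's regular denoising update
        sig = signal(x, i)    # stand-in for the text-conditioned signal (assumption)
        scale = boost if i in boosted else 1.0
        x = [xi + b + scale * s for xi, b, s in zip(x, base, sig)]
    return x


if __name__ == "__main__":
    # Arbitrary illustrative window covering steps 5, 6 and 7 out of 10.
    window = select_steps(10, 0.5, 0.8)
    zero_update = lambda x, i: [0.0] * len(x)
    unit_signal = lambda x, i: [1.0] * len(x)
    print(sorted(window))
    print(sample([0.0], 10, window, zero_update, unit_signal))
```

In a real pipeline, `update` and `signal` would come from the denoising model and its conditioning, and the window would be chosen according to the paper's findings rather than the arbitrary values used here.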

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.17525 [cs.CV]
	(or arXiv:2504.17525v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.17525

Submission history

From: Paul Grimal
[v1] Thu, 24 Apr 2025 13:10:32 UTC (18,118 KB)
