E2EDiff: Direct Mapping from Noise to Data for Enhanced Diffusion Models

Tan, Zhiyu; Qian, WenXu; Chen, Hesen; Yang, Mengping; Chen, Lei; Li, Hao

Computer Science > Computer Vision and Pattern Recognition

arXiv:2412.21044 (cs)

[Submitted on 30 Dec 2024]

Abstract: Diffusion models have emerged as a powerful framework for generative modeling, achieving state-of-the-art performance across various tasks. However, they face several inherent limitations, including a training-sampling gap, information leakage in the progressive noising process, and the inability to incorporate advanced loss functions like perceptual and adversarial losses during training. To address these challenges, we propose an innovative end-to-end training framework that aligns the training and sampling processes by directly optimizing the final reconstruction output. Our method eliminates the training-sampling gap, mitigates information leakage by treating the training process as a direct mapping from pure noise to the target data distribution, and enables the integration of perceptual and adversarial losses into the objective. Extensive experiments on benchmarks such as COCO30K and HW30K demonstrate that our approach consistently outperforms traditional diffusion models, achieving superior results in terms of FID and CLIP score, even with reduced sampling steps. These findings highlight the potential of end-to-end training to advance diffusion-based generative models toward more robust and efficient solutions.
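The core idea in the abstract, training by running the sampler from pure noise and placing the loss on its final output, can be illustrated with a toy sketch. This is not the authors' implementation: the scalar data, the linear update rule `a * x + b`, the step count `K`, the fixed `target`, and the finite-difference gradient are all assumptions chosen so the example runs with the Python standard library alone. A real system would use a neural network, automatic differentiation, and could add perceptual or adversarial terms to the same final-output loss.

```python
import random

K = 4  # number of sampler steps unrolled during training (assumed value)

def sample(params, x0):
    """Run the K-step sampler from noise x0 and return the final output."""
    a, b = params
    x = x0
    for _ in range(K):
        # One Euler-style update; a toy linear map stands in for the network.
        x = x + (a * x + b) / K
    return x

def final_loss(params, noises, target):
    """Mean squared error between the sampler's final output and the target."""
    return sum((sample(params, z) - target) ** 2 for z in noises) / len(noises)

def grad(params, noises, target, eps=1e-5):
    """Central finite-difference gradient of the final-output loss."""
    g = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        g.append((final_loss(hi, noises, target) - final_loss(lo, noises, target)) / (2 * eps))
    return g

def train(steps=300, lr=0.05, batch=64, target=2.0, seed=0):
    """Gradient descent on the loss of the fully unrolled sampler."""
    rng = random.Random(seed)
    params = [0.0, 0.0]
    for _ in range(steps):
        # Training inputs are pure noise; no partially noised data is ever shown.
        noises = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        g = grad(params, noises, target)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params

# Compare the final-output loss before and after training on held-out noise.
eval_rng = random.Random(123)
eval_noises = [eval_rng.gauss(0.0, 1.0) for _ in range(256)]
loss_before = final_loss([0.0, 0.0], eval_noises, 2.0)
trained = train()
loss_after = final_loss(trained, eval_noises, 2.0)
```

Because the training loop calls the same `sample` function that would be used at inference time, training and sampling follow the same trajectory in this sketch, which is the alignment the abstract describes.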

Comments:	technical report, to be further updated
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.21044 [cs.CV]
	(or arXiv:2412.21044v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.21044

Submission history

From: Mengping Yang
[v1] Mon, 30 Dec 2024 16:06:31 UTC (13,415 KB)
