Quantifying the Ease of Reproducing Training Data in Unconditional Diffusion Models

Hasegawa, Masaya; Yasuda, Koji

arXiv:2503.19429 (cs)

[Submitted on 25 Mar 2025]

Abstract: Diffusion models, which have advanced rapidly in recent years, may generate samples that closely resemble their training data. This phenomenon, known as memorization, can lead to copyright issues. In this study, we propose a method to quantify the ease of reproducing training data in unconditional diffusion models. In the reverse diffusion process, the mean of a sample population following the Langevin equation evolves according to a first-order ordinary differential equation (ODE). This ODE establishes a one-to-one correspondence between images and their noisy counterparts in the latent space. Because the ODE is reversible and the initial noisy images are sampled randomly, the volume of the latent-space region onto which an image is mapped represents the probability of generating that image. By examining the ODE that maps images to the latent space, we quantify the ease of reproducing training data through the volume growth rate along this mapping. Because the method has relatively low computational cost, it can be used to improve the quality of training data by detecting and modifying training samples that are easily memorized.
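The central quantity here, the volume growth rate along the image-to-latent ODE, can be illustrated with Liouville's formula. For an ODE dx/dt = f(x, t), the log-volume of an infinitesimal neighborhood around a trajectory changes at the rate div f = tr(∂f/∂x). The sketch below is a minimal illustration under our own assumptions, not the authors' implementation. `f` is a hypothetical stand-in for the ODE drift, which in practice would be derived from a trained score network. The function names `divergence` and `integrate_with_log_volume` are also hypothetical. Integration uses plain Euler steps, and the divergence is computed by central finite differences, which is only practical in low dimensions.

```python
import numpy as np


def divergence(f, x, t, eps=1e-5):
    """Central-difference estimate of div f = trace of the Jacobian df/dx at (x, t)."""
    d = x.shape[0]
    total = 0.0
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        # Only the i-th output component is needed for the i-th diagonal Jacobian entry.
        total += (f(x + e, t)[i] - f(x - e, t)[i]) / (2.0 * eps)
    return total


def integrate_with_log_volume(f, x0, t0, t1, n_steps=1000):
    """Euler-integrate dx/dt = f(x, t) and accumulate d(log vol)/dt = div f (Liouville's formula)."""
    x = np.asarray(x0, dtype=float).copy()
    dt = (t1 - t0) / n_steps
    log_vol = 0.0
    t = t0
    for _ in range(n_steps):
        log_vol += divergence(f, x, t) * dt  # volume growth contributed by this step
        x = x + f(x, t) * dt                 # move the point along the ODE
        t += dt
    return x, log_vol


# Toy linear drift (hypothetical): div f = trace(A) = -1.5 everywhere,
# so the accumulated log-volume change should be close to -1.5 * (t1 - t0).
A = np.diag([-1.0, -0.5])
x_T, log_vol = integrate_with_log_volume(lambda x, t: A @ x, np.array([1.0, 2.0]), 0.0, 1.0)
print(x_T, log_vol)
```

For image-sized inputs, the exact trace would typically be replaced by a stochastic estimator such as Hutchinson's; whether the paper does this is not stated in the abstract.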

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.19429 [cs.LG]
	(or arXiv:2503.19429v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2503.19429

Submission history

From: Masaya Hasegawa
[v1] Tue, 25 Mar 2025 08:19:56 UTC (2,027 KB)