Efficient Shapley Values for Attributing Global Properties of Diffusion Models to Data Group

Lin, Chris; Lu, Mingyu; Kim, Chanwoo; Lee, Su-In

arXiv:2407.03153 (cs)

[Submitted on 9 Jun 2024]

Abstract: As diffusion models are deployed in real-world settings, data attribution is needed to ensure fair acknowledgment for contributors of high-quality training data and to identify sources of harmful content. Previous work focuses on identifying individual training samples important for the generation of a given image. However, instead of focusing on a given generated image, some use cases require understanding global properties of the distribution learned by a diffusion model (e.g., demographic diversity). Furthermore, training data for diffusion models are often contributed in groups rather than separately (e.g., multiple artworks from the same artist). Hence, here we tackle the problem of attributing global properties of diffusion models to groups of training data. Specifically, we develop a method to efficiently estimate Shapley values by leveraging model pruning and fine-tuning. We empirically demonstrate the utility of our method with three use cases: (i) global image quality for a DDPM trained on a CIFAR dataset, (ii) demographic diversity for an LDM trained on CelebA-HQ, and (iii) overall aesthetic quality for a Stable Diffusion model LoRA-finetuned on Post-Impressionist artworks.
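For readers unfamiliar with Shapley values over data groups, the following is a minimal sketch of the textbook definition, not the estimation method proposed in the paper. It enumerates every subset of groups exactly, which is only feasible for a handful of groups. The names `group_shapley` and `toy_utility`, the group labels, and the numeric scores are hypothetical and chosen purely for illustration. In an attribution setting, the utility of a subset would be some global property of a model trained on those groups, whereas here it is a hand-written toy function.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, Sequence


def group_shapley(
    groups: Sequence[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
) -> Dict[Hashable, float]:
    """Exact Shapley value of each group by enumerating all subsets.

    Evaluates `utility` on every subset of `groups`, so the cost grows
    exponentially with the number of groups.
    """
    n = len(groups)
    values = {g: 0.0 for g in groups}
    for g in groups:
        others = [h for h in groups if h != g]
        for k in range(n):
            # Shapley weight for subsets of size k that exclude g.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of g when added to subset s.
                values[g] += weight * (utility(s | {g}) - utility(s))
    return values


def toy_utility(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a global model property (illustration only)."""
    quality = {"artist_a": 3.0, "artist_b": 1.0, "artist_c": 2.0}
    base = sum(quality[g] for g in subset)
    # Assumed interaction: a and b together add a bonus of 1.0.
    bonus = 1.0 if {"artist_a", "artist_b"} <= subset else 0.0
    return base + bonus


shapley = group_shapley(["artist_a", "artist_b", "artist_c"], toy_utility)
for name, value in shapley.items():
    print(f"{name}: {value:.2f}")
```

With n groups, this enumeration evaluates the utility on all 2^n subsets. When each evaluation means obtaining a model for that subset, the cost of exact computation is what makes an efficient estimator attractive. The values sum to the utility of the full set minus the utility of the empty set, which is the efficiency property of Shapley values.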

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.03153 [cs.LG]
	(or arXiv:2407.03153v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.03153

Submission history

From: MingYu Lu
[v1] Sun, 9 Jun 2024 17:42:09 UTC (28,724 KB)
