Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models

Fernandez, Virginia; Sanchez, Pedro; Pinaya, Walter Hugo Lopez; Jacenków, Grzegorz; Tsaftaris, Sotirios A.; Cardoso, Jorge

arXiv:2306.01322 (cs)

[Submitted on 2 Jun 2023]

Abstract: Knowledge distillation in neural networks refers to compressing a large model or dataset into a smaller version of itself. We introduce Privacy Distillation, a framework that allows a text-to-image generative model to teach another model without exposing it to identifiable data. Here, we are interested in the privacy issue faced by a data provider who wishes to share their data via a multimodal generative model. A question that immediately arises is: "How can a data provider ensure that the generative model is not leaking identifiable information about a patient?" Our solution consists of (1) training a first diffusion model on real data; (2) generating a synthetic dataset using this model and filtering it to exclude images with a re-identifiability risk; and (3) training a second diffusion model on the filtered synthetic data only. We show that datasets sampled from models trained with Privacy Distillation can effectively reduce re-identification risk whilst maintaining downstream performance.
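
The three steps in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the per-dimension Gaussian "model" stands in for a diffusion model, and the nearest-neighbour distance check stands in for whatever re-identification filter the paper uses. All names here (`train_toy_model`, `nearest_real_distance`, `privacy_distillation`, `min_distance`) are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def train_toy_model(data: Sequence[Vector], rng: random.Random) -> Callable[[], Vector]:
    """Assumed stand-in for training a generative model.

    Fits an independent Gaussian per dimension and returns a sampler
    that draws one synthetic vector per call.
    """
    dims = len(data[0])
    means = [sum(x[d] for x in data) / len(data) for d in range(dims)]
    # Fall back to a tiny std if a dimension is constant.
    stds = [
        math.sqrt(sum((x[d] - means[d]) ** 2 for x in data) / len(data)) or 1e-6
        for d in range(dims)
    ]
    return lambda: [rng.gauss(means[d], stds[d]) for d in range(dims)]


def nearest_real_distance(sample: Vector, real_data: Sequence[Vector]) -> float:
    """Assumed proxy for re-identification risk: Euclidean distance
    from a synthetic sample to its closest real record."""
    return min(math.dist(sample, x) for x in real_data)


def privacy_distillation(
    real_data: Sequence[Vector],
    n_synthetic: int,
    min_distance: float,
    seed: int = 0,
) -> Tuple[Callable[[], Vector], List[Vector]]:
    rng = random.Random(seed)

    # Step 1: train a first model on real data.
    teacher = train_toy_model(real_data, rng)

    # Step 2: sample a synthetic dataset, then drop samples that sit
    # too close to a real record (treated here as re-identifiable).
    synthetic = [teacher() for _ in range(n_synthetic)]
    filtered = [
        s for s in synthetic
        if nearest_real_distance(s, real_data) >= min_distance
    ]
    if not filtered:
        raise ValueError("filter removed every synthetic sample")

    # Step 3: train a second model on the filtered synthetic data only.
    student = train_toy_model(filtered, rng)
    return student, filtered
```

In this sketch only `student` would be shared; it never sees `real_data` directly. The threshold `min_distance` controls the trade-off between how many synthetic samples survive the filter and how far they must sit from any real record.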

Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2306.01322 [cs.LG]
	(or arXiv:2306.01322v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.01322

Submission history

From: Pedro Sanchez
[v1] Fri, 2 Jun 2023 07:44:00 UTC (2,912 KB)
