Empirical Privacy Evaluations of Generative and Predictive Machine Learning Models -- A review and challenges for practice

Hafner, Flavio; Sun, Chang

arXiv:2411.12451 (cs)

[Submitted on 19 Nov 2024]

Abstract: Synthetic data generators, when trained using privacy-preserving techniques like differential privacy, promise to produce synthetic data with formal privacy guarantees, facilitating the sharing of sensitive data. However, it is crucial to empirically assess the privacy risks associated with the generated synthetic data before deploying generative technologies. This paper outlines the key concepts and assumptions underlying empirical privacy evaluation in machine learning-based generative and predictive models. Then, this paper explores the practical challenges for privacy evaluations of generative models for use cases with millions of training records, such as data from statistical agencies and healthcare providers. Our findings indicate that methods designed to verify the correct operation of the training algorithm are effective for large datasets, but they often assume an adversary that is unrealistic in many scenarios. Based on the findings, we highlight a crucial trade-off between the computational feasibility of the evaluation and the level of realism of the assumed threat model. Finally, we conclude with ideas and suggestions for future research.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2411.12451 [cs.LG]
	(or arXiv:2411.12451v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.12451

Submission history

From: Chang Sun
[v1] Tue, 19 Nov 2024 12:19:28 UTC (53 KB)
