Data value estimation on private gradients

Zhou, Zijian; Xu, Xinyi; Rus, Daniela; Low, Bryan Kian Hsiang

arXiv:2412.17008 (cs)

[Submitted on 22 Dec 2024]

Abstract: For gradient-based machine learning (ML) methods commonly adopted in practice, such as stochastic gradient descent, the de facto differential privacy (DP) technique is to perturb the gradients with random Gaussian noise. Data valuation attributes ML performance to the training data and is widely used in privacy-aware applications that require enforcing DP, such as data pricing, collaborative ML, and federated learning (FL). Can existing data valuation methods still be used when DP is enforced via gradient perturbation? We show that the answer is no under the default approach of injecting i.i.d. random noise into the gradients, because the estimation uncertainty of the data value estimate paradoxically scales linearly with the estimation budget, producing estimates that are almost like random guesses. To address this issue, we propose to instead inject carefully correlated noise, which provably removes the linear scaling of estimation uncertainty w.r.t. the budget. We also empirically demonstrate that our method gives better data value estimates on various ML tasks and is applicable to use cases including dataset valuation and FL.
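
For readers unfamiliar with the default gradient perturbation that the abstract refers to, the following is a minimal sketch of the standard clip-then-add-Gaussian-noise step, assuming per-coordinate i.i.d. noise with standard deviation noise_multiplier * clip_norm. It is not the authors' code and does not implement their correlated-noise scheme; the function name perturb_gradient and its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def perturb_gradient(grad, clip_norm, noise_multiplier, rng):
    """Clip grad to L2 norm <= clip_norm, then add i.i.d. Gaussian noise.

    Hypothetical helper for illustration; not taken from the paper.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    # Scale the gradient down only if its norm exceeds the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    # Each coordinate receives an independent draw from N(0, sigma^2).
    return [g * scale + rng.gauss(0.0, sigma) for g in grad]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
grad = [3.0, 4.0]       # toy gradient with L2 norm 5
private_grad = perturb_gradient(grad, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
```

Every call draws fresh, independent noise, which is the i.i.d. setting the abstract identifies as problematic when a data value estimate is built from many perturbed gradients.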

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2412.17008 [cs.LG]
	(or arXiv:2412.17008v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.17008

Submission history

From: Zijian Zhou
[v1] Sun, 22 Dec 2024 13:15:51 UTC (2,326 KB)
