Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Chua, Lynn; Ghazi, Badih; Huang, Yangsibo; Kamath, Pritish; Kumar, Ravi; Liu, Daogao; Manurangsi, Pasin; Sinha, Amer; Zhang, Chiyuan

arXiv:2406.14322v3 (cs)

[Submitted on 20 Jun 2024 (v1), last revised 16 Aug 2024 (this version, v3)]

Abstract: Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff.
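
The abstract's point about uneven guarantees can be illustrated with the standard group privacy bound: a mechanism that is (eps, delta)-DP with respect to a single record is (k * eps, delta * sum_{i=0}^{k-1} e^{i * eps})-DP with respect to any k records. The sketch below is a generic illustration of that textbook bound, not the paper's accounting for its Group Privacy mechanism; the function name `group_privacy`, the user names, and the record counts are hypothetical and chosen only for this example.

```python
import math


def group_privacy(eps: float, delta: float, k: int) -> tuple[float, float]:
    """Bound the guarantee for a group of k records, given (eps, delta)-DP per record.

    Standard group privacy: (eps, delta)-DP for neighbors differing in one record
    implies (k * eps, delta * sum_{i=0}^{k-1} e^{i * eps})-DP for k records.
    The returned delta is capped at 1.0, since any delta >= 1 is vacuous.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    group_eps = k * eps
    group_delta = min(1.0, delta * sum(math.exp(i * eps) for i in range(k)))
    return group_eps, group_delta


# Hypothetical per-user record counts, chosen only for illustration.
records_per_user = {"user_a": 1, "user_b": 5, "user_c": 20}

for user, k in records_per_user.items():
    # Assumed example-level guarantee: eps = 1.0, delta = 1e-6.
    user_eps, user_delta = group_privacy(eps=1.0, delta=1e-6, k=k)
    print(f"{user}: k={k:2d}  eps <= {user_eps:.1f}  delta <= {user_delta:.3g}")
```

Under the same example-level guarantee, the user with one record keeps the nominal bound while users with more records get progressively weaker ones, which is the unevenness that a user-level privacy unit is meant to remove.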

Comments:	Published as a conference paper at COLM 2024
Subjects:	Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2406.14322 [cs.CL]
	(or arXiv:2406.14322v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.14322

Submission history

From: Yangsibo Huang
[v1] Thu, 20 Jun 2024 13:54:32 UTC (1,263 KB)
[v2] Wed, 3 Jul 2024 14:05:20 UTC (1,263 KB)
[v3] Fri, 16 Aug 2024 15:02:45 UTC (1,326 KB)
