Measuring memorization in language models via probabilistic extraction

Hayes, Jamie; Swanberg, Marika; Chaudhari, Harsh; Yona, Itay; Shumailov, Ilia; Nasr, Milad; Choquette-Choo, Christopher A.; Lee, Katherine; Cooper, A. Feder

arXiv:2410.19482 (cs)

[Submitted on 25 Oct 2024 (v1), last revised 20 Mar 2025 (this version, v3)]

Abstract: Large language models (LLMs) are susceptible to memorizing training data, raising concerns about the potential extraction of sensitive information at generation time. Discoverable extraction is the most common method for measuring this issue: split a training example into a prefix and suffix, then prompt the LLM with the prefix, and deem the example extractable if the LLM generates the matching suffix using greedy sampling. This definition yields a yes-or-no determination of whether extraction was successful with respect to a single query. Though efficient to compute, we show that this definition is unreliable because it does not account for non-determinism present in more realistic (non-greedy) sampling schemes, for which LLMs produce a range of outputs for the same prompt. We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries to quantify the probability of extracting a target sequence. We evaluate our probabilistic measure across different models, sampling schemes, and training-data repetitions, and find that this measure provides more nuanced information about extraction risk compared to traditional discoverable extraction.
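The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not state a formula. It assumes that per-token log-probabilities of the target suffix under the chosen sampling scheme are already available (for example, from one forward pass), and that repeated queries are independent, so the chance of at least one success in n queries is 1 - (1 - p_z)^n. All function names and inputs below are hypothetical.

```python
import math


def greedy_extractable(target_ids, greedy_ids):
    """Yes-or-no check: did greedy decoding reproduce the target suffix exactly?"""
    return list(target_ids) == list(greedy_ids)


def sequence_probability(token_logprobs):
    """Probability of sampling the whole suffix in one query.

    Assumption: token_logprobs[i] is the log-probability of the i-th target
    token given the prefix and the earlier target tokens, under the sampling
    scheme being studied.
    """
    return math.exp(sum(token_logprobs))


def prob_extracted_within(n, p_z):
    """Probability that at least one of n independent queries yields the suffix."""
    if p_z <= 0.0:
        return 0.0
    if p_z >= 1.0:
        return 1.0
    # 1 - (1 - p_z)^n, written with log1p/expm1 for stability at tiny p_z.
    return -math.expm1(n * math.log1p(-p_z))


def queries_needed(p, p_z):
    """Smallest n such that the suffix appears at least once with probability >= p."""
    if p_z <= 0.0:
        return math.inf
    if p_z >= 1.0:
        return 1
    return math.ceil(math.log1p(-p) / math.log1p(-p_z))
```

Under these assumptions, a suffix with single-query probability 0.5 is extracted within two queries with probability of about 0.75, and four queries are needed to reach a 0.9 chance. A greedy yes-or-no check on the same example reports only whether the single most likely continuation matches, which is the information the probabilistic view adds to.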

Comments:	NAACL 25
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2410.19482 [cs.LG]
	(or arXiv:2410.19482v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.19482

Submission history

From: Jamie Hayes
[v1] Fri, 25 Oct 2024 11:37:04 UTC (480 KB)
[v2] Wed, 12 Mar 2025 14:25:10 UTC (2,680 KB)
[v3] Thu, 20 Mar 2025 15:35:56 UTC (2,680 KB)
