A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty

Feng, Xiaohua; Li, Yuyuan; Wang, Chengye; Liu, Junlin; Zhang, Li; Chen, Chaochao

Computer Science > Machine Learning

arXiv:2504.06658 (cs)

[Submitted on 9 Apr 2025]

Title:A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty

Authors:Xiaohua Feng, Yuyuan Li, Chengye Wang, Junlin Liu, Li Zhang, Chaochao Chen

View PDF HTML (experimental)

Abstract:Driven by privacy protection laws and regulations, unlearning in Large Language Models (LLMs) is gaining increasing attention. However, current research often neglects the interpretability of the unlearning process, particularly concerning sample-level unlearning difficulty. Existing studies typically assume a uniform unlearning difficulty across samples. This simplification risks attributing the performance of unlearning algorithms to sample selection rather than the algorithm's design, potentially steering the development of LLM unlearning in the wrong direction. Thus, we investigate the relationship between LLM unlearning and sample characteristics, with a focus on unlearning difficulty. Drawing inspiration from neuroscience, we propose a Memory Removal Difficulty ($\mathrm{MRD}$) metric to quantify sample-level unlearning difficulty. Using $\mathrm{MRD}$, we analyze the characteristics of hard-to-unlearn versus easy-to-unlearn samples. Furthermore, we propose an $\mathrm{MRD}$-based weighted sampling method to optimize existing unlearning algorithms, which prioritizes easily forgettable samples, thereby improving unlearning efficiency and effectiveness. We validate the proposed metric and method using public benchmarks and datasets, with results confirming its effectiveness.

Comments:	16 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2504.06658 [cs.LG]
	(or arXiv:2504.06658v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.06658

Submission history

From: Xiaohua Feng [view email]
[v1] Wed, 9 Apr 2025 07:48:10 UTC (211 KB)

Computer Science > Machine Learning

Title:A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators