WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

Wang, Peng; Li, Zexi; Zhang, Ningyu; Xu, Ziwen; Yao, Yunzhi; Jiang, Yong; Xie, Pengjun; Huang, Fei; Chen, Huajun

arXiv:2405.14768 (cs)

[Submitted on 23 May 2024 (v1), last revised 7 Oct 2024 (this version, v2)]

Abstract: Large language models (LLMs) need knowledge updates to keep pace with ever-growing world facts and to correct hallucinated responses, which motivates methods for lifelong model editing. Where the updated knowledge resides in memory is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (non-parametric knowledge of neural network activations/representations by retrieval) results in an impossible triangle: reliability, generalization, and locality cannot be realized together in the lifelong editing setting. For long-term memory, directly editing the parameters causes conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor generalization). Therefore, we propose WISE to bridge the gap between memories. In WISE, we design a dual parametric memory scheme, which consists of a main memory for the pretrained knowledge and a side memory for the edited knowledge. We edit only the knowledge in the side memory and train a router to decide which memory to go through when given a query. For continual editing, we devise a knowledge-sharding mechanism in which different sets of edits reside in distinct subspaces of parameters and are subsequently merged into a shared memory without conflicts. Extensive experiments show that WISE can outperform previous model editing methods and overcome the impossible triangle under lifelong model editing in question answering, hallucination, and out-of-distribution settings across trending LLM architectures, e.g., GPT, LLaMA, and Mistral. Code is available at this https URL.
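
To make the dual-memory and knowledge-sharding ideas in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. The abstract does not say how the router decides or how shards are merged, so the threshold rule in `route`, the strictly disjoint masks in `make_masks`, and the delta-summing `merge_shards` are all assumptions made for illustration, and every function name is hypothetical. Memories are modeled as flat lists of floats rather than real model weights.

```python
import random

def make_masks(num_params, num_shards, seed=0):
    """Assign every parameter index to exactly one shard (assumed disjoint subspaces)."""
    rng = random.Random(seed)
    owner = [rng.randrange(num_shards) for _ in range(num_params)]
    return [[1 if owner[i] == s else 0 for i in range(num_params)]
            for s in range(num_shards)]

def apply_sharded_edit(main, update, mask):
    """Return a side-memory copy of `main`, changed only where mask == 1."""
    return [w + u * m for w, u, m in zip(main, update, mask)]

def merge_shards(main, shards):
    """Merge side memories by adding each shard's delta from `main`.

    With disjoint masks, no coordinate is touched twice, so the deltas
    cannot conflict (an assumption of this toy setup).
    """
    merged = list(main)
    for shard in shards:
        for i, (w, s) in enumerate(zip(main, shard)):
            merged[i] += s - w
    return merged

def route(query, main, side, threshold):
    """Hypothetical router: use side memory when its output differs enough from main's."""
    delta = sum(q * (s - m) for q, s, m in zip(query, side, main))
    return "side" if abs(delta) > threshold else "main"

# Toy usage: two batches of edits, each confined to its own shard.
main_memory = [0.0] * 8
masks = make_masks(num_params=8, num_shards=2)
shard_a = apply_sharded_edit(main_memory, [1.0] * 8, masks[0])
shard_b = apply_sharded_edit(main_memory, [2.0] * 8, masks[1])
side_memory = merge_shards(main_memory, [shard_a, shard_b])

edited_query = [1.0] * 8     # activates the edited coordinates
unrelated_query = [0.0] * 8  # produces no difference between memories
print(route(edited_query, main_memory, side_memory, threshold=0.5))
print(route(unrelated_query, main_memory, side_memory, threshold=0.5))
```

The main memory is never modified in this sketch, which mirrors the abstract's statement that only the side memory is edited. A real system would need a merge rule that also handles overlapping subspaces.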

Comments:	NeurIPS 2024
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2405.14768 [cs.CL]
	(or arXiv:2405.14768v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.14768

Submission history

From: Ningyu Zhang
[v1] Thu, 23 May 2024 16:35:52 UTC (1,403 KB)
[v2] Mon, 7 Oct 2024 14:35:14 UTC (1,426 KB)
