Locating and Editing Factual Associations in GPT

Meng, Kevin; Bau, David; Andonian, Alex; Belinkov, Yonatan

arXiv:2202.05262 (cs)

[Submitted on 10 Feb 2022 (v1), last revised 13 Jan 2023 (this version, v5)]

Abstract: We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at this https URL
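
To make the idea of a rank-one weight edit concrete, the following NumPy sketch shows a generic update that changes a linear layer so that one chosen input ("key") maps to one chosen output ("value"), while the change to the weights has rank one. This is an illustration under stated assumptions, not the authors' implementation: the function name `rank_one_edit`, the vectors `k_star` and `v_star`, and the optional key-covariance matrix `C` are all hypothetical, and the abstract itself does not specify how ROME selects keys and values.

```python
import numpy as np


def rank_one_edit(W, k_star, v_star, C=None):
    """Return W_new = W + outer(lam, u) with W_new @ k_star == v_star.

    W      : (d_out, d_in) weight matrix of a linear map (assumed setup)
    k_star : (d_in,) key vector whose output should change
    v_star : (d_out,) value vector that the key should now produce
    C      : optional (d_in, d_in) positive-definite matrix (for example an
             estimate of the key covariance); identity if omitted (assumption)
    """
    d_in = W.shape[1]
    if C is None:
        C = np.eye(d_in)
    # Direction of the update in input space: u = C^{-1} k_star.
    u = np.linalg.solve(C, k_star)
    # How far the current output for k_star is from the desired value.
    residual = v_star - W @ k_star
    # Scale so that the edited matrix maps k_star exactly to v_star.
    lam = residual / (u @ k_star)
    return W + np.outer(lam, u)


# Toy example with random data (hypothetical sizes).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
k_star = rng.normal(size=6)
v_star = rng.normal(size=4)

W_new = rank_one_edit(W, k_star, v_star)

# A key orthogonal to k_star is left untouched when C is the identity.
k_other = rng.normal(size=6)
k_other -= (k_other @ k_star) / (k_star @ k_star) * k_star
```

In this toy setting the edited matrix reproduces `v_star` for `k_star`, and inputs orthogonal to the update direction produce the same output as before. That is one simple way to picture how an edit might target a single association without disturbing others.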

Comments:	NeurIPS 2022. 35 pages, 30 figures. Code and data at this https URL
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
ACM classes:	I.2.7
Cite as:	arXiv:2202.05262 [cs.CL]
	(or arXiv:2202.05262v5 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2202.05262

Submission history

From: David Bau
[v1] Thu, 10 Feb 2022 18:59:54 UTC (4,847 KB)
[v2] Mon, 21 Mar 2022 15:13:09 UTC (5,036 KB)
[v3] Wed, 1 Jun 2022 18:56:44 UTC (6,027 KB)
[v4] Sun, 23 Oct 2022 18:07:20 UTC (7,314 KB)
[v5] Fri, 13 Jan 2023 15:16:16 UTC (7,314 KB)