Reversed Attention: On The Gradient Descent Of Attention Layers In GPT

Katz, Shahar; Wolf, Lior

arXiv:2412.17019 (cs)

[Submitted on 22 Dec 2024]

Abstract: The success of Transformer-based Language Models (LMs) stems from their attention mechanism. While this mechanism has been extensively studied in explainability research, particularly through the attention values obtained during the forward pass of LMs, the backward pass of attention has been largely overlooked. In this work, we study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix we refer to as "Reversed Attention". We examine the properties of Reversed Attention and demonstrate its ability to elucidate the models' behavior and edit dynamics. In an experimental setup, we showcase the ability of Reversed Attention to directly alter the forward pass of attention, without modifying the model's weights, using a novel method called "attention patching". In addition to enhancing the comprehension of how LMs configure attention layers during backpropagation, Reversed Attention maps contribute to a more interpretable backward pass.
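
The abstract does not give the formula for Reversed Attention, so the following is only a minimal sketch of the general idea that the backward pass of a causal attention head produces a matrix with the same shape as the forward attention map. It assumes, as an illustration rather than as the paper's definition, that the matrix of interest is the gradient of the loss with respect to the pre-softmax attention scores. The function names, the single-head setup, and the random upstream gradient `grad_out` are all hypothetical.

```python
import numpy as np

def causal_attention_forward(q, k, v):
    """Single-head causal attention. Returns the output and the attention matrix."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Mask out future positions (strict upper triangle).
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    attn = weights / weights.sum(axis=-1, keepdims=True)
    return attn @ v, attn

def score_gradient(attn, v, grad_out):
    """Gradient of the loss w.r.t. the pre-softmax scores (assumed stand-in
    for a 'reversed' attention map; not taken from the paper)."""
    # Gradient w.r.t. the attention matrix, since out = attn @ v.
    grad_attn = grad_out @ v.T
    # Row-wise softmax backward: A * (dA - sum_j(dA * A)).
    row_dot = (grad_attn * attn).sum(axis=-1, keepdims=True)
    return attn * (grad_attn - row_dot)

rng = np.random.default_rng(0)
n, d = 5, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
grad_out = rng.standard_normal((n, d))  # hypothetical upstream gradient

out, attn = causal_attention_forward(q, k, v)
grad_scores = score_gradient(attn, v, grad_out)
```

Under these assumptions, `grad_scores` is lower triangular like the forward attention map, but its entries can be negative and each row sums to zero, which follows from the softmax Jacobian.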

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.17019 [cs.CL]
	(or arXiv:2412.17019v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.17019

Submission history

From: Shahar Katz
[v1] Sun, 22 Dec 2024 13:48:04 UTC (3,133 KB)
