Self-attention as an attractor network: transient memories without backpropagation

D'Amico, Francesco; Negri, Matteo

Computer Science > Machine Learning

arXiv:2409.16112 (cs)

[Submitted on 24 Sep 2024]

Abstract: Transformers are among the most successful architectures in modern neural networks. At their core is the so-called attention mechanism, which has recently attracted interest from the physics community because, in certain cases, it can be written as the derivative of an energy function. While the cross-attention layer can be written as a modern Hopfield network, the same is not possible for self-attention, which is used in GPT architectures and other autoregressive models. In this work we show that the self-attention layer can be obtained as the derivative of local energy terms, which resemble a pseudo-likelihood. We leverage the analogy with pseudo-likelihood to design a recurrent model that can be trained without backpropagation; its dynamics exhibit transient states that are strongly correlated with both training and test examples. Overall, we present a novel framework for interpreting self-attention as an attractor network, potentially paving the way for new physics-inspired theoretical approaches to understanding transformers.
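
The idea of obtaining an attention-like update as the derivative of a local energy term can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, since the abstract does not give the paper's exact formulation: the log-sum-exp form of the energy, the exclusion of the self-interaction term, the single coupling matrix `W`, the inverse temperature `beta`, and the function names `local_energy` and `attention_update` are all hypothetical. The sketch only checks numerically that the negative gradient of one such local energy with respect to a token is a softmax-weighted sum over the other tokens.

```python
import numpy as np

def local_energy(i, X, W, beta=1.0):
    """Hypothetical local energy of token i (an assumption, not the paper's definition):
    E_i = -(1/beta) * log sum_{j != i} exp(beta * x_i^T W x_j)."""
    scores = beta * (X[i] @ W @ X.T)       # scores[j] = beta * x_i^T W x_j
    scores = np.delete(scores, i)          # drop the self-interaction term
    m = scores.max()                       # shift for a stable log-sum-exp
    return -(m + np.log(np.exp(scores - m).sum())) / beta

def attention_update(i, X, W, beta=1.0):
    """Analytic negative gradient of local_energy with respect to x_i:
    sum_{j != i} softmax_j(beta * x_i^T W x_j) * (W x_j)."""
    scores = beta * (X[i] @ W @ X.T)
    scores[i] = -np.inf                    # mask the self-interaction term
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the other tokens
    return weights @ (X @ W.T)             # row j of X @ W.T is (W x_j)^T

def numerical_grad(i, X, W, beta=1.0, eps=1e-6):
    """Central finite-difference gradient of local_energy with respect to x_i."""
    grad = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[i, k] += eps
        X_minus[i, k] -= eps
        grad[k] = (local_energy(i, X_plus, W, beta)
                   - local_energy(i, X_minus, W, beta)) / (2 * eps)
    return grad

# Toy data: 5 tokens with 4-dimensional embeddings (arbitrary sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 4))

update = attention_update(2, X, W)
finite_diff = -numerical_grad(2, X, W)
gradient_matches = np.allclose(update, finite_diff, atol=1e-5)
```

Because each local term depends on one token conditioned on all the others, the collection of such terms is loosely analogous to a pseudo-likelihood objective; how the paper builds its recurrent dynamics and training rule on top of this is not specified in the abstract and is not attempted here.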

Subjects:	Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn)
Cite as:	arXiv:2409.16112 [cs.LG]
	(or arXiv:2409.16112v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.16112

Submission history

From: Francesco D'Amico
[v1] Tue, 24 Sep 2024 14:19:56 UTC (83 KB)
