Energy Transformer

Hoover, Benjamin; Liang, Yuchen; Pham, Bao; Panda, Rameswar; Strobelt, Hendrik; Chau, Duen Horng; Zaki, Mohammed J.; Krotov, Dmitry

Computer Science > Machine Learning

arXiv:2302.07253v1 (cs)

[Submitted on 14 Feb 2023 (this version), latest version 1 Nov 2023 (v2)]

Abstract: Transformers have become the de facto models of choice in machine learning, typically leading to impressive performance on many applications. At the same time, the architectural development in the transformer world is mostly driven by empirical findings, and the theoretical understanding of their architectural building blocks is rather limited. In contrast, Dense Associative Memory models or Modern Hopfield Networks have a well-established theoretical foundation, but have not yet demonstrated truly impressive practical results. We propose a transformer architecture that replaces the sequence of feedforward transformer blocks with a single large Associative Memory model. Our novel architecture, called Energy Transformer (or ET for short), has many of the familiar architectural primitives that are often used in the current generation of transformers. However, it is not identical to the existing architectures. The sequence of transformer layers in ET is purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. As a consequence of this computational principle, the attention in ET is different from the conventional attention mechanism. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection task.
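The abstract does not state ET's energy function, so the following is only a minimal sketch of the general principle it describes: a state is updated repeatedly so that a scalar energy goes down, and the update ends up looking like softmax attention over stored patterns. It assumes the generic log-sum-exp energy used in the Dense Associative Memory / Modern Hopfield literature, not the energy engineered in the paper. The names `energy`, `energy_grad`, `descend`, `memories`, `beta`, and `step` are all hypothetical and chosen for illustration.

```python
import math

def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def energy(x, memories, beta):
    # Assumed energy (not the paper's):
    # E(x) = -(1/beta) * log sum_mu exp(beta * <xi_mu, x>) + 0.5 * ||x||^2
    scores = [beta * dot(m, x) for m in memories]
    top = max(scores)
    lse = top + math.log(sum(math.exp(s - top) for s in scores))
    return -lse / beta + 0.5 * dot(x, x)

def energy_grad(x, memories, beta):
    # dE/dx = x - sum_mu softmax(beta * <xi_mu, x>)_mu * xi_mu
    weights = softmax([beta * dot(m, x) for m in memories])
    recalled = [
        sum(w * m[i] for w, m in zip(weights, memories))
        for i in range(len(x))
    ]
    return [x_i - r_i for x_i, r_i in zip(x, recalled)]

def descend(x, memories, beta=4.0, step=0.1, n_steps=50):
    # Plain gradient descent on the energy; records the energy at every step.
    energies = [energy(x, memories, beta)]
    for _ in range(n_steps):
        grad = energy_grad(x, memories, beta)
        x = [x_i - step * g_i for x_i, g_i in zip(x, grad)]
        energies.append(energy(x, memories, beta))
    return x, energies

# Three orthonormal stored patterns and a noisy query near the first one.
memories = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.3, -0.1]
state, energies = descend(query, memories)
```

In this toy setting each step moves the state toward a softmax-weighted average of the stored patterns, which is why energy descent of this kind resembles an attention update; how ET's attention differs from the conventional mechanism is detailed in the paper itself rather than in this abstract.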

Subjects:	Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Cite as:	arXiv:2302.07253 [cs.LG]
	(or arXiv:2302.07253v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.07253

Submission history

From: Benjamin Hoover
[v1] Tue, 14 Feb 2023 18:51:22 UTC (10,884 KB)
[v2] Wed, 1 Nov 2023 00:14:30 UTC (10,881 KB)
