Breaking the Attention Bottleneck

Hilsenbek, Kalle

Computer Science > Machine Learning

arXiv:2406.10906 (cs)

[Submitted on 16 Jun 2024]

Title: Breaking the Attention Bottleneck

Authors: Kalle Hilsenbek


Abstract: Attention-based transformers have become the standard architecture in many deep learning fields, primarily due to their ability to model long-range dependencies and handle variable-length input sequences. However, the attention mechanism, with its quadratic complexity, is a significant bottleneck in the transformer architecture. In the decoder, this algorithm is only unidirectional, and in over-parametrized decoder-only models it converges to a static pattern. I address this issue by developing a generative function as a replacement for attention or activation. It retains the auto-regressive character by comparing each token with the previous one. In my test setting with nanoGPT, this yields a smaller loss with a smaller model. The loss drops further when an average context vector is incorporated. This concept of attention replacement is distributed under the GNU AGPL v3 license at this https URL.
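The abstract contrasts the quadratic cost of causal attention with a replacement that only compares each token with its predecessor, optionally adding an average context vector. The following plain-Python sketch illustrates that cost contrast. It is not the author's generative function, which the abstract does not specify. The identity query/key/value projections, the element-wise difference used as the "comparison", and the names `causal_attention` and `previous_token_mixer` are all hypothetical choices made only for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_attention(xs: List[Vector]) -> Tuple[List[Vector], int]:
    """Single-head causal self-attention with identity Q/K/V projections
    (a simplifying assumption). Returns the outputs and the number of
    pairwise scores computed, which grows quadratically with length."""
    d = len(xs[0])
    outs: List[Vector] = []
    n_scores = 0
    for t in range(len(xs)):
        # Token t attends to itself and every earlier token (causal mask).
        scores = [dot(xs[t], xs[s]) / math.sqrt(d) for s in range(t + 1)]
        n_scores += len(scores)
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(sc - m) for sc in scores]
        z = sum(w)
        outs.append([sum(w[s] * xs[s][i] for s in range(t + 1)) / z
                     for i in range(d)])
    return outs, n_scores


def previous_token_mixer(xs: List[Vector],
                         use_context: bool = True) -> Tuple[List[Vector], int]:
    """HYPOTHETICAL linear-time replacement, not the paper's function:
    each token is compared only with its predecessor and, optionally,
    combined with the running mean of all tokens up to and including t."""
    d = len(xs[0])
    running = [0.0] * d
    outs: List[Vector] = []
    n_comparisons = 0
    for t, x in enumerate(xs):
        prev = xs[t - 1] if t > 0 else [0.0] * d  # zero vector before the start
        running = [r + xi for r, xi in zip(running, x)]
        ctx = [r / (t + 1) for r in running]  # causal average context vector
        # Hypothetical comparison: the token plus its difference to the previous one.
        out = [xi + (xi - pi) for xi, pi in zip(x, prev)]
        n_comparisons += 1
        if use_context:
            out = [o + c for o, c in zip(out, ctx)]
        outs.append(out)
    return outs, n_comparisons


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    print(causal_attention(seq)[1])      # pairwise scores for 4 tokens
    print(previous_token_mixer(seq)[1])  # predecessor comparisons for 4 tokens
```

For a sequence of n tokens, `causal_attention` computes n(n+1)/2 scores, while `previous_token_mixer` performs one predecessor comparison per token, so its cost grows linearly with sequence length. Both are causal: the output at position t never depends on tokens after t.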

Comments:	6 pages, 4 figures
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2406.10906 [cs.LG]
	(or arXiv:2406.10906v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.10906

Submission history

From: Kalle Hilsenbek
[v1] Sun, 16 Jun 2024 12:06:58 UTC (2,171 KB)
