Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification

Donhauser, Konstantin; Arnal, Charles; Pezeshki, Mohammad; Cabannes, Vivien; Lopez-Paz, David; Ahuja, Kartik

arXiv:2502.09647 (cs)

[Submitted on 11 Feb 2025]

Abstract: The ability to process long contexts is crucial for many natural language processing tasks, yet it remains a significant challenge. While substantial progress has been made in enhancing the efficiency of attention mechanisms, there is still a gap in understanding how attention heads function in long-context settings. In this paper, we observe that while certain heads consistently attend to local information only, others swing between attending to local and long-context information depending on the query. This raises the question: can we identify which heads require long-context information to predict the next token accurately? We demonstrate that it's possible to predict which heads are crucial for long-context processing using only local keys. The core idea here is to exploit a simple model for the long-context scores via second moment approximations. These findings unveil simple properties of attention in the context of long sequences, and open the door to potentially significant gains in efficiency.
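The second-moment idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is one possible reading, under the assumption that the attention scores of the distant keys are roughly Gaussian for a given query, so that their total softmax mass can be estimated from the mean and covariance of those keys alone. The function names, the `threshold` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def logsumexp(x):
    # Numerically stable log(sum(exp(x))).
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def gaussian_long_context_logmass(q, mean_k, cov_k, n_long):
    # Assumption: scores s = q.k over the long context are Gaussian with
    # mean q.mu and variance q^T Sigma q, so E[exp(s)] = exp(q.mu + 0.5 q^T Sigma q).
    # The log of the summed (unnormalised) mass over n_long keys is then:
    return np.log(n_long) + q @ mean_k + 0.5 * (q @ cov_k @ q)

def head_is_local(q, local_keys, mean_k, cov_k, n_long, threshold=0.6):
    # Exact log-mass on the local window, estimated log-mass on the rest.
    local = logsumexp(local_keys @ q)
    long_ = gaussian_long_context_logmass(q, mean_k, cov_k, n_long)
    # Estimated fraction of softmax mass that falls on the local window.
    local_frac = 1.0 / (1.0 + np.exp(long_ - local))
    return bool(local_frac >= threshold), float(local_frac)

# Synthetic example (hypothetical sizes and scales).
rng = np.random.default_rng(0)
d, n_long, n_local = 16, 2000, 64
long_keys = 0.5 * rng.standard_normal((n_long, d))
local_keys = 0.5 * rng.standard_normal((n_local, d))
q = rng.standard_normal(d)
q /= np.linalg.norm(q)

# Only these two summary statistics of the long context are used at query time.
mean_k = long_keys.mean(axis=0)
cov_k = np.cov(long_keys, rowvar=False)

is_local, frac = head_is_local(q, local_keys, mean_k, cov_k, n_long)
```

In this sketch the per-query decision needs the local keys plus two precomputed statistics of the long context, rather than every distant key. How the statistics are maintained and how the threshold is chosen are left open here.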

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2502.09647 [cs.CL]
	(or arXiv:2502.09647v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.09647

Submission history

From: Konstantin Donhauser
[v1] Tue, 11 Feb 2025 00:04:32 UTC (1,210 KB)
