Gating is Weighting: Understanding Gated Linear Attention through In-context Learning

Li, Yingcong; Tarzanagh, Davoud Ataee; Rawat, Ankit Singh; Fazel, Maryam; Oymak, Samet

arXiv:2504.04308 (cs)

[Submitted on 6 Apr 2025]

Abstract: Linear attention methods offer a compelling alternative to softmax attention due to their efficiency in recurrent decoding. Recent research has focused on enhancing standard linear attention by incorporating gating while retaining its computational benefits. Such Gated Linear Attention (GLA) architectures include competitive models such as Mamba and RWKV. In this work, we investigate the in-context learning capabilities of the GLA model and make the following contributions. We show that a multilayer GLA can implement a general class of Weighted Preconditioned Gradient Descent (WPGD) algorithms with data-dependent weights. These weights are induced by the gating mechanism and the input, enabling the model to control the contribution of individual tokens to prediction. To further understand the mechanics of this weighting, we introduce a novel data model with multitask prompts and characterize the optimization landscape of learning a WPGD algorithm. Under mild conditions, we establish the existence and uniqueness (up to scaling) of a global minimum, corresponding to a unique WPGD solution. Finally, we translate these findings to explore the optimization landscape of GLA and shed light on how gating facilitates context-aware learning and when it is provably better than vanilla linear attention.
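The "gating is weighting" idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's construction: it assumes a single layer, scalar gates, scalar labels, and an identity preconditioner, and the function names (`gla_predict`, `induced_weights`, `weighted_gd_predict`) are hypothetical. Under those assumptions, a gated recurrence over in-context examples `(x_i, y_i)` yields the same prediction as one weighted gradient step on a least-squares loss, where each example's weight is the product of the gates that follow it.

```python
def gla_predict(xs, ys, gates, query):
    """Gated recurrence S_i = g_i * S_{i-1} + y_i * x_i, read out as <query, S_n>.

    Assumption: scalar gates and scalar labels, so the state is a vector.
    """
    state = [0.0] * len(query)
    for x, y, g in zip(xs, ys, gates):
        state = [g * s + y * xk for s, xk in zip(state, x)]
    return sum(q * s for q, s in zip(query, state))


def induced_weights(gates):
    """Weight of example i is the product of all later gates g_j, j > i."""
    n = len(gates)
    weights = [1.0] * n
    for i in range(n - 2, -1, -1):
        weights[i] = weights[i + 1] * gates[i + 1]
    return weights


def weighted_gd_predict(xs, ys, weights, query, step=1.0):
    """One gradient step from beta = 0 on 0.5 * sum_i w_i * (y_i - <x_i, beta>)^2.

    Assumption: identity preconditioner. The gradient at beta = 0 is
    -sum_i w_i * y_i * x_i, so the step gives beta = step * sum_i w_i * y_i * x_i.
    """
    beta = [0.0] * len(query)
    for x, y, w in zip(xs, ys, weights):
        beta = [b + step * w * y * xk for b, xk in zip(beta, x)]
    return sum(q * b for q, b in zip(query, beta))


# Toy prompt with three in-context examples in two dimensions (made-up values).
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [2.0, -1.0, 1.0]
gates = [0.9, 0.5, 0.8]
query = [1.0, 2.0]

weights = induced_weights(gates)
gla_out = gla_predict(xs, ys, gates, query)
wgd_out = weighted_gd_predict(xs, ys, weights, query)
print(weights, gla_out, wgd_out)
```

In this toy setting the two predictions agree up to floating-point rounding, and the first gate has no effect because it multiplies the zero initial state. Data-dependent gates would make the weights depend on the prompt itself, which is the mechanism the abstract attributes to GLA.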

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC)
Cite as:	arXiv:2504.04308 [cs.LG]
	(or arXiv:2504.04308v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.04308

Submission history

From: Yingcong Li
[v1] Sun, 6 Apr 2025 00:37:36 UTC (930 KB)
