MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Chou, Yuhong; Yao, Man; Wang, Kexin; Pan, Yuqi; Zhu, Ruijie; Zhong, Yiran; Qiao, Yu; Wu, Jibin; Xu, Bo; Li, Guoqi

Computer Science > Machine Learning

arXiv:2411.10741 (cs)

[Submitted on 16 Nov 2024]

Abstract: Various linear-complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear-complexity models under the linear attention form and then identify three conditions for the optimal linear attention design: 1) dynamic memory ability; 2) static approximation ability; 3) least parameter approximation. We find that none of the current linear models meets all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on the Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and the Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than the existing linear models.
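To make the phrase "linear attention form" more concrete, the sketch below shows a generic gated linear attention recurrence of the kind often used to describe LinFormer, SSM, and LinRNN layers. It is an illustration only, not the paper's MetaLA formulation: the scalar per-step `decay` gate, the function names, and the toy inputs are all assumptions introduced here. The recurrent form keeps a fixed-size state `S`, while the parallel form computes the same outputs from an explicit causal attention map, which is the object a linear model would use to approximate the softmax attention map.

```python
# Illustrative sketch (assumption): generic gated linear attention,
# NOT the MetaLA layer from the paper.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def linear_attention_recurrent(Q, K, V, decay):
    """Recurrent form: S_t = decay[t] * S_{t-1} + k_t^T v_t, o_t = q_t S_t.

    The state S has fixed size d_k x d_v, so cost grows linearly with T.
    """
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, a in zip(Q, K, V, decay):
        # Decay the old memory, then write the new key-value outer product.
        S = [[a * S[i][j] + k[i] * v[j] for j in range(d_v)]
             for i in range(d_k)]
        outputs.append([sum(q[i] * S[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs

def linear_attention_parallel(Q, K, V, decay):
    """Parallel form with an explicit causal attention map.

    A[t][s] = (q_t . k_s) * prod_{r = s+1 .. t} decay[r] for s <= t.
    """
    d_v = len(V[0])
    outputs = []
    for t in range(len(Q)):
        row = [0.0] * d_v
        for s in range(t + 1):
            weight = dot(Q[t], K[s])
            for r in range(s + 1, t + 1):
                weight *= decay[r]
            for j in range(d_v):
                row[j] += weight * V[s][j]
        outputs.append(row)
    return outputs

# Toy inputs (hypothetical values): T = 3 steps, d_k = d_v = 2.
Q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
K = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decay = [1.0, 0.9, 0.5]

recurrent_out = linear_attention_recurrent(Q, K, V, decay)
parallel_out = linear_attention_parallel(Q, K, V, decay)
```

The two functions agree up to floating-point error, which is why a recurrent linear model can be analyzed through the attention map it implicitly defines. Setting every `decay` entry to 1.0 gives plain, ungated linear attention.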

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.10741 [cs.LG]
	(or arXiv:2411.10741v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.10741

Submission history

From: Yuhong Chou
[v1] Sat, 16 Nov 2024 08:47:32 UTC (742 KB)
