FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention

Gupta, Ahan; Guo, Hao; Yuan, Yueming; Zhou, Yanqi; Mendis, Charith

arXiv:2306.15799 (cs)

[Submitted on 27 Jun 2023 (v1), last revised 2 Jun 2024 (this version, v2)]

Abstract: Many efficient $\textit{approximate}$ self-attention techniques have become prevalent since the inception of the transformer architecture. Two popular classes of these techniques are low-rank and kernel methods. Each of these methods has its strengths. We observe these strengths synergistically complement each other and exploit them to fuse low-rank and kernel methods, producing a new class of transformers: FLuRKA ($\textbf{F}$ast $\textbf{L}$ow-$\textbf{R}$ank & $\textbf{K}$ernel $\textbf{A}$ttention). FLuRKA are highly $\textit{training-efficient}$ with faster model speeds $\textit{and}$ similar model qualities compared to constituent low-rank and kernel methods. We theoretically and empirically evaluate the speed and quality of FLuRKA. Our model speed analysis posits a variety of parameter configurations where FLuRKA exhibit speedups over low-rank and kernel approximations and our model quality analysis bounds the error of FLuRKA with respect to full-attention. Empirically, we instantiate three FLuRKA variants which experience speedups of up to 3.3x and 1.7x over low-rank and kernel methods respectively. This translates to speedups of up to 20x over models with flash-attention. Across a diverse set of tasks spanning language modeling, language understanding, long sequence modeling, machine translation, and image classification, FLuRKA achieve comparable accuracy with underlying low-rank and kernel approximations, occasionally surpassing both.

Comments:	21 pages, 5 figures
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Performance (cs.PF)
Cite as:	arXiv:2306.15799 [cs.LG]
	(or arXiv:2306.15799v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.15799

Submission history

From: Ahan Gupta
[v1] Tue, 27 Jun 2023 20:58:41 UTC (403 KB)
[v2] Sun, 2 Jun 2024 13:49:32 UTC (271 KB)
