Fast Gradient Computation for RoPE Attention in Almost Linear Time

Chen, Yifang; Huo, Jiayan; Li, Xiaoyu; Liang, Yingyu; Shi, Zhenmei; Song, Zhao

arXiv:2412.17316 (cs)

[Submitted on 23 Dec 2024]

Abstract: The Rotary Position Embedding (RoPE) mechanism has become a powerful enhancement to the Transformer architecture, enabling models to capture token relationships when encoding positional information. However, RoPE complicates the attention computation, which makes designing efficient algorithms challenging. Earlier research introduced almost linear time algorithms, i.e., algorithms running in $n^{1+o(1)}$ time where $n$ is the number of input tokens, for the forward computation under specific parameter settings. However, achieving a subquadratic time algorithm for other parameter regimes remains impossible unless the widely accepted Strong Exponential Time Hypothesis (SETH) is disproven. In this work, we develop the first almost linear time algorithm for the backward computation in RoPE-based attention under bounded entries. Our approach builds on recent advancements in fast RoPE attention computations, utilizing a novel combination of the polynomial method and the Fast Fourier Transform. Furthermore, we show that, with lower bounds derived from SETH, the bounded entry condition is necessary for subquadratic performance.
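
For orientation, the quadratic-time baseline that the abstract's almost linear time results improve upon can be sketched as follows. This is a minimal illustration of standard RoPE attention, not the paper's algorithm: the function names `rope` and `rope_attention`, the frequency base of 10000, the pairing of consecutive coordinates, and the $1/\sqrt{d}$ score scaling are all assumptions chosen for the sketch and are not taken from the abstract.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate consecutive coordinate pairs of x by position-dependent angles.

    Assumes len(x) is even; `base` is an assumed, commonly used constant.
    """
    d = len(x)
    out = []
    for k in range(0, d, 2):
        theta = pos * base ** (-k / d)  # one rotation frequency per pair
        c, s = math.cos(theta), math.sin(theta)
        out += [x[k] * c - x[k + 1] * s, x[k] * s + x[k + 1] * c]
    return out

def rope_attention(Q, K, V):
    """Naive softmax attention with RoPE on queries and keys.

    Builds all n*n scores explicitly, so it costs O(n^2 d) time.
    """
    n, d = len(Q), len(Q[0])
    Qr = [rope(q, i) for i, q in enumerate(Q)]
    Kr = [rope(k, j) for j, k in enumerate(K)]
    out = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(Qr[i], Kr[j])) / math.sqrt(d)
                  for j in range(n)]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(n)) / z
                    for c in range(len(V[0]))])
    return out
```

Because each pair is rotated by an angle proportional to the position, the score between a query at position $i$ and a key at position $j$ depends on the positions only through $j - i$. This relative-position structure is the kind of regularity that Fourier-based methods can exploit, although the sketch itself does not do so.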

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Computation and Language (cs.CL)
Cite as:	arXiv:2412.17316 [cs.LG]
	(or arXiv:2412.17316v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.17316

Submission history

From: Zhenmei Shi
[v1] Mon, 23 Dec 2024 06:20:22 UTC (24 KB)
