Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Hua, Ermo; Jiang, Che; Lv, Xingtai; Zhang, Kaiyan; Ding, Ning; Sun, Youbang; Qi, Biqing; Fan, Yuchen; Zhu, Xuekai; Zhou, Bowen

arXiv:2412.17739 (cs)

[Submitted on 23 Dec 2024 (v1), last revised 2 Jan 2025 (this version, v2)]

Abstract: Extending the context length of Language Models (LMs) by improving Rotary Position Embedding (RoPE) has become a trend. While existing works mainly address RoPE's limitations within the attention mechanism, this paper provides an analysis across nearly all parts of LMs, uncovering their adverse effects on length generalization for RoPE-based attention. Using Discrete Signal Processing theory, we show that RoPE enables periodic attention by implicitly achieving a Non-Uniform Discrete Fourier Transform. However, this periodicity is undermined by the spectral damage caused by: 1) linear layers and activation functions outside of attention; 2) insufficiently trained frequency components caused by time-domain truncation. Building on our observations, we propose Fourier Position Embedding (FoPE), which enhances attention's frequency-domain properties to improve both its periodic extension and length generalization. FoPE constructs a Fourier series and zeroes out the destructive frequency components, increasing model robustness against the spectral damage. Experiments across various model scales show that, within varying context windows, FoPE maintains a more stable perplexity and a more consistent accuracy in a needle-in-haystack task than RoPE and ALiBi. Several analyses and ablations further support our method and theoretical modeling.
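The frequency-domain picture in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the standard RoPE base of 10000, and the rule that a component counts as insufficiently trained when its period exceeds the training length are all assumptions made here for illustration. The sketch only covers zeroing out frequencies and leaves out the Fourier-series construction.

```python
import math

def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE angular frequencies, one per 2-D rotation pair."""
    return [base ** (-2.0 * m / head_dim) for m in range(head_dim // 2)]

def zero_out_undertrained(freqs, train_len):
    """Set to 0 every frequency whose period exceeds the training length.

    Assumption (hypothetical rule for this sketch): a component is treated as
    insufficiently trained when it cannot complete one full period
    (2*pi / freq) within train_len positions.
    """
    floor = 2.0 * math.pi / train_len
    return [f if f >= floor else 0.0 for f in freqs]

def attention_kernel(freqs, coeffs, distance):
    """Relative-position score modeled as a sum of cosines at non-uniform frequencies."""
    return sum(c * math.cos(f * distance) for f, c in zip(freqs, coeffs))

freqs = rope_frequencies(head_dim=64)
clipped = zero_out_undertrained(freqs, train_len=512)
coeffs = [1.0] * len(freqs)  # placeholder amplitudes, not learned values

# Compare the two kernels inside and far beyond the assumed training length.
for n in (0, 256, 4096):
    print(n, attention_kernel(freqs, coeffs, n), attention_kernel(clipped, coeffs, n))
```

A component whose frequency is set to zero contributes the same value at every distance, so it can no longer behave differently outside the training window than inside it. The remaining components each complete at least one full period within `train_len` positions.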

Comments:	14 pages, 7 figures
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2412.17739 [cs.AI]
	(or arXiv:2412.17739v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2412.17739

Submission history

From: Ermo Hua
[v1] Mon, 23 Dec 2024 17:44:01 UTC (526 KB)
[v2] Thu, 2 Jan 2025 08:58:38 UTC (527 KB)
