ELASTIC: Efficient Linear Attention for Sequential Interest Compression

Deng, Jiaxin; Wang, Shiyao; Lu, Song; Li, Yinfeng; Luo, Xinchen; Liu, Yuanjun; Xu, Peixing; Zhou, Guorui


arXiv:2408.09380v3 (cs)

This paper has been withdrawn by Jiaxin Deng

[Submitted on 18 Aug 2024 (v1), revised 6 Nov 2024 (this version, v3), latest version 12 Feb 2025 (v4)]



Abstract: State-of-the-art sequential recommendation models rely heavily on the transformer's attention mechanism. However, the quadratic computational and memory complexity of self-attention limits its scalability for modeling users' long-range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, which requires only linear time complexity and decouples model capacity from computational cost. Specifically, ELASTIC introduces fixed-length interest experts with a linear dispatcher attention mechanism that compresses long-term behaviour sequences into a significantly more compact representation, reducing GPU memory usage by up to 90% with a 2.7× inference speedup. The proposed linear dispatcher attention mechanism replaces the quadratic cost of self-attention with a linear one and makes it feasible to model extremely long sequences. Moreover, to retain the capacity for modeling diverse user interests, ELASTIC initializes a vast learnable interest memory bank and sparsely retrieves compressed user interests from the memory with negligible computational overhead. The proposed interest memory retrieval technique significantly expands the cardinality of the available interest space at the same computational cost, thereby striking a trade-off between recommendation accuracy and efficiency. To validate the effectiveness of ELASTIC, we conduct extensive experiments on various public datasets and compare it with several strong sequential recommenders. Experimental results demonstrate that ELASTIC consistently outperforms the baselines by a significant margin and also highlight its computational efficiency when modeling long sequences. We will make our implementation code publicly available.
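The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: a small set of k interest experts is retrieved sparsely (top-k by dot-product score) from a memory bank of M slots, the experts attend over the n behaviours to compress them, and each behaviour then attends over the k compressed interests. The function names `retrieve_experts` and `dispatcher_attention`, the use of the mean behaviour embedding as the retrieval query, and the two-stage softmax form are all hypothetical choices for illustration, not the authors' implementation. The point it shows is that no n × n score matrix is ever formed, so cost grows as O(n·k·d) rather than O(n²·d).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def retrieve_experts(user_query, memory_bank, k):
    """Sparse retrieval (assumed form): keep the k of M memory slots
    with the highest dot-product score against the query."""
    scores = memory_bank @ user_query           # (M,)
    top = np.argpartition(scores, -k)[-k:]      # k best indices, in no particular order
    return memory_bank[top]                     # (k, d)

def dispatcher_attention(seq, experts):
    """Two-stage attention routed through k experts (assumed form).
    Only (k, n) and (n, k) score matrices are built, never (n, n)."""
    d = seq.shape[-1]
    # Stage 1 (compress): each expert attends over the n behaviours.
    agg = softmax(experts @ seq.T / np.sqrt(d), axis=-1)      # (k, n)
    compressed = agg @ seq                                     # (k, d)
    # Stage 2 (dispatch): each behaviour attends over the k compressed interests.
    disp = softmax(seq @ compressed.T / np.sqrt(d), axis=-1)  # (n, k)
    return disp @ compressed, compressed                       # (n, d), (k, d)

rng = np.random.default_rng(0)
n, d, M, k = 1000, 16, 256, 8                  # illustrative sizes, not from the paper
seq = rng.normal(size=(n, d))                  # stand-in for behaviour embeddings
memory_bank = rng.normal(size=(M, d))          # stand-in for the learnable memory bank
experts = retrieve_experts(seq.mean(axis=0), memory_bank, k)
out, compressed = dispatcher_attention(seq, experts)
print(out.shape, compressed.shape)             # (1000, 16) (8, 16)
```

In this sketch, enlarging M adds candidate interests without changing the attention cost, since only k experts ever take part in the two attention stages. The retrieval scoring here is a dense O(M·d) product, and a real system would presumably train the bank and experts end to end.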

Comments:	We hereby withdraw this paper from arXiv due to incomplete experiments. Upon further review, we have determined that additional experimental work is necessary to fully validate our findings and conclusions
Subjects:	Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2408.09380 [cs.AI]
	(or arXiv:2408.09380v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2408.09380

Submission history

From: Jiaxin Deng
[v1] Sun, 18 Aug 2024 06:41:46 UTC (6,275 KB)
[v2] Tue, 20 Aug 2024 13:24:50 UTC (1 KB) (withdrawn)
[v3] Wed, 6 Nov 2024 02:26:07 UTC (1 KB) (withdrawn)
[v4] Wed, 12 Feb 2025 04:00:41 UTC (6,275 KB)
