Multi-matrix Factorization Attention

Hu, Jingcheng; Li, Houyi; Zhang, Yinmin; Wang, Zili; Zhou, Shuigeng; Zhang, Xiangyu; Shum, Heung-Yeung

arXiv:2412.19255 (cs)

[Submitted on 26 Dec 2024]

Abstract: We propose novel attention architectures, Multi-matrix Factorization Attention (MFA) and MFA-Key-Reuse (MFA-KR). Existing variants of standard Multi-Head Attention (MHA), including SOTA methods like MLA, fail to maintain comparably strong performance under stringent Key-Value cache (KV cache) constraints. MFA enhances model capacity by efficiently scaling up both the number and dimension of attention heads through low-rank matrix factorization in the Query-Key (QK) circuit. Extending MFA, MFA-KR further reduces memory requirements by repurposing the key cache as the value through value projection re-parameterization. MFA's design enables strong model capacity under a tight KV cache budget, while MFA-KR is suitable for even harsher KV cache limits with a minor performance trade-off. Notably, in our extensive and large-scale experiments, the proposed architecture outperforms MLA and performs comparably to MHA, while reducing KV cache usage by up to 56% and 93.7%, respectively.
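
To make the two quantities in the abstract concrete, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. It assumes the usual accounting in which each layer caches one key vector and one value vector per KV head per token, and it treats low-rank factorization generically as replacing one projection matrix with the product of two thinner ones. All function names and configuration numbers (`num_layers=24`, `head_dim=64`, `rank=128`, and so on) are hypothetical and are not taken from the paper.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim,
                             bytes_per_elem=2, store_values=True):
    """Bytes cached per token, assuming one key (and optionally one value)
    vector of size head_dim per KV head per layer."""
    tensors = 2 if store_values else 1  # keys + values, or keys only
    return num_layers * num_kv_heads * head_dim * tensors * bytes_per_elem


def reduction(baseline, candidate):
    """Fractional saving of candidate relative to baseline."""
    return 1.0 - candidate / baseline


def projection_params(d_model, num_heads, head_dim, rank=None):
    """Parameter count of a query projection: dense if rank is None,
    otherwise factored as (d_model x rank) @ (rank x num_heads*head_dim)."""
    out_dim = num_heads * head_dim
    if rank is None:
        return d_model * out_dim
    return d_model * rank + rank * out_dim


# Hypothetical configurations, chosen only for easy arithmetic.
mha_cache = kv_cache_bytes_per_token(num_layers=24, num_kv_heads=16, head_dim=64)
shared_cache = kv_cache_bytes_per_token(num_layers=24, num_kv_heads=1, head_dim=256)
key_only_cache = kv_cache_bytes_per_token(num_layers=24, num_kv_heads=1,
                                          head_dim=256, store_values=False)

dense_q = projection_params(d_model=1024, num_heads=16, head_dim=64)
factored_q = projection_params(d_model=1024, num_heads=16, head_dim=64, rank=128)

print(f"shared KV head saves   {reduction(mha_cache, shared_cache):.1%}")
print(f"key-only cache saves   {reduction(mha_cache, key_only_cache):.1%}")
print(f"dense Q params: {dense_q}, factored Q params: {factored_q}")
```

In this toy setting, the cache shrinks because fewer vectors are stored per token, while the factored query projection shows why adding heads or widening them need not grow the parameter count proportionally. The percentages it prints describe these made-up configurations only and should not be read as the paper's reported savings.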

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2412.19255 [cs.LG]
	(or arXiv:2412.19255v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.19255

Submission history

From: Jingcheng Hu
[v1] Thu, 26 Dec 2024 15:45:45 UTC (7,174 KB)
