Memory Layers at Scale

Berges, Vincent-Pierre; Oğuz, Barlas; Haziza, Daniel; Yih, Wen-tau; Zettlemoyer, Luke; Ghosh, Gargi

arXiv:2412.09764 (cs)

[Submitted on 12 Dec 2024 (v1), last revised 20 Dec 2024 (this version, v2)]

Abstract: Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply. This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale. On downstream tasks, language models augmented with our improved memory layer outperform dense models with more than twice the computation budget, as well as mixture-of-expert models when matched for both compute and parameters. We find gains are especially pronounced for factual tasks. We provide a fully parallelizable memory layer implementation, demonstrating scaling laws with up to 128B memory parameters, pretrained to 1 trillion tokens, comparing to base models with up to 8B parameters.
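
To make the "trainable key-value lookup" idea concrete, here is a minimal sketch of a generic sparse top-k lookup in plain Python. It is not the authors' implementation: the function name `memory_lookup`, the dot-product scoring, the softmax over the selected keys, and the toy keys and values are all assumptions chosen for illustration. This naive version still scores the query against every key, so it only shows that the number of value rows read stays at k however large the table grows.

```python
import heapq
import math


def memory_lookup(query, keys, values, k=2):
    """Hypothetical sparse lookup: pick the top-k keys by dot product and
    return the softmax-weighted sum of the matching value rows."""
    # Dot-product similarity between the query and each key.
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k highest-scoring keys, best first.
    top = heapq.nlargest(k, range(len(keys)), key=lambda i: scores[i])
    # Softmax over the selected scores only (max subtracted for stability).
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Only k value rows are read here, regardless of table size.
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, top):
        for j in range(dim):
            out[j] += w * values[i][j]
    return out, top, weights


# Toy table with four slots; in a real layer keys and values are trained.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0], [0.0, -10.0]]
out, top, weights = memory_lookup([2.0, 1.0], keys, values, k=2)
print(top, [round(w, 3) for w in weights], [round(x, 3) for x in out])
```

In this toy run the query is closest to the first two keys, so only those two value rows contribute to the output; the other slots add parameters to the table without being touched for this query.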

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.09764 [cs.CL]
	(or arXiv:2412.09764v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.09764

Submission history

From: Barlas Oguz
[v1] Thu, 12 Dec 2024 23:56:57 UTC (393 KB)
[v2] Fri, 20 Dec 2024 17:36:52 UTC (393 KB)
