Understanding Factual Recall in Transformers via Associative Memories

Nichani, Eshaan; Lee, Jason D.; Bietti, Alberto

Computer Science > Machine Learning

arXiv:2412.06538 (cs)

[Submitted on 9 Dec 2024]

Abstract: Large language models have demonstrated an impressive ability to perform factual recall. Prior work has found that transformers trained on factual recall tasks can store information at a rate proportional to their parameter count. In our work, we show that shallow transformers can use a combination of associative memories to obtain such near-optimal storage capacity. We begin by proving that the storage capacities of both linear and MLP associative memories scale linearly with parameter count. We next introduce a synthetic factual recall task, and prove that a transformer with a single layer of self-attention followed by an MLP can obtain 100% accuracy on the task whenever either the total number of self-attention parameters or MLP parameters scales (up to log factors) linearly with the number of facts. In particular, the transformer can trade off between using the value matrices or the MLP as an associative memory to store the dataset of facts. We complement these expressivity results with an analysis of the gradient flow trajectory of a simplified linear attention model trained on our factual recall task, where we show that the model exhibits sequential learning behavior.
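
To make the notion of a linear associative memory concrete, here is a minimal NumPy sketch. It is an illustration only and is not taken from the paper: the sum-of-outer-products construction, the random Gaussian embeddings, the function name `recall_accuracy`, and all sizes are assumptions chosen for the example. The memory is a single d x d matrix that maps each input embedding to its associated output embedding, and recall is decoded by taking the best-matching output embedding. The sizes used here sit far below the capacity regime the abstract refers to, so the sketch shows the store-and-decode mechanism rather than the scaling result.

```python
import numpy as np


def recall_accuracy(d, n_facts, n_answers, seed=0):
    """Store n_facts input -> answer pairs in one d x d matrix and
    return the fraction recovered by argmax decoding.

    Hypothetical illustration: names and construction are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Random embeddings with roughly unit norm (assumed, for illustration).
    inputs = rng.standard_normal((n_facts, d)) / np.sqrt(d)     # e_i
    answers = rng.standard_normal((n_answers, d)) / np.sqrt(d)  # u_k

    # Each fact i maps to one answer index f(i).
    f = rng.integers(0, n_answers, size=n_facts)

    # Associative memory: W = sum_i u_{f(i)} e_i^T, a d x d matrix.
    W = answers[f].T @ inputs

    # Decode: score every answer against W e_i and take the argmax.
    scores = answers @ W @ inputs.T  # shape (n_answers, n_facts)
    predicted = scores.argmax(axis=0)

    return float((predicted == f).mean())


if __name__ == "__main__":
    print(recall_accuracy(d=256, n_facts=100, n_answers=100))
```

Because random high-dimensional embeddings are nearly orthogonal, the score of the correct answer is close to 1 while the scores of incorrect answers stay close to 0, which is why argmax decoding can succeed even though the stored associations interfere with one another.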

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Information Theory (cs.IT); Machine Learning (stat.ML)
Cite as:	arXiv:2412.06538 [cs.LG]
	(or arXiv:2412.06538v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.06538

Submission history

From: Eshaan Nichani
[v1] Mon, 9 Dec 2024 14:48:14 UTC (532 KB)
