Arithmetic with Language Models: from Memorization to Computation

Maltoni, Davide; Ferrara, Matteo

doi:10.1016/j.neunet.2024.106550

arXiv:2308.01154 (cs)

[Submitted on 2 Aug 2023 (v1), last revised 2 Aug 2024 (this version, v4)]

Title: Arithmetic with Language Models: from Memorization to Computation

Authors: Davide Maltoni, Matteo Ferrara

Abstract: A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
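
The input/output discontinuity mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the helper names `to_bits` and `output_bits_changed` and the 7-bit operand width are assumptions chosen only for illustration. The sketch counts how many bits of a binary sum change when a single bit of one operand is flipped.

```python
def to_bits(n: int, width: int) -> str:
    """Return n as a fixed-width binary string, e.g. to_bits(5, 4) == '0101'."""
    return format(n, f"0{width}b")


def output_bits_changed(a: int, b: int, bit: int, width: int = 7) -> int:
    """Count how many bits of a + b change when one bit of `a` is flipped.

    `width` is the operand width in bits (an illustrative assumption);
    the sum is written with width + 1 bits so the carry-out always fits.
    """
    flipped = a ^ (1 << bit)  # toggle a single input bit
    before = to_bits(a + b, width + 1)
    after = to_bits(flipped + b, width + 1)
    return sum(x != y for x, y in zip(before, after))


# Flipping the lowest bit of 0100000 (b = 0) changes a single output bit.
print(output_bits_changed(0b0100000, 0b0000000, bit=0))
# Flipping the lowest bit of 0111111 (b = 1) removes a whole carry chain.
print(output_bits_changed(0b0111111, 0b0000001, bit=0))
```

In both calls the two inputs differ by one bit, yet the outputs differ by one bit in the first case and by seven in the second, because of carry propagation. A model that only interpolated between nearby training inputs would be expected to struggle with such jumps, which is one way to read the abstract's motivation for the testbed.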

Comments:	The article has been accepted for publication in the Elsevier journal Neural Networks. The final version is available on the Elsevier ScienceDirect platform.
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2308.01154 [cs.AI]
	(or arXiv:2308.01154v4 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2308.01154
Journal reference:	Neural Networks, vol. 179, 2024
Related DOI:	https://doi.org/10.1016/j.neunet.2024.106550

Submission history

From: Matteo Ferrara
[v1] Wed, 2 Aug 2023 13:58:37 UTC (728 KB)
[v2] Thu, 25 Jan 2024 10:04:49 UTC (1,403 KB)
[v3] Wed, 6 Mar 2024 09:39:16 UTC (1,141 KB)
[v4] Fri, 2 Aug 2024 12:39:17 UTC (1,162 KB)
