Self-training Language Models for Arithmetic Reasoning

Kadlčík, Marek; Štefánik, Michal


arXiv:2407.08400 (cs)

[Submitted on 11 Jul 2024]



Abstract: Language models achieve impressive results in tasks involving complex multistep reasoning, but scaling these capabilities further traditionally requires expensive collection of more annotated data. In this work, we explore the potential of improving the capabilities of language models without new data, using only automated feedback on the validity of their predictions in arithmetic reasoning (self-training). We find that models can substantially improve in both single-round (offline) and online self-training. In the offline setting, supervised methods deliver gains comparable to preference optimization, but in online self-training, preference optimization largely outperforms supervised training thanks to superior stability and robustness on unseen types of problems.
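As a rough illustration of the idea in the abstract, the sketch below shows one way automated feedback could turn a model's own sampled solutions into training data for the two families of methods compared: supervised examples (correct solutions only) and preference pairs (a correct solution preferred over an incorrect one). This is a minimal sketch under stated assumptions, not the authors' implementation: the "Answer:" output format, the numeric tolerance, and the helper names `extract_answer`, `is_correct`, and `build_self_training_data` are all hypothetical, and feedback is assumed to be a comparison of the final numeric answer with a known reference result.

```python
from itertools import product


def extract_answer(solution: str) -> str:
    """Return the text after the last 'Answer:' marker (assumed output format)."""
    return solution.rsplit("Answer:", 1)[-1].strip()


def is_correct(solution: str, reference: str, tol: float = 1e-6) -> bool:
    """Automated feedback: does the final numeric answer match the reference?"""
    try:
        return abs(float(extract_answer(solution)) - float(reference)) <= tol
    except ValueError:
        # No parseable number in the final answer counts as incorrect.
        return False


def build_self_training_data(problem: str, samples: list[str], reference: str):
    """Split sampled solutions by correctness and build both kinds of training data."""
    correct = [s for s in samples if is_correct(s, reference)]
    wrong = [s for s in samples if not is_correct(s, reference)]
    # Supervised data: (prompt, target) pairs using only correct solutions.
    sft_examples = [(problem, s) for s in correct]
    # Preference data: (prompt, chosen, rejected) for every correct/incorrect pair.
    preference_pairs = [(problem, c, w) for c, w in product(correct, wrong)]
    return sft_examples, preference_pairs


# Hypothetical samples that a model might produce for one problem.
samples = [
    "2 + 3 = 5. Answer: 5",
    "2 + 3 = 6. Answer: 6",
    "3 + 2 = 5. Answer: 5.0",
]
sft_examples, preference_pairs = build_self_training_data("What is 2 + 3?", samples, "5")
```

In an offline (single-round) setting, data like this would be collected once and used for a single training run; in an online setting, the sampling and filtering step would be repeated with the updated model. Both descriptions are a simplified reading of the abstract, and the paper itself should be consulted for the actual procedure.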

Comments: Appeared in ICLR 2024 LLMAgents
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2407.08400 [cs.CL] (or arXiv:2407.08400v1 [cs.CL] for this version)

Submission history

From: Marek Kadlčík
[v1] Thu, 11 Jul 2024 11:06:05 UTC (165 KB)
