Revisiting Checkpoint Averaging for Neural Machine Translation

Gao, Yingbo; Herold, Christian; Yang, Zijian; Ney, Hermann

arXiv:2210.11803 (cs)

[Submitted on 21 Oct 2022]

Abstract: Checkpoint averaging is a simple and effective method to boost the performance of converged neural machine translation models. The calculation is cheap to perform, and because the translation improvement comes almost for free, the method is widely adopted in neural machine translation research. Despite its popularity, the method simply takes the mean of the model parameters from several checkpoints, the selection of which is mostly based on empirical recipes without much justification. In this work, we revisit the concept of checkpoint averaging and consider several extensions. Specifically, we experiment with ideas such as using different checkpoint selection strategies, calculating a weighted average instead of a simple mean, making use of gradient information, and fine-tuning the interpolation weights on development data. Our results confirm the necessity of applying checkpoint averaging for optimal performance, but also suggest that the landscape between the converged checkpoints is rather flat and that little further improvement over simple averaging can be obtained.
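The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `average_checkpoints` and the `Params` type are hypothetical, and each checkpoint is assumed to be a plain dictionary mapping parameter names to flat lists of floats rather than framework tensors. Passing no weights gives the simple mean; passing weights gives the weighted (interpolated) variant the abstract mentions.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: parameter name -> flattened parameter values.
Params = Dict[str, List[float]]


def average_checkpoints(
    checkpoints: Sequence[Params],
    weights: Optional[Sequence[float]] = None,
) -> Params:
    """Return the element-wise (optionally weighted) mean of several checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        # Simple mean: every checkpoint contributes equally.
        weights = [1.0] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Normalize so the interpolation weights sum to one.
    norm = [w / total for w in weights]

    averaged: Params = {}
    for name, reference in checkpoints[0].items():
        acc = [0.0] * len(reference)
        for w, ckpt in zip(norm, checkpoints):
            values = ckpt[name]
            if len(values) != len(reference):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, v in enumerate(values):
                acc[i] += w * v
        averaged[name] = acc
    return averaged


if __name__ == "__main__":
    ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(average_checkpoints(ckpts))                      # simple mean
    print(average_checkpoints(ckpts, weights=[1.0, 3.0]))  # weighted average
```

In a real setup the same loop would run over tensors loaded from saved checkpoints. Which checkpoints to pick and how to set the weights are the questions the paper studies.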

Comments: Accepted at AACL 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2210.11803 [cs.CL] (or arXiv:2210.11803v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2210.11803

Submission history

From: Yingbo Gao
[v1] Fri, 21 Oct 2022 08:29:23 UTC (1,390 KB)
