The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Csordás, Róbert; Irie, Kazuki; Schmidhuber, Jürgen

arXiv:2108.12284 (cs)

[Submitted on 26 Aug 2021 (v1), last revised 14 Feb 2022 (this version, v4)]

Abstract: Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Universal Transformer variants, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and the Mathematics dataset. Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS. On SCAN, relative positional embedding largely mitigates the EOS decision problem (Newman et al., 2020), yielding 100% accuracy on the length split with a cutoff at 26. Importantly, performance differences between these models are typically invisible on the IID data split. This calls for proper generalization validation sets for developing neural networks that generalize systematically. We publicly release the code to reproduce our results.
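The abstract's point about validation sets can be illustrated with a minimal sketch. It is not taken from the released code: the function `select_checkpoint`, the metric names `iid_acc` and `gen_acc`, and all numbers below are hypothetical and made up purely for illustration. The sketch assumes a scenario in which accuracy on an IID validation split saturates early while accuracy on a generalization split keeps changing, so the two metrics pick different checkpoints.

```python
from typing import Dict, List


def select_checkpoint(history: List[Dict[str, float]], metric: str) -> int:
    """Return the index of the checkpoint with the highest value of `metric`.

    Ties are resolved in favor of the earliest checkpoint, which mimics
    stopping as soon as the metric stops improving.
    """
    if not history:
        raise ValueError("history is empty")
    return max(range(len(history)), key=lambda i: history[i][metric])


# Hypothetical validation accuracies per checkpoint.
# These are made-up illustrative values, NOT results from the paper.
history = [
    {"iid_acc": 0.99, "gen_acc": 0.40},
    {"iid_acc": 1.00, "gen_acc": 0.55},
    {"iid_acc": 1.00, "gen_acc": 0.62},
    {"iid_acc": 1.00, "gen_acc": 0.80},
]

# Selecting on the IID split stops at the first saturated checkpoint.
best_by_iid = select_checkpoint(history, "iid_acc")

# Selecting on a generalization split can prefer a later checkpoint.
best_by_gen = select_checkpoint(history, "gen_acc")

print("best by IID validation:", best_by_iid)
print("best by generalization validation:", best_by_gen)
```

In this made-up scenario the IID metric cannot distinguish the last three checkpoints, so a selection rule based on it has no signal about which one generalizes best. A held-out generalization validation set supplies that signal.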

Comments:	Accepted to EMNLP 2021
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2108.12284 [cs.LG]
	(or arXiv:2108.12284v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2108.12284

Submission history

From: Róbert Csordás
[v1] Thu, 26 Aug 2021 17:26:56 UTC (715 KB)
[v2] Mon, 6 Sep 2021 08:45:06 UTC (715 KB)
[v3] Tue, 19 Oct 2021 13:17:39 UTC (715 KB)
[v4] Mon, 14 Feb 2022 10:16:49 UTC (715 KB)
