Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models

Soltan, Saleh; Rosenbaum, Andy; Falke, Tobias; Lu, Qin; Rumshisky, Anna; Hamza, Wael


arXiv:2306.08756 (cs)

[Submitted on 14 Jun 2023]



Abstract: Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages; however, training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from the other. (1) We show that an encoder extracted from a seq2seq model under-performs a Masked Language Modeling (MLM) encoder, particularly on sequence labeling tasks. Variations of masking during seq2seq training, reducing the decoder size, and continuing with a small amount of MLM training do not close the gap. (2) Conversely, when an encoder is used to warm-start seq2seq training, we show that unfreezing the encoder partway through training lets the resulting model match the task performance of a from-scratch seq2seq model. Overall, this two-stage approach is an efficient recipe to obtain both a multilingual encoder and a seq2seq model, matching the performance of training each model from scratch while reducing the total compute cost by 27%.
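The warm-start recipe in (2) can be illustrated with a small sketch. It is not the authors' implementation: the class `WarmStartSchedule`, the function `relative_compute_saving`, and the step counts are hypothetical names and values chosen for illustration, since the abstract only says the encoder is unfrozen "partway through training". In a framework such as PyTorch, freezing would typically mean toggling `requires_grad` on the encoder parameters.

```python
from dataclasses import dataclass


@dataclass
class WarmStartSchedule:
    """Hypothetical schedule: the encoder stays frozen early in seq2seq training."""

    total_steps: int
    unfreeze_step: int  # assumed value; the abstract does not state when to unfreeze

    def encoder_trainable(self, step: int) -> bool:
        # The encoder (initialized from the MLM encoder) receives gradient
        # updates only once the unfreeze step has been reached.
        return step >= self.unfreeze_step


def relative_compute_saving(two_stage_cost: float, from_scratch_cost: float) -> float:
    """Fraction of compute saved by the two-stage recipe versus training
    both models from scratch (costs in any consistent unit)."""
    if from_scratch_cost <= 0:
        raise ValueError("from_scratch_cost must be positive")
    return 1.0 - two_stage_cost / from_scratch_cost


# Illustrative numbers only: unfreeze halfway through a 1000-step run.
schedule = WarmStartSchedule(total_steps=1000, unfreeze_step=500)
frozen_steps = sum(
    not schedule.encoder_trainable(s) for s in range(schedule.total_steps)
)
```

Under this definition, a two-stage cost equal to 0.73 of the combined from-scratch cost corresponds to the 27% reduction reported in the abstract. How that cost splits between the encoder and seq2seq stages is not given there.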

Comments:	ACL Findings 2023 and SustaiNLP Workshop 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2306.08756 [cs.CL]
	(or arXiv:2306.08756v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.08756

Submission history

From: Andy Rosenbaum
[v1] Wed, 14 Jun 2023 21:41:52 UTC (7,092 KB)
