Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC

Syu, Shen-sian; Xie, Juncheng; Lee, Hung-yi

doi:10.1109/TASLP.2024.3451977

arXiv:2306.06345 (cs)

[Submitted on 10 Jun 2023 (v1), last revised 14 Oct 2024 (this version, v3)]

Abstract: Non-autoregressive approaches, particularly those that generate the output in a single forward pass, aim to improve the inference speed of translation models. However, they often suffer a significant drop in translation quality compared with autoregressive models. This paper introduces a series of techniques that improve the translation quality of Non-Autoregressive Translation (NAT) models while retaining a substantial inference speedup. We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively. Furthermore, we adopt the MASK insertion scheme for up-sampling instead of token duplication, and we present an embedding distillation method to further enhance performance. In our experiments, our model outperforms the baseline autoregressive model (Transformer base) on multiple datasets, including WMT'14 DE↔EN, WMT'16 RO↔EN, and IWSLT'14 DE↔EN. Notably, it outperforms the baseline autoregressive model on the IWSLT'14 EN↔DE and WMT'16 EN↔RO datasets even without using distillation data during training. On the IWSLT'14 DE→EN dataset, our model achieves a BLEU score of 39.59, setting a new state of the art. Additionally, our model achieves a 16.35-times speedup over the autoregressive model.
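
To make the up-sampling contrast in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `[MASK]` and `<blank>` strings, the up-sampling ratio of 2, and the choice to place the MASK tokens directly after each source token are all hypothetical, since the abstract does not specify them. The `ctc_collapse` helper shows the standard CTC rule (merge consecutive repeats, then drop blanks) that maps a longer output sequence to the final translation.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical CTC blank symbol
MASK = "[MASK]"    # hypothetical mask token of the pretrained model


def upsample_duplicate(tokens, ratio=2):
    """Token duplication: repeat every source token `ratio` times."""
    return [tok for tok in tokens for _ in range(ratio)]


def upsample_mask_insert(tokens, ratio=2):
    """MASK insertion (assumed layout): keep each source token once,
    then pad it with `ratio - 1` MASK tokens."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend([MASK] * (ratio - 1))
    return out


def ctc_collapse(path):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [tok for tok, _ in groupby(path)]
    return [tok for tok in merged if tok != BLANK]
```

Both up-sampling functions lengthen the input by the same factor, so the CTC output has room for translations longer than the source; they differ only in what fills the extra positions. A blank between two identical tokens is what lets CTC emit a genuinely repeated word.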

Comments:	12 pages, 6 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2306.06345 [cs.CL]
	(or arXiv:2306.06345v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.06345
Journal reference:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 32, 2024
Related DOI:	https://doi.org/10.1109/TASLP.2024.3451977

Submission history

From: Shen-Sian Syu
[v1] Sat, 10 Jun 2023 05:24:29 UTC (7,315 KB)
[v2] Thu, 31 Aug 2023 03:14:47 UTC (7,462 KB)
[v3] Mon, 14 Oct 2024 05:50:13 UTC (7,468 KB)
