WeTS: A Benchmark for Translation Suggestion

Yang, Zhen; Meng, Fandong; Zhang, Yingxue; Li, Ernan; Zhou, Jie

arXiv:2110.05151 (cs)

[Submitted on 11 Oct 2021 (v1), last revised 11 Oct 2022 (this version, v3)]

Abstract: Translation Suggestion (TS), which provides alternatives for specific words or phrases given the entire document translated by machine translation (MT) (Lee et al., 2021), has been proven to play a significant role in post-editing (PE). However, there is still no publicly available data set to support in-depth research on this problem, and there are no reproducible experimental results for researchers in this community to build on. To overcome this limitation, we create a benchmark data set for TS, called WeTS, which contains a golden corpus annotated by expert translators in four translation directions. Apart from the human-annotated golden corpus, we also propose several novel methods to generate synthetic corpora, which can substantially improve the performance of TS. With the corpus we construct, we introduce a Transformer-based model for TS, and experimental results show that our model achieves state-of-the-art (SOTA) results in all four translation directions: English-to-German, German-to-English, Chinese-to-English and English-to-Chinese. Code and corpus can be found at this https URL.

Comments:	Translation suggestion, Transformer, EMNLP2022 main conference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2110.05151 [cs.CL]
	(or arXiv:2110.05151v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2110.05151

Submission history

From: Zhen Yang
[v1] Mon, 11 Oct 2021 10:52:17 UTC (228 KB)
[v2] Mon, 14 Mar 2022 12:05:09 UTC (303 KB)
[v3] Tue, 11 Oct 2022 02:42:49 UTC (450 KB)
