Improved statistical machine translation using monolingual paraphrases

Nakov, Preslav

arXiv:2109.15119 (cs)

[Submitted on 25 Sep 2021]

Abstract: We propose a novel monolingual sentence paraphrasing method for augmenting the training data for statistical machine translation systems "for free", by creating it from data that is already available rather than having to create more aligned data. Starting with a syntactic tree, we recursively generate new sentence variants in which noun compounds are paraphrased using suitable prepositions and, vice versa, preposition-containing noun phrases are turned into noun compounds. The evaluation shows an improvement equivalent to 33%-50% of that of doubling the amount of training data.
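
As a rough illustration of the recursive rewriting the abstract describes, here is a minimal Python sketch. It is not the paper's implementation: it assumes a flat sequence of pre-tagged tokens instead of a syntactic tree, a fixed list of candidate prepositions, and a toy tag set ("N", "P", "O"). All function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]        # (word, tag); toy tags: "N" noun, "P" preposition, "O" other
Sentence = Tuple[Token, ...]


def single_rewrites(sent: Sentence, preps: Tuple[str, ...]) -> List[Sentence]:
    """Return every sentence obtained by applying one rewrite at one position."""
    out: List[Sentence] = []
    tags = [tag for _, tag in sent]
    for i in range(len(sent) - 1):
        # Noun compound "N1 N2" -> prepositional phrase "N2 prep N1".
        if tags[i] == "N" and tags[i + 1] == "N":
            for prep in preps:
                out.append(sent[:i] + (sent[i + 1], (prep, "P"), sent[i]) + sent[i + 2:])
        # Prepositional phrase "N2 prep N1" -> noun compound "N1 N2".
        if tags[i:i + 3] == ["N", "P", "N"]:
            out.append(sent[:i] + (sent[i + 2], sent[i]) + sent[i + 3:])
    return out


def paraphrase_variants(
    sent: Sentence,
    preps: Tuple[str, ...] = ("of",),   # assumed preposition list, for illustration only
    seen: Optional[Set[Sentence]] = None,
) -> Set[Sentence]:
    """Recursively collect all variants reachable from `sent`, including `sent` itself."""
    if seen is None:
        seen = set()
    if sent in seen:                     # the two rewrites invert each other, so guard against loops
        return seen
    seen.add(sent)
    for variant in single_rewrites(sent, preps):
        paraphrase_variants(variant, preps, seen)
    return seen


def to_text(sent: Sentence) -> str:
    """Join the words of a tagged sentence into a plain string."""
    return " ".join(word for word, _ in sent)


source: Sentence = (("the", "O"), ("apple", "N"), ("juice", "N"), ("is", "O"), ("cold", "O"))
texts = sorted(to_text(s) for s in paraphrase_variants(source))
print(texts)  # ['the apple juice is cold', 'the juice of apple is cold']
```

The `seen` set is what keeps the recursion finite in this sketch, since each rewrite can be undone by the other. How to choose "suitable" prepositions is deliberately left out here.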

Comments:	machine translation, SMT, paraphrasing, data augmentation. arXiv admin note: substantial text overlap with arXiv:1912.01113
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
MSC classes:	68T50
ACM classes:	F.2.2; I.2.7
Cite as:	arXiv:2109.15119 [cs.CL]
	(or arXiv:2109.15119v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.15119
Journal reference:	ECAI-2008

Submission history

From: Preslav Nakov
[v1] Sat, 25 Sep 2021 16:29:47 UTC (70 KB)