Improving Neural Text Simplification Model with Simplified Corpora

Qiang, Jipeng

arXiv:1810.04428 (cs)

[Submitted on 10 Oct 2018]

Abstract: Text simplification (TS) can be viewed as a monolingual translation task, translating between text variations within a single language. Recent neural TS models draw on insights from neural machine translation to learn lexical simplification and content reduction using encoder-decoder models. Unlike neural machine translation, however, TS lacks sufficient pairs of ordinary and simplified sentences, which are expensive and time-consuming to build. Target-side simplified sentences play an important role in boosting fluency for statistical TS, and we investigate using simplified sentences to train neural TS models, with no changes to the network architecture. We propose pairing each simple training sentence with a synthetic ordinary sentence produced via back-translation, and treating this synthetic data as additional training data. We train an encoder-decoder model on both the synthetic and the original sentence pairs, obtaining substantial improvements on the available WikiLarge and WikiSmall datasets compared with state-of-the-art methods.
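
The data-augmentation step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy lexicon, and the example sentences are all hypothetical, and the word-substitution function merely stands in for a trained simple-to-ordinary model (assumed here to be a neural model trained on the parallel corpus in the reverse direction, as is usual for back-translation).

```python
from typing import Callable, List, Tuple

# A training pair is (ordinary sentence, simplified sentence).
Pair = Tuple[str, str]


def augment_with_back_translation(
    original_pairs: List[Pair],
    simple_sentences: List[str],
    simple_to_ordinary: Callable[[str], str],
) -> List[Pair]:
    """Return the original pairs plus synthetic (ordinary, simple) pairs.

    Each simple sentence is kept as the target side; its source side is
    generated by the back-translation function `simple_to_ordinary`.
    """
    synthetic_pairs = [(simple_to_ordinary(s), s) for s in simple_sentences]
    return original_pairs + synthetic_pairs


# Hypothetical stand-in for a trained simple-to-ordinary model:
# it only swaps a few words and exists purely for illustration.
TOY_LEXICON = {"use": "utilize", "help": "assist"}


def toy_simple_to_ordinary(sentence: str) -> str:
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.split())


# Made-up example data, not taken from WikiLarge or WikiSmall.
original = [("He utilized the device .", "He used the device .")]
simple_only = ["They help us use tools ."]

training_data = augment_with_back_translation(
    original, simple_only, toy_simple_to_ordinary
)
```

The combined list would then be fed to an unchanged encoder-decoder training loop, which matches the abstract's point that only the training data changes, not the network architecture.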

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1810.04428 [cs.CL]
	(or arXiv:1810.04428v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1810.04428

Submission history

From: Jipeng Qiang
[v1] Wed, 10 Oct 2018 09:14:06 UTC (92 KB)
