Simple is not Enough: Document-level Text Simplification using Readability and Coherence

Vásquez-Rodríguez, Laura; Nguyen, Nhung T. H.; Przybyła, Piotr; Shardlow, Matthew; Ananiadou, Sophia

arXiv:2412.18655 (cs)

[Submitted on 24 Dec 2024]

Abstract: In this paper, we present the SimDoc system, a simplification model that considers simplicity, readability, and discourse aspects such as coherence. In the past decade, progress in the Text Simplification (TS) field has mostly been shown at the sentence level rather than at the paragraph or document level, which is the setting from which most TS audiences would benefit. We propose a simplification system that is initially fine-tuned with professionally created corpora. Further, we include multiple objectives during training, considering simplicity, readability, and coherence together. Our contributions include the extension of professionally annotated simplification corpora by associating existing annotations into (complex text, simple text, readability label) triples, so that readability can be exploited during training. We also present a comparative analysis in which we evaluate our proposed models in zero-shot, few-shot, and fine-tuning settings using document-level TS corpora, demonstrating novel methods for simplification. Finally, we present a detailed analysis of the outputs, highlighting the difficulties of simplification at the document level.
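The (complex text, simple text, readability label) triples mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that the source corpus supplies aligned (complex, simple) document pairs identified by IDs, plus a readability level for each document (for example, a grade level). The names `SimplificationTriple` and `build_triples`, and the choice to label each triple with the level of its simple side, are illustrative assumptions rather than the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SimplificationTriple:
    complex_text: str
    simple_text: str
    readability_label: int  # hypothetical: readability level of the simple side


def build_triples(
    pairs: List[Tuple[str, str]],
    texts: Dict[str, str],
    levels: Dict[str, int],
) -> List[SimplificationTriple]:
    """Turn aligned (complex_id, simple_id) pairs into triples.

    Assumption: the readability label of a triple is the level of its
    simple side. Pairs with a missing text or level are skipped.
    """
    triples: List[SimplificationTriple] = []
    for complex_id, simple_id in pairs:
        complex_text = texts.get(complex_id)
        simple_text = texts.get(simple_id)
        label = levels.get(simple_id)
        if complex_text is None or simple_text is None or label is None:
            continue  # incomplete annotation: cannot form a triple
        triples.append(SimplificationTriple(complex_text, simple_text, label))
    return triples


if __name__ == "__main__":
    # Toy, made-up documents; not taken from any real corpus.
    texts = {
        "doc1-v0": "The committee deliberated extensively before reaching a verdict.",
        "doc1-v3": "The group talked for a long time. Then they decided.",
    }
    levels = {"doc1-v0": 12, "doc1-v3": 5}
    for t in build_triples([("doc1-v0", "doc1-v3")], texts, levels):
        print(t.readability_label, "|", t.simple_text)
```

Labelling each triple with the simple side's level is only one plausible design; a corpus could instead encode the target level requested from the model, which would let a single complex document yield several triples, one per simplified version.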

Comments:	16 pages, 3 figures, 8 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.18655 [cs.CL]
	(or arXiv:2412.18655v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.18655

Submission history

From: Laura Vásquez-Rodríguez
[v1] Tue, 24 Dec 2024 19:05:21 UTC (454 KB)