Small but Mighty: New Benchmarks for Split and Rephrase

Zhang, Li; Zhu, Huaiyu; Brahma, Siddhartha; Li, Yunyao

doi:10.18653/v1/2020.emnlp-main.91

arXiv:2009.08560 (cs)

[Submitted on 17 Sep 2020 (v1), last revised 12 Dec 2020 (this version, v2)]

Abstract: Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones. As a relatively new task, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even a simple rule-based model can perform on par with the state-of-the-art model. To remedy such limitations, we collect and release two crowdsourced benchmark datasets. We not only make sure that they contain significantly more diverse syntax, but also carefully control for their quality according to a well-defined set of criteria. While no satisfactory automatic metric exists, we apply fine-grained manual evaluation based on these criteria using crowdsourcing, showing that our datasets better represent the task and are significantly more challenging for the models.

Comments:	In EMNLP 2020
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2009.08560 [cs.CL]
	(or arXiv:2009.08560v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2009.08560
Journal reference:	Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2020) 1198-1205
Related DOI:	https://doi.org/10.18653/v1/2020.emnlp-main.91

Submission history

From: Li Zhang
[v1] Thu, 17 Sep 2020 23:37:33 UTC (73 KB)
[v2] Sat, 12 Dec 2020 15:35:32 UTC (73 KB)
