Self-Repetition in Abstractive Neural Summarizers

Salkar, Nikita; Trikalinos, Thomas; Wallace, Byron C.; Nenkova, Ani

arXiv:2210.08145 (cs)

[Submitted on 14 Oct 2022]


Abstract: We provide a quantitative and qualitative analysis of self-repetition in the output of neural summarizers. We measure self-repetition as the number of n-grams of length four or longer that appear in multiple outputs of the same system. We analyze the behavior of three popular architectures (BART, T5, and Pegasus), each fine-tuned on five datasets. In a regression analysis, we find that the three architectures differ in their propensity to repeat content across the summaries they produce for different inputs, with BART being particularly prone to self-repetition. Fine-tuning on more abstractive data, and on data featuring formulaic language, is associated with a higher rate of self-repetition. In qualitative analysis, we find that systems produce artefacts such as ads and disclaimers unrelated to the content being summarized, as well as formulaic phrases common in the fine-tuning domain. Our approach to corpus-level analysis of self-repetition may help practitioners clean up training data for summarizers and ultimately support methods for minimizing the amount of self-repetition.
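The self-repetition measure described in the abstract (n-grams of length four or longer that appear in more than one output of the same system) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `ngrams` and `repeated_ngrams`, the lowercased whitespace tokenization, and the upper bound `max_n` are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def repeated_ngrams(summaries, min_n=4, max_n=6):
    """Map each n-gram (min_n <= n <= max_n) to the number of summaries
    containing it, keeping only n-grams found in at least two summaries.

    Assumptions (not taken from the paper): lowercased whitespace
    tokenization, and a cap of max_n on n-gram length.
    """
    doc_freq = Counter()
    for text in summaries:
        tokens = text.lower().split()
        seen = set()
        for n in range(min_n, max_n + 1):
            seen |= ngrams(tokens, n)
        # Count each n-gram once per summary, so the counter holds the
        # number of summaries in which the n-gram occurs.
        doc_freq.update(seen)
    return {gram: count for gram, count in doc_freq.items() if count >= 2}


if __name__ == "__main__":
    outputs = [
        "click here to read the full story",
        "police said click here to read more",
        "the cat sat on the mat",
    ]
    for gram, count in repeated_ngrams(outputs).items():
        print(" ".join(gram), count)
```

Counting each n-gram once per summary keeps the focus on repetition across outputs rather than within a single output, which matches the corpus-level framing of the abstract.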

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2210.08145 [cs.CL]
(or arXiv:2210.08145v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2210.08145

Submission history

From: Nikita Salkar
[v1] Fri, 14 Oct 2022 23:50:42 UTC (50 KB)
