Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics

Bhandari, Manik; Gour, Pranav; Ashfaq, Atabak; Liu, Pengfei

arXiv:2011.04096 (cs)

[Submitted on 8 Nov 2020]

Abstract: In text summarization, evaluating the efficacy of automatic metrics without human judgments has recently become popular. One exemplar work concludes that automatic metrics strongly disagree when ranking high-scoring summaries. In this paper, we revisit their experiments and find that their observations stem from the fact that metrics disagree in ranking summaries from any narrow scoring range. We hypothesize that this may be because summaries in a narrow scoring range are similar to each other and are thus difficult to rank. Apart from the width of the scoring range of summaries, we analyze three other properties that impact inter-metric agreement: Ease of Summarization, Abstractiveness, and Coverage. To encourage reproducible research, we make all our analysis code and data publicly available.
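
The narrow-range effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes that inter-metric agreement is measured with Kendall's tau rank correlation (an assumption, since the abstract does not name the statistic) and uses synthetic scores in which two hypothetical metrics, `metric_a` and `metric_b`, are noisy readings of the same latent quality. All function names, noise levels, and window bounds are illustrative.

```python
import random
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
        score += (product > 0) - (product < 0)
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return score / n_pairs


def tau_in_range(metric_a, metric_b, low, high):
    """Agreement on the summaries whose metric_a score lies in [low, high]."""
    kept = [(a, b) for a, b in zip(metric_a, metric_b) if low <= a <= high]
    return kendall_tau([a for a, _ in kept], [b for _, b in kept])


# Synthetic data (assumption): both metrics observe the same latent
# quality with independent Gaussian noise.
rng = random.Random(0)
quality = [rng.random() for _ in range(500)]
metric_a = [q + rng.gauss(0, 0.05) for q in quality]
metric_b = [q + rng.gauss(0, 0.05) for q in quality]

full_tau = kendall_tau(metric_a, metric_b)
narrow_tau = tau_in_range(metric_a, metric_b, 0.45, 0.55)

print(f"full range tau:   {full_tau:.2f}")
print(f"narrow range tau: {narrow_tau:.2f}")
```

In this toy setup the window sits in the middle of the score range rather than at the top, yet agreement still drops: inside a narrow window the differences in latent quality are small relative to the metric noise, so the two rankings diverge.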

Comments:	Accepted at COLING 2020
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2011.04096 [cs.CL]
	(or arXiv:2011.04096v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2011.04096

Submission history

From: Manik Bhandari
[v1] Sun, 8 Nov 2020 22:26:06 UTC (2,886 KB)
