On the Effectiveness of Automated Metrics for Text Generation Systems

von Däniken, Pius; Deriu, Jan; Tuggener, Don; Cieliebak, Mark

arXiv:2210.13025 (cs)

[Submitted on 24 Oct 2022]

Abstract: A major challenge in the field of Text Generation is evaluation because we lack a sound theory that can be leveraged to extract guidelines for evaluation campaigns. In this work, we propose a first step towards such a theory that incorporates different sources of uncertainty, such as imperfect automated metrics and insufficiently sized test sets. The theory has practical applications, such as determining the number of samples needed to reliably distinguish the performance of a set of Text Generation systems in a given setting. We showcase the application of the theory on the WMT 21 and Spot-The-Bot evaluation data and outline how it can be leveraged to improve the evaluation protocol regarding the reliability, robustness, and significance of the evaluation outcome.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.13025 [cs.CL]
	(or arXiv:2210.13025v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.13025

Submission history

From: Jan Deriu
[v1] Mon, 24 Oct 2022 08:15:28 UTC (438 KB)
