Layer or Representation Space: What makes BERT-based Evaluation Metrics Robust?

Vu, Doan Nam Long; Moosavi, Nafise Sadat; Eger, Steffen


arXiv:2209.02317 (cs)

[Submitted on 6 Sep 2022 (v1), last revised 7 Sep 2022 (this version, v2)]

Abstract: The evaluation of recent embedding-based evaluation metrics for text generation is primarily based on measuring their correlation with human evaluations on standard benchmarks. However, these benchmarks are mostly from similar domains to those used for pretraining word embeddings. This raises concerns about the (lack of) generalization of embedding-based metrics to new and noisy domains that contain a different vocabulary than the pretraining data. In this paper, we examine the robustness of BERTScore, one of the most popular embedding-based metrics for text generation. We show that (a) an embedding-based metric that has the highest correlation with human evaluations on a standard benchmark can have the lowest correlation if the amount of input noise or unknown tokens increases, (b) taking embeddings from the first layer of pretrained models improves the robustness of all metrics, and (c) the highest robustness is achieved when using character-level embeddings, instead of token-based embeddings, from the first layer of the pretrained model.
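
For readers unfamiliar with the metric, BERTScore compares a candidate text with a reference by greedily matching token embeddings under cosine similarity and combining the resulting precision and recall into an F1 score. The following is a minimal sketch of that matching step only, not the authors' implementation: the function names and the two-dimensional toy vectors are hypothetical stand-ins for embeddings that would normally come from a chosen layer of a pretrained model, and the optional IDF weighting and baseline rescaling of the full metric are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_match_f1(ref_vecs, cand_vecs):
    """BERTScore-style F1 from greedy matching, without IDF weighting.

    ref_vecs / cand_vecs: non-empty lists of embedding vectors, one per token.
    In practice these would be taken from one layer of a pretrained model;
    here they are supplied directly (an assumption made for illustration).
    """
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: two orthogonal "token embeddings" in the reference.
reference = [[1.0, 0.0], [0.0, 1.0]]
identical = [[1.0, 0.0], [0.0, 1.0]]
truncated = [[1.0, 0.0]]  # candidate is missing the second token

score_identical = greedy_match_f1(reference, identical)
score_truncated = greedy_match_f1(reference, truncated)
```

In this sketch the choice the abstract discusses (which layer, and whether token-level or character-level embeddings are used) only changes how `ref_vecs` and `cand_vecs` are produced; the matching step stays the same.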

Comments:	COLING 2022 camera-ready version
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2209.02317 [cs.CL]
	(or arXiv:2209.02317v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2209.02317

Submission history

From: Doan Nam Long Vu
[v1] Tue, 6 Sep 2022 09:10:54 UTC (1,163 KB)
[v2] Wed, 7 Sep 2022 08:08:28 UTC (1,163 KB)
