A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability

Hu, Xinyu; Gao, Mingqi; Lin, Li; Yu, Zhenghan; Wan, Xiaojun

arXiv:2502.12052 (cs)

[Submitted on 17 Feb 2025]

Abstract: In NLG meta-evaluation, evaluation metrics are typically assessed based on their consistency with humans. However, we identify some limitations in traditional NLG meta-evaluation approaches, such as issues in handling human ratings and ambiguous selections of correlation measures, which undermine the effectiveness of meta-evaluation. In this work, we propose a dual-perspective NLG meta-evaluation framework that focuses on different evaluation capabilities, thereby providing better interpretability. In addition, we introduce a method of automatically constructing the corresponding benchmarks without requiring new human annotations. Furthermore, we conduct experiments with 16 representative LLMs as the evaluators based on our proposed framework, comprehensively analyzing their evaluation performance from different perspectives.
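As a rough illustration of why the choice of correlation measure matters in meta-evaluation, the sketch below scores the same hypothetical metric against the same hypothetical human ratings with two common measures, Pearson and Spearman. The data and the function names (`pearson`, `ranks`, `spearman`) are assumptions made for this example only. It is not the framework proposed in the paper, only a minimal picture of the traditional correlation-based setup the abstract refers to.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: measures linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: five outputs rated by humans (1-5) and by an automatic metric.
human_ratings = [1, 2, 3, 4, 5]
metric_scores = [0.10, 0.11, 0.12, 0.13, 0.90]

pearson_r = pearson(human_ratings, metric_scores)
spearman_rho = spearman(human_ratings, metric_scores)
print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_rho:.3f}")
```

On this toy data the metric orders the outputs exactly as the humans do, so Spearman is 1.0, while Pearson is noticeably lower because the metric's scores are far from linear in the human ratings. The same metric can therefore look better or worse depending on which measure is reported.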

Comments:	23 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.12052 [cs.CL]
	(or arXiv:2502.12052v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.12052

Submission history

From: Xinyu Hu
[v1] Mon, 17 Feb 2025 17:22:49 UTC (144 KB)
