ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling

Tan, Shaomu; Monz, Christof

arXiv:2504.13630 (cs)

[Submitted on 18 Apr 2025]

Abstract: A key challenge in MT evaluation is the inherent noise and inconsistency of human ratings. Regression-based neural metrics struggle with this noise, while prompting LLMs shows promise at system-level evaluation but performs poorly at the segment level. In this work, we propose ReMedy, a novel MT metric framework that reformulates translation evaluation as a reward modeling task. Instead of regressing directly on imperfect human ratings, ReMedy learns relative translation quality from pairwise preference data, resulting in more reliable evaluation. In extensive experiments across the WMT22-24 shared tasks (39 language pairs, 111 MT systems), ReMedy achieves state-of-the-art performance at both segment- and system-level evaluation. Specifically, ReMedy-9B surpasses larger WMT winners and massive closed LLMs such as MetricX-13B, XCOMET-Ensemble, GEMBA-GPT-4, PaLM-540B, and fine-tuned PaLM2. Further analyses demonstrate that ReMedy delivers superior capability in detecting translation errors and evaluating low-quality translations.
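
To make the pairwise idea in the abstract concrete, here is a minimal sketch of how preference pairs might be derived from noisy ratings and scored with a preference loss. It assumes a standard Bradley-Terry style objective, which is common in reward modeling; the abstract does not state the exact loss ReMedy uses, so this should not be read as the authors' implementation. The function names, the `min_gap` threshold, and the example scores are all hypothetical.

```python
import math
from itertools import combinations


def build_preference_pairs(rated, min_gap=0.0):
    """Turn (translation, human_score) items for ONE source sentence into
    (better, worse) pairs. Pairs whose score gap is <= min_gap are dropped,
    on the assumption that near-ties mostly reflect rating noise.
    (min_gap is a hypothetical knob, not taken from the paper.)"""
    pairs = []
    for (hyp_a, score_a), (hyp_b, score_b) in combinations(rated, 2):
        if abs(score_a - score_b) <= min_gap:
            continue
        pairs.append((hyp_a, hyp_b) if score_a > score_b else (hyp_b, hyp_a))
    return pairs


def pairwise_preference_loss(reward_better, reward_worse):
    """Bradley-Terry style loss: -log(sigmoid(r_better - r_worse)).
    Written as softplus(-diff) and branched for numerical stability."""
    diff = reward_better - reward_worse
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical ratings for three translations of the same source sentence.
pairs = build_preference_pairs([("hyp_a", 80), ("hyp_b", 60), ("hyp_c", 60)])
# hyp_b and hyp_c are tied, so only the two pairs involving hyp_a remain.
```

The loss depends only on the difference between the two rewards, so the model is never asked to reproduce an absolute human score; it only has to rank the preferred translation above the other one.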

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2504.13630 [cs.CL] (or arXiv:2504.13630v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2504.13630

Submission history

From: Shaomu Tan
[v1] Fri, 18 Apr 2025 11:11:14 UTC (1,242 KB)
