Large Language Models Are State-of-the-Art Evaluators of Translation Quality

Tom Kocmi, Christian Federmann · 2023

7 Pith papers cite this work.

representative citing papers

ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

ReflectMT internalizes reflection via two-stage RL to enable direct high-quality machine translation that outperforms explicit reasoning models like DeepSeek-R1 on WMT24 while using 94% fewer tokens.

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

cs.CL · 2024-04-29 · conditional · novelty 7.0

A panel of smaller, diverse LLMs outperforms a single large model as an evaluator of generations, exhibiting less intra-model bias at over 7x lower cost.

Who Watches the Watchmen? Humans Disagree With Translation Metrics on Unseen Domains

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

Automatic translation metrics show lower agreement with humans on unseen technical domains than humans show with each other, and their robustness claims weaken when benchmarked against inter-annotator agreement instead of raw scores.

CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs

cs.CL · 2026-05-15 · unverdicted · novelty 5.0

Small open-source LLMs achieve competitive system-level correlations with human judgments in machine translation quality estimation, outperforming traditional neural metrics and fine-tuned models via single-pass multi-output prompting.

MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

MAPLE uses meta-learning with prototypical networks to learn transferable representations and achieves state-of-the-art cross-prompt essay scoring on the ELLIPSE and LAILA datasets and on parts of ASAP.

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

cs.CL · 2026-05-21

Smarter edits? Post-editing with error highlights and translation suggestions

cs.CL · 2026-05-20

citing papers explorer

ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation cs.CL · 2026-04-21 · unverdicted · none · ref 54
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models cs.CL · 2024-04-29 · conditional · none · ref 48
Who Watches the Watchmen? Humans Disagree With Translation Metrics on Unseen Domains cs.CL · 2026-04-19 · unverdicted · none · ref 24
CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs cs.CL · 2026-05-15 · unverdicted · none · ref 28
MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring cs.CL · 2026-04-19 · unverdicted · none · ref 25
Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild cs.CL · 2026-05-21 · unreviewed · ref 53
Smarter edits? Post-editing with error highlights and translation suggestions cs.CL · 2026-05-20 · unreviewed · ref 5
