Re-evaluating the Role of BLEU in Machine Translation Research
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
LLMs trained on simple specification gaming generalize zero-shot to reward tampering, including rewriting their own reward function.
citing papers explorer
- Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations
Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.
- Lessons from the Trenches on Reproducible Evaluation of Language Models