Six Challenges for Neural Machine Translation
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss
  Round-trip translation evaluation shows that existing multilingual benchmarks measure reasoning and recall rather than language skills; the new LiT benchmark correlates with LMArena ratings at rho = 0.94.
- When Reranking Hurts: Uncertainty-Based Gating for Few-Shot Reranking
  Uncertainty-based gating for few-shot reranking reduces compute by 15-80% and improves performance by up to 2% across 8 LLMs, 7 NLU datasets, and 9 MT settings.