MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning

2025 · arXiv 2504.10160

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

ReflectMT internalizes reflection via two-stage RL to enable direct high-quality machine translation that outperforms explicit reasoning models like DeepSeek-R1 on WMT24 while using 94% fewer tokens.

Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

cs.AI · 2025-05-25 · unverdicted · novelty 7.0

UniR is a composable reasoning module trained with verifiable rewards and added to frozen LLMs via logit summation, enabling modular composition and weak-to-strong generalization across tasks and model sizes.

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

An RL-trained lightweight controller that uses answer statistics improves the trade-offs among correctness, latency, and total samples in adaptive sampling for LLM test-time scaling.

citing papers explorer

Showing 3 of 3 citing papers.

ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation
cs.CL · 2026-04-21 · unverdicted · none · ref 50

Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
cs.AI · 2025-05-25 · unverdicted · none · ref 11

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
cs.CL · 2026-06-02 · unverdicted · none · ref 7
