Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2); years: 2026 (2); verdicts: UNVERDICTED (2).
Citing papers explorer

- ReasonXL: Shifting LLM Reasoning Language Without Sacrificing Performance
  A new parallel reasoning dataset enables LLMs to shift reasoning to non-English languages via SFT and RLVR while matching or exceeding baseline performance.
- Language as a Latent Variable for Reasoning Optimization
  Treating language as a latent variable via polyGRPO RL improves Qwen2.5-7B-Instruct by 6.72% on English reasoning benchmarks and 6.89% on multilingual ones, with cross-task gains on commonsense reasoning from math-only training.