5 Pith papers cite this work.
fields: cs.CL (5)
years: 2026 (5)
verdicts: UNVERDICTED (5)
representative citing papers
ST-Merge uses gated cross-attention to adaptively weight source models during merging, outperforming baselines on multilingual reasoning tasks across 21 languages.
Luar is a reinforcement learning method enabling reasoning language models to decide when to invoke English translation for improved multilingual reasoning.
DuDi is a dual-signal distillation method with a cross-lingual verbalizer that improves multilingual SLM performance on SEA languages and outperforms baselines on SEA-HELM.
Treating language as a latent variable via polyGRPO RL improves Qwen2.5-7B-Instruct by 6.72% on English reasoning benchmarks and 6.89% on multilingual ones, with cross-task gains on commonsense reasoning from math-only training.
citing papers explorer
- ReasonXL: Shifting LLM Reasoning Language Without Sacrificing Performance
  A new parallel reasoning dataset enables LLMs to shift reasoning to non-English languages via SFT and RLVR while matching or exceeding baseline performance.
- Enhancing Multilingual Reasoning via Steerable Model Merging
  ST-Merge uses gated cross-attention to adaptively weight source models during merging, outperforming baselines on multilingual reasoning tasks across 21 languages.
- Learning When to Translate for Multilingual Reasoning
  Luar is a reinforcement learning method enabling reasoning language models to decide when to invoke English translation for improved multilingual reasoning.
- DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer
  DuDi is a dual-signal distillation method with a cross-lingual verbalizer that improves multilingual SLM performance on SEA languages and outperforms baselines on SEA-HELM.
- Language as a Latent Variable for Reasoning Optimization
  Treating language as a latent variable via polyGRPO RL improves Qwen2.5-7B-Instruct by 6.72% on English reasoning benchmarks and 6.89% on multilingual ones, with cross-task gains on commonsense reasoning from math-only training.