Learning When to Translate for Multilingual Reasoning

Deokhyung Kang; Gary Geunbae Lee; Hyounghun Kim

arxiv: 2606.02465 · v1 · pith:PWHN7U23new · submitted 2026-06-01 · 💻 cs.CL · cs.AI


Pith reviewed 2026-06-28 14:24 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords multilingual reasoning · reinforcement learning · selective translation · language understanding · low-resource languages · reasoning models · GRPO

The pith

Reinforcement learning trains reasoning models to translate non-English inputs only when direct understanding is unreliable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Reasoning language models often fail on non-English inputs because of language understanding gaps rather than reasoning deficits. Translating every input to English fixes some failures but wastes effort when the original query is already interpretable. Luar applies reinforcement learning so the model learns to invoke translation only when the translated version is expected to produce substantially better answers than direct reasoning. The resulting policy improves accuracy over standard training methods and shows the largest gains on low-resource languages while generalizing its translation decisions to languages never seen during training.

Core claim

Luar trains the model to choose between solving the original input directly and reasoning over its English translation, with a reward that encourages translation only when translator-augmented reasoning is expected to substantially outperform direct reasoning. On multilingual reasoning benchmarks the method outperforms GRPO and other training-based baselines, with particularly large gains on low-resource languages. Analysis shows the model avoids unnecessary translations when direct reasoning suffices and extends its translator-call behavior to unseen low-resource languages.

What carries the argument

Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that rewards the model for selecting translation only when it yields a large expected performance gain over direct reasoning.
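
The abstract does not state the exact form of the reward, so what follows is only a minimal sketch of how an outcome-based usefulness label and a boundary-aware reward could be wired together. It assumes per-query success rates for both modes are estimated from sampled rollouts. The names `ModeOutcomes`, `usefulness_label`, and `boundary_reward`, the margin `tau`, and the bonus weight `beta` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ModeOutcomes:
    """Estimated success rates for one query under each reasoning mode."""
    direct_acc: float      # fraction of correct rollouts from the original input
    translated_acc: float  # fraction of correct rollouts from the English translation


def usefulness_label(o: ModeOutcomes, tau: float = 0.3) -> int:
    """Return 1 if translation helps by at least the margin tau, else 0.

    tau is a hypothetical threshold standing in for "substantially outperform".
    """
    return int(o.translated_acc - o.direct_acc >= tau)


def boundary_reward(correct: bool, called_translator: bool, label: int,
                    beta: float = 0.5) -> float:
    """Correctness reward plus a bonus when the mode choice matches the label.

    beta is an assumed weight; the paper's actual reward may be shaped differently.
    """
    reward = 1.0 if correct else 0.0
    if int(called_translator) == label:
        reward += beta
    return reward
```

Under these assumptions, a correct answer reached through an unneeded translator call earns less than the same answer reached directly, which is the behavior the framework is described as encouraging.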

If this is right

The trained model avoids unnecessary translation calls when direct reasoning from the original input is reliable.
Accuracy improvements are largest on low-resource languages where understanding failures are most common.
Translator invocation behavior learned on seen languages transfers to unseen low-resource languages.
Overall performance exceeds standard GRPO training and other baselines across multilingual reasoning benchmarks.
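
For readers unfamiliar with the GRPO baseline named above: GRPO scores each sampled response relative to the other responses drawn for the same query, rather than against a learned value function. The sketch below shows only that group-relative normalization. The function name is hypothetical, and the use of the population standard deviation with a small `eps` is an assumption; implementations differ on these details.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of rollouts for the same query.

    Each advantage is (reward - group mean) / (group std + eps).
    eps guards against division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]
```

When every rollout in a group receives the same reward, all advantages are zero and the group contributes no learning signal, which is one reason the reward needs to separate good and bad mode choices within a group.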

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Selective translation could reduce average inference cost by skipping translation for inputs the model already understands.
The same boundary-learning approach might be applied to deciding when to use other input transformations or external tools during reasoning.
If the learned decision boundary proves stable across domains, it could serve as a general mechanism for models to request clarification or augmentation only when needed.
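
The cost argument in the first extension above can be made concrete with a back-of-the-envelope model. This is a rough sketch, not a measurement: it assumes reasoning cost is the same in both modes and that translation adds a fixed per-call cost, neither of which the paper confirms. The function name and cost units are hypothetical.

```python
def expected_cost_per_query(call_rate: float, reasoning_cost: float,
                            translation_cost: float) -> float:
    """Average per-query cost when translation is invoked with probability call_rate.

    Assumes reasoning_cost is mode-independent and translation_cost is paid
    only on queries where the translator is called.
    """
    if not 0.0 <= call_rate <= 1.0:
        raise ValueError("call_rate must be in [0, 1]")
    return reasoning_cost + call_rate * translation_cost
```

Under these assumptions, an always-translate policy corresponds to `call_rate=1.0`, and the saving from a selective policy is `(1 - call_rate) * translation_cost` per query.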

Load-bearing premise

Reinforcement learning can reliably train the model to estimate when translator-augmented reasoning will substantially outperform direct reasoning from the original input.

What would settle it

A controlled experiment showing that Luar-trained models either translate at the same rate as an always-translate baseline or produce lower accuracy than direct-reasoning baselines on low-resource languages would falsify the central claim.
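
The falsification condition above can be written as a simple predicate over measured quantities. The sketch below is illustrative only. The function name, argument names, and the tolerance used to decide that two call rates are "the same" are assumptions, and no numbers from the paper are used.

```python
def central_claim_contradicted(luar_call_rate: float,
                               luar_low_resource_acc: float,
                               direct_low_resource_acc: float,
                               always_translate_rate: float = 1.0,
                               rate_tolerance: float = 0.02) -> bool:
    """True if either falsifying outcome described in the text is observed.

    Outcome 1: the trained model translates about as often as always-translate.
    Outcome 2: the trained model is less accurate than direct reasoning on
    low-resource languages.
    """
    translates_like_always = (
        abs(luar_call_rate - always_translate_rate) <= rate_tolerance
    )
    worse_than_direct = luar_low_resource_acc < direct_low_resource_acc
    return translates_like_always or worse_than_direct
```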

Figures

Figures reproduced from arXiv: 2606.02465 by Deokhyung Kang, Gary Geunbae Lee, Hyounghun Kim.

**Figure 1.** Explicit uncertainty rates of RLMs on PolyMath-Low. We report the percentage of query-understanding failures where the model explicitly expresses uncertainty about the meaning of the query, measured separately in reasoning traces and final responses. Evaluation details are provided in Appendix A.

**Figure 2.** Translator-call prefix prepended to the reason…

**Figure 3.** Overview of LUAR. We first derive an outcome-based translation-usefulness label u(q) by comparing direct and translator-augmented reasoning outcomes. Stage I uses supervised fine-tuning to warm up both reasoning trajectories, and Stage II applies GRPO with a boundary-aware reward to learn when to invoke translation.

**Figure 4.** Translator call rate across languages and datasets.

**Figure 5.** Validation translator-call rate of ablation meth…

**Figure 6.** Mode-selection macro-F1 under different…

**Figure 7.** Mode-selection macro-F1 under different boundary thresholds on Qwen3-8B.

**Figure 8.** Precision and recall for translator-call decisions on Qwen3-4B.

**Figure 9.** Precision and recall for translator-call decisions on Qwen3-8B.
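
Figures 6 to 9 report macro-F1, precision, and recall for the translate-or-not decision. As a reminder of how these metrics are computed for a binary mode choice, here is a small dependency-free sketch. The encoding (1 for "call translator", 0 for "reason directly") and the function names are assumptions for illustration, and the paper's evaluation code may differ.

```python
def binary_prf(y_true: list[int], y_pred: list[int], positive: int):
    """Precision, recall, and F1 treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Unweighted mean of the F1 scores for class 1 and class 0."""
    f1_call = binary_prf(y_true, y_pred, positive=1)[2]
    f1_direct = binary_prf(y_true, y_pred, positive=0)[2]
    return (f1_call + f1_direct) / 2
```

Macro-F1 weights both classes equally, so a policy that never calls the translator cannot score well even when translation-useful queries are rare.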

Original abstract

Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, but still exhibit substantial multilingual reasoning gaps, largely due to language-understanding failures in non-English inputs. English translation can mitigate these failures by expressing non-English inputs in a form that RLMs can more reliably interpret, yet translating every input is unnecessary when the model can reason reliably from the original query. To address this challenge, we propose Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains RLMs to selectively invoke translation when direct understanding is unreliable. Luar trains the model to choose between solving the original input directly and reasoning over its English translation, encouraging translation only when translator-augmented reasoning is expected to substantially outperform direct reasoning. Across multilingual reasoning benchmarks, Luar outperforms standard GRPO and other training-based baselines, with particularly large gains on low-resource languages. Further analysis shows that Luar avoids unnecessary translation in cases where direct reasoning is sufficient, while extending its translator-call behavior to unseen low-resource languages. Together, our work suggests a selective approach to multilingual reasoning: RLMs can learn to invoke translation only when their direct understanding is unreliable. The project will be made publicly available at https://github.com/deokhk/LUAR

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Luar applies RL to let models decide when translation helps reasoning, with reported gains on low-resource languages, but the abstract leaves the training details and robustness unclear.


Luar trains reasoning models with reinforcement learning to pick between direct reasoning on the original input and reasoning on its English translation. The policy is rewarded for final answer accuracy while being pushed to skip translation when it is not needed. This is the central new piece: a learned boundary instead of a fixed rule.

The paper shows the expected pattern on benchmarks, with bigger lifts on low-resource languages and some carry-over to languages not seen in training. The analysis that the model actually refrains from translating easy cases is consistent with the stated goal and gives a concrete check on whether the policy is doing something non-trivial.

The comparisons to GRPO and other training baselines are the right place to start. If the full paper includes ablations on the reward terms and shows the policy decisions are stable across runs, that would be the main support for the claim.

The soft spots are around the RL setup itself. Training a policy that must estimate whether translation will help is sensitive to how the reward is shaped and to translator quality; small changes there can flip the behavior. The abstract does not spell out the exact objective or the cost term, so the paper needs to demonstrate that the gains are not an artifact of a particular reward or benchmark split. Generalization to unseen languages is promising but still rests on a limited set of test cases.

This is for groups working on multilingual reasoning or on tool-use policies inside LLMs. The idea is concrete enough and the evaluation direction is clear, so it should go to referees rather than being desk-rejected.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains reasoning language models to selectively invoke English translation for non-English inputs only when direct reasoning from the original input is unreliable. The approach uses a reward based on final answer correctness while discouraging unnecessary translations. It claims that Luar outperforms standard GRPO and other training-based baselines on multilingual reasoning benchmarks, with particularly large gains on low-resource languages, and that the learned behavior generalizes to unseen low-resource languages while avoiding unneeded translations.

Significance. If the empirical results hold, the work offers a practical method for improving multilingual reasoning in RLMs through learned selective translation, which could reduce unnecessary computation and close performance gaps especially for low-resource languages. The RL formulation that rewards correctness while penalizing excess translations is a coherent way to learn the decision boundary. The stated commitment to public release of the project at the provided GitHub link supports reproducibility and potential follow-up work.

minor comments (2)

Abstract: The claim of outperformance is stated without any quantitative results, dataset names, or baseline scores, making it difficult to gauge the magnitude of gains even at a high level.
The description of the training objective mentions a reward for correctness and a penalty for unnecessary calls, but does not specify the exact form of the combined reward or how the 'substantially outperform' threshold is operationalized.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The referee accurately captures the core idea of Luar as a selective translation approach via reinforcement learning. No major comments were provided in the report, so we have no specific points requiring rebuttal or revision.

Circularity Check

0 steps flagged

No significant circularity identified


The paper proposes Luar, an RL framework that trains a policy to decide between direct reasoning and translation-augmented reasoning, with rewards based on final answer correctness. The abstract and provided description contain no equations, fitted parameters presented as predictions, self-citations used as load-bearing uniqueness theorems, or any reduction where a claimed result is definitionally equivalent to its inputs. The method is a standard selective RL setup whose claimed generalization and efficiency gains are presented as empirical outcomes rather than constructed by the training objective itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5746 in / 886 out tokens · 29980 ms · 2026-06-28T14:24:19.805277+00:00 · methodology



[1] [1]

2026 , eprint=

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models? , author=. 2026 , eprint=

2026

[2] [2]

Question Translation Training for Better Multilingual Reasoning

Zhu, Wenhao and Huang, Shujian and Yuan, Fei and She, Shuaijie and Chen, Jiajun and Birch, Alexandra. Question Translation Training for Better Multilingual Reasoning. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.498

work page doi:10.18653/v1/2024.findings-acl.498 2024

[3] [3]

2025 , eprint=

Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap , author=. 2025 , eprint=

2025

[4] [4]

2024 , eprint=

xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning , author=. 2024 , eprint=

2024

[5] [5]

MAPO : Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization

She, Shuaijie and Zou, Wei and Huang, Shujian and Zhu, Wenhao and Liu, Xiang and Geng, Xiang and Chen, Jiajun. MAPO : Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.539

work page doi:10.18653/v1/2024.acl-long.539 2024

[6] [6]

B ench MAX : A Comprehensive Multilingual Evaluation Suite for Large Language Models

Huang, Xu and Zhu, Wenhao and Hu, Hanxu and He, Conghui and Li, Lei and Huang, Shujian and Yuan, Fei. B ench MAX : A Comprehensive Multilingual Evaluation Suite for Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.909

work page doi:10.18653/v1/2025.findings-emnlp.909 2025

[7] [7]

The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track , year=

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track , year=

[8] [8]

2025 , journal=

Crosslingual Reasoning through Test-Time Scaling , author=. 2025 , journal=

2025

[9] [9]

2026 , eprint=

Cross-lingual Collapse: How Language-Centric Foundation Models Shape Reasoning in Large Language Models , author=. 2026 , eprint=

2026

[10] [10]

2025 , journal=

Language Matters: How Do Multilingual Input and Reasoning Paths Affect Large Reasoning Models? , author=. 2025 , journal=

2025

[11] [11]

2026 , eprint=

What Makes Good Multilingual Reasoning? Disentangling Reasoning Traces with Measurable Features , author=. 2026 , eprint=

2026

[12] [12]

When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy

Qi, Jirui and Chen, Shan and Xiong, Zidi and Fern \'a ndez, Raquel and Bitterman, Danielle and Bisazza, Arianna. When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.1103

work page doi:10.18653/v1/2025.findings-emnlp.1103 2025

[13] [13]

The Fourteenth International Conference on Learning Representations , year=

Long Chain-of-Thought Reasoning Across Languages , author=. The Fourteenth International Conference on Learning Representations , year=

[14] [14]

2026 , eprint=

Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training , author=. 2026 , eprint=

2026

[15] [15]

2024 , eprint=

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers , author=. 2024 , eprint=

2024

[16] [16]

MindMerger: Efficiently Boosting

Zixian Huang and Wenhao Zhu and Gong Cheng and Lei Li and Fei Yuan , booktitle=. MindMerger: Efficiently Boosting. 2024 , url=

2024

[17] [17]

L ang B ridge: Multilingual Reasoning Without Multilingual Supervision

Yoon, Dongkeun and Jang, Joel and Kim, Sungdong and Kim, Seungone and Shafayat, Sheikh and Seo, Minjoon. L ang B ridge: Multilingual Reasoning Without Multilingual Supervision. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.405

work page doi:10.18653/v1/2024.acl-long.405 2024

[18] [18]

A Survey of Multilingual Reasoning in Language Models

Ghosh, Akash and Datta, Debayan and Saha, Sriparna and Agarwal, Chirag. A Survey of Multilingual Reasoning in Language Models. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.474

work page doi:10.18653/v1/2025.findings-emnlp.474 2025

[19] [19]

MMLU - P ro X : A Multilingual Benchmark for Advanced Large Language Model Evaluation

Xuan, Weihao and Yang, Rui and Qi, Heli and Zeng, Qingcheng and Xiao, Yunze and Feng, Aosong and Liu, Dairui and Xing, Yun and Wang, Junjue and Gao, Fan and Lu, Jinghui and Jiang, Yuang and Li, Huitao and Li, Xin and Yu, Kunyu and Dong, Ruihai and Gu, Shangding and Li, Yuekang and Xie, Xiaofei and Juefei-Xu, Felix and Khomh, Foutse and Yoshie, Osamu and C...

work page doi:10.18653/v1/2025.emnlp-main.79 2025

[20] [20]

2024 , url=

Yubo Wang and Xueguang Ma and Ge Zhang and Yuansheng Ni and Abhranil Chandra and Shiguang Guo and Weiming Ren and Aaran Arulraj and Xuan He and Ziyan Jiang and Tianle Li and Max Ku and Kai Wang and Alex Zhuang and Rongqi Fan and Xiang Yue and Wenhu Chen , booktitle=. 2024 , url=

2024

[21] [21]

Guo, Daya and Yang, Dejian and Zhang, Haowei and Song, Junxiao and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Zhang, Ruoyu and Ma, Shirong and Bi, Xiao and Zhang, Xiaokang and Yu, Xingkai and Wu, Yu and Wu, Z. F. and Gou, Zhibin and Shao, Zhihong and Li, Zhuoshu and Gao, Ziyi and Liu, Aixin and Xue, Bing and Wang, Bingxuan and Wu, Bochao and Feng, Bei ...

work page doi:10.1038/s41586-025-09422-z

[22] [22]

Med-CoReasoner: Reducing Language Disparities in Medical Reasoning via Language-Informed Co-Reasoning

Med-CoReasoner: Reducing Language Disparities in Medical Reasoning via Language-Informed Co-Reasoning. 2026.

2026

[23] [23]

OpenAI o1 System Card

OpenAI o1 System Card. 2026.

2026

[24] [24]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025.

2025

[25] [25]

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use. 2025.

2025

[26] [26]

HybridFlow: A Flexible and Efficient RLHF Framework

HybridFlow: A Flexible and Efficient RLHF Framework. 2024.

2024

[27] [27]

Kydlíček, Hynek.

[28] [28]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. 2024.

2024

[29] [29]

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

DAPO: An Open-Source LLM Reinforcement Learning System at Scale. 2025.

2025

[30] [30]

OpenAI. 2025.

2025

[31] [31]

Language models are multilingual chain-of-thought reasoners

Language models are multilingual chain-of-thought reasoners. The Eleventh International Conference on Learning Representations.

[32] [32]

DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL

DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL. 2025.

2025

[33] [33]

Ministral 3

Ministral 3. 2026.

2026

[34] [34]

Qwen3 Technical Report

Qwen3 Technical Report. 2025.

2025

[35] [35]

gpt-oss-120b & gpt-oss-20b Model Card

gpt-oss-120b & gpt-oss-20b Model Card. 2025.

2025

[36] [36]

Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages

Qin, Libo and Chen, Qiguang and Wei, Fuxuan and Huang, Shijue and Che, Wanxiang. Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.163

work page doi:10.18653/v1/2023.emnlp-main.163 2023

[37] [37]

Know Your Limits: A Survey of Abstention in Large Language Models

Wen, Bingbing and Yao, Jihan and Feng, Shangbin and Xu, Chenjun and Tsvetkov, Yulia and Howe, Bill and Wang, Lucy Lu. Know Your Limits: A Survey of Abstention in Large Language Models. Transactions of the Association for Computational Linguistics, 13:529–556. 2025. doi:10.1162/tacl_a_00754

work page doi:10.1162/tacl_a_00754 2025

[38] [38]

R-Tuning: Instructing Large Language Models to Say 'I Don't Know'

Zhang, Hanning and Diao, Shizhe and Lin, Yong and Fung, Yi and Lian, Qing and Wang, Xingyao and Chen, Yangyi and Ji, Heng and Zhang, Tong. R-Tuning: Instructing Large Language Models to Say 'I Don't Know'. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol...

work page doi:10.18653/v1/2024.naacl-long.394 2024

[39] [39]

Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models

Liu, Chaoqun and Zhang, Wenxuan and Zhao, Yiran and Luu, Anh Tuan and Bing, Lidong. Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2...

work page doi:10.18653/v1/2025.naacl-long.485 2025

[40] [40]

KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality. 2025.

2025

[41] [41]

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning. 2025.

2025

[42] [42]

Search-R1: Training LLMs to reason and leverage search engines with reinforcement learning

Search-R1: Training LLMs to reason and leverage search engines with reinforcement learning. arXiv preprint arXiv:2503.09516.

Pith/arXiv arXiv

[43] [43]

Search-o1: Agentic Search-Enhanced Large Reasoning Models

Li, Xiaoxi and Dong, Guanting and Jin, Jiajie and Zhang, Yuyao and Zhou, Yujia and Zhu, Yutao and Zhang, Peitian and Dou, Zhicheng. Search-o1: Agentic Search-Enhanced Large Reasoning Models. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.276

work page doi:10.18653/v1/2025.emnlp-main.276 2025

[44] [44]

Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent

Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent. 2025.

2025

[45] [45]

SMART: Self-Aware Agent for Tool Overuse Mitigation

Qian, Cheng and Acikgoz, Emre Can and Wang, Hongru and Chen, Xiusi and Sil, Avirup and Hakkani-T. SMART: Self-Aware Agent for Tool Overuse Mitigation. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.239

work page doi:10.18653/v1/2025.findings-acl.239 2025

[46] [46]

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.845

work page doi:10.18653/v1/2024.acl-long.845 2024

[47] [47]

RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior

RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior. arXiv preprint arXiv:2508.03140.

arXiv

[48] [48]

Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging--An Open Recipe

Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging--An Open Recipe. arXiv preprint arXiv:2502.09056.

arXiv

[49] [49]

Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

Crosslingual On-Policy Self-Distillation for Multilingual Reasoning. arXiv preprint arXiv:2605.09548.

Pith/arXiv arXiv

[50] [50]

Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training. arXiv preprint arXiv:2602.05940.

arXiv

[51] [51]

TAPO: Translation Augmented Policy Optimization for Multilingual Mathematical Reasoning

TAPO: Translation Augmented Policy Optimization for Multilingual Mathematical Reasoning. arXiv preprint arXiv:2603.25419.

arXiv

[52] [52]

Learn globally, speak locally: Bridging the gaps in multilingual reasoning

Learn globally, speak locally: Bridging the gaps in multilingual reasoning. arXiv preprint arXiv:2507.05418.

arXiv

[53] [53]

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. 2025.

2025

[54] [54]

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

Joshi, Pratik and Santy, Sebastin and Budhiraja, Amar and Bali, Kalika and Choudhury, Monojit. The State and Fate of Linguistic Diversity and Inclusion in the NLP World. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.560

work page doi:10.18653/v1/2020.acl-main.560 2020

[55] [55]

Decoupled Weight Decay Regularization

Decoupled Weight Decay Regularization. 2019.

2019

[56] [56]

chrF++: words helping character n-grams

Popović, Maja. chrF++: words helping character n-grams. Proceedings of the Second Conference on Machine Translation. 2017. doi:10.18653/v1/W17-4770

work page doi:10.18653/v1/w17-4770 2017