pith. machine review for the scientific record.

arxiv: 2604.18027 · v1 · submitted 2026-04-20 · 💻 cs.SE · cs.CL

Recognition: unknown

CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 04:55 UTC · model grok-4.3

classification 💻 cs.SE cs.CL
keywords code transpilation · multilingual code translation · reinforcement learning · intermediate representation · low-resource programming languages · LLM training · Python pivot

The pith

CodePivot shows that routing all code translations through Python as an intermediate step, combined with a novel reinforcement learning reward, allows a 7B model to develop multilingual transpilation abilities without any parallel corpora.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Transpilation between programming languages helps modernize legacy codebases and support languages with little data, but most training methods demand parallel examples for every language pair, and such pairs are especially scarce for low-resource languages. The paper presents a training approach that needs only Python-to-Others translations, making Python the central pivot for all conversions. A new reinforcement learning reward called Aggressive-Partial-Functional pushes the model to generate code that is partially correct yet functional during training. The resulting 7B model then handles translations in both directions across ten languages, including low-resource cases, while surpassing much larger models on targeted task sets.

Core claim

The paper establishes that a 7B LLM trained exclusively on Python-to-Others transpilation tasks, using reinforcement learning with the Aggressive-Partial-Functional reward, bootstraps multilingual transpilation ability. The result is consistent improvement on general and low-resource PL-related tasks, outperforming larger models such as Deepseek-R1 on Python-to-Others tasks and Qwen3-235B-A22B-Instruct-2507 on Others-to-All tasks.

What carries the argument

The CodePivot framework that routes every translation through Python as an intermediate representation, paired with the Aggressive-Partial-Functional reward in the reinforcement learning process to enable learning without parallel corpora for all pairs.
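The pivot mechanism is simple enough to sketch. The toy below is illustrative only: the function names and the rule-based "model" are invented for exposition, not the paper's API. An X-to-Y request is decomposed into X-to-Python and Python-to-Y, so a model trained only on directions that touch Python covers every pair.

```python
# Illustrative sketch of pivot-based transpilation. `transpile` decomposes an
# X -> Y request into X -> Python and Python -> Y; `toy_model` and RULES are
# invented stand-ins for the trained model, not the paper's implementation.

def transpile(source, src_lang, tgt_lang, model):
    """Translate `source` from src_lang to tgt_lang through a Python pivot."""
    if src_lang == "python":
        return model(source, "python", tgt_lang)   # direct Python -> Others
    pivot = model(source, src_lang, "python")      # Others -> Python (the IR)
    if tgt_lang == "python":
        return pivot
    return model(pivot, "python", tgt_lang)        # Python -> Others

# A toy "model" that only knows rewrite rules touching Python, mimicking
# training restricted to directions that involve Python:
RULES = {
    ("java", "python"): lambda s: s.replace("System.out.println", "print"),
    ("python", "ruby"): lambda s: s.replace("print", "puts"),
    ("python", "java"): lambda s: s.replace("print", "System.out.println"),
}

def toy_model(code, src, tgt):
    return RULES[(src, tgt)](code)

# Java -> Ruby succeeds even though no (java, ruby) rule was ever defined:
print(transpile('System.out.println("hi");', "java", "ruby", toy_model))  # puts("hi");
```

The design implication the paper leans on: the number of trained directions grows linearly with the number of languages, while the pairs covered grow quadratically.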

If this is right

  • Training only on Python-to-Others tasks produces gains on both Python-to-Others and Others-to-All transpilation tasks.
  • The 7B model outperforms substantially larger mainstream models on Python-to-Others tasks and on Others-to-All tasks.
  • The approach yields better results on general transpilation tasks than a model trained directly on Any-to-Any tasks.
  • Low-resource programming languages benefit because the method reduces reliance on scarce parallel data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • A similar pivot strategy could lower data needs in other multi-way adaptation settings where paired examples are limited.
  • Rewards that credit partial functionality may prove useful in additional code-related training scenarios where exact outputs are rare at the start.
  • The framework might allow quick addition of new programming languages by extending only the Python-to-new-language direction.

Load-bearing premise

The method assumes that Python can serve as an effective bridge for all language pairs and that the Aggressive-Partial-Functional reward will teach the model general translation rules that apply to unseen directions and low-resource languages.
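The review gives no formula for the reward, so the following is only a minimal sketch of the generic partial-functional idea it names: gate on syntactic validity, then credit the fraction of input-output test cases a candidate passes instead of demanding an exact match. The paper's actual APF formulation, including whatever makes it "aggressive", may differ.

```python
def partial_functional_reward(candidate_fn, test_cases, syntax_ok=True):
    """Reward in [0, 1]: 0 if the translation does not parse/compile,
    otherwise the fraction of input-output test cases it passes."""
    if not syntax_ok:
        return 0.0
    passed = 0
    for inputs, expected in test_cases:
        try:
            if candidate_fn(*inputs) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case counts as a failure, not a zero reward
    return passed / len(test_cases)

# A buggy "translation" of abs() that mishandles zero:
def buggy_abs(x):
    if x > 0:
        return x
    if x < 0:
        return -x
    return -1  # bug: abs(0) should be 0

tests = [((3,), 3), ((-2,), 2), ((0,), 0), ((7,), 7)]
print(partial_functional_reward(buggy_abs, tests))  # 3 of 4 cases pass -> 0.75
```

A dense reward of this shape gives gradient signal early in RL training, when exact-match or all-tests-pass rewards would be almost always zero.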

What would settle it

A direct evaluation of the trained model on transpilation between two non-Python languages that share no training connections would show whether the claimed generalization holds without the Python pivot.

Figures

Figures reproduced from arXiv: 2604.18027 by Chun Yong Chong, Huiri Tan, Jiasi Shen, Juyong Jiang, Meibo Ren, Shangyu Li, Sizhe Zhong, Xu Han, Yunhao Gou, Yun Peng.

Figure 1: Correlation between programming language re… [caption truncated; image not reproduced]
Figure 2: The workflow of the training process of CodePivot. The core components of the workflow consist of the Python-IR-pivoted SFT stage (Section 3.2) and the Aggressive-Partial-Functional reward (APF) augmented RL stage (Section 3.3).
Figure 3: Performance of models on low-resource PL-related transpilation tasks. [image not reproduced]
Figure 4: Scatter plot illustrating the relationship be… [caption truncated; image not reproduced]
Figure 5: Comparison of mean reward trajectories during RL training under four distinct reward… [caption truncated; image not reproduced]
Figure 6: Prompt template used for the transpilation task in both training and evaluation. The… [caption truncated; image not reproduced]
Original abstract

Transpilation, or code translation, aims to convert source code from one programming language (PL) to another. It is beneficial for many downstream applications, from modernizing large legacy codebases to augmenting data for low-resource PLs. Recent large language model (LLM)-based approaches have demonstrated immense potential for code translation. Among these approaches, training-based methods are particularly important because LLMs currently do not effectively adapt to domain-specific settings that suffer from a lack of knowledge without targeted training. This limitation is evident in transpilation tasks involving low-resource PLs. However, existing training-based approaches rely on a pairwise transpilation paradigm, making it impractical to support a diverse range of PLs. This limitation is particularly prominent for low-resource PLs due to a scarcity of training data. Furthermore, these methods suffer from suboptimal reinforcement learning (RL) reward formulations. To address these limitations, we propose CodePivot, a training framework that leverages Python as an intermediate representation (IR), augmented by a novel RL reward mechanism, Aggressive-Partial-Functional reward, to bootstrap the model's multilingual transpilation ability without requiring parallel corpora. Experiments involving 10 PLs show that the resulting 7B model, trained on Python-to-Others tasks, consistently improves performance across both general and low-resource PL-related transpilation tasks. It outperforms substantially larger mainstream models with hundreds of billions more parameters, such as Deepseek-R1 and Qwen3-235B-A22B-Instruct-2507, on Python-to-Others tasks and Others-to-All tasks, respectively. In addition, it outperforms its counterpart trained directly on Any-to-Any tasks on general transpilation tasks. The code and data are available at https://github.com/lishangyu-hkust/CodePivot.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes CodePivot, a training framework that uses Python as an intermediate representation combined with a novel Aggressive-Partial-Functional RL reward to enable multilingual code transpilation in LLMs without any parallel corpora. A 7B model is trained only on Python-to-Others tasks across 10 programming languages and is reported to improve performance on both general and low-resource transpilation benchmarks, outperforming substantially larger models (e.g., Deepseek-R1, Qwen3-235B) on Python-to-Others and Others-to-All tasks respectively, while also beating a direct Any-to-Any baseline.

Significance. If the central results hold, the work would be significant for low-resource language support in code models, as it removes the pairwise data requirement that currently limits scaling to many PLs. The public release of code and data at the GitHub link is a clear strength that supports reproducibility. The pivot-plus-RL approach, if shown to produce semantically faithful intermediates, could generalize to other domain-adaptation settings where parallel data is scarce.

major comments (2)
  1. [§3.2] §3.2 (Aggressive-Partial-Functional reward definition): the reward is load-bearing for the entire bootstrapping claim, yet the manuscript provides no correlation analysis or held-out unit-test results demonstrating that the reward (whether syntax-based, partial-execution, or model-judged) aligns with actual input-output semantic equivalence across the 10 languages; without this, the generalization from Python-to-Others training to reliable Others-to-All chaining remains unverified.
  2. [§4.3, Table 3] §4.3 and Table 3 (Others-to-All results): the headline outperformance over models with hundreds of billions more parameters is reported without error bars, multiple random seeds, or statistical significance tests; given the potential for error propagation through the Python pivot, these omissions make it impossible to assess whether the gains are robust or merely artifacts of the evaluation distribution.
minor comments (2)
  1. [Abstract] The abstract states performance gains but supplies no numerical metrics, baselines, or language-pair breakdowns; adding a compact results summary would improve readability.
  2. [§3.2] Notation for the reward components (e.g., how 'partial' and 'aggressive' are operationalized) is introduced without a compact equation or pseudocode box, making the method harder to re-implement from the text alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments raise valid points regarding validation of the reward mechanism and statistical robustness of the results. We address each major comment point by point below, indicating planned revisions to strengthen the paper.

point-by-point responses
  1. Referee: [§3.2] §3.2 (Aggressive-Partial-Functional reward definition): the reward is load-bearing for the entire bootstrapping claim, yet the manuscript provides no correlation analysis or held-out unit-test results demonstrating that the reward (whether syntax-based, partial-execution, or model-judged) aligns with actual input-output semantic equivalence across the 10 languages; without this, the generalization from Python-to-Others training to reliable Others-to-All chaining remains unverified.

    Authors: We agree that explicit validation of the reward's alignment with semantic equivalence would strengthen the bootstrapping claim. The Aggressive-Partial-Functional reward combines aggressive partial execution, syntax verification, and model-based functional judgments to approximate input-output equivalence without parallel corpora. While the manuscript prioritizes end-to-end empirical gains on transpilation benchmarks, we acknowledge the value of direct correlation evidence. In the revised manuscript, we will add a new analysis subsection under §3.2 that reports Pearson correlations and held-out unit-test pass rates across all 10 languages, showing that reward scores positively correlate with semantic correctness. This addition will support the reliability of chaining through the Python pivot for Others-to-All tasks. revision: yes

  2. Referee: [§4.3, Table 3] §4.3 and Table 3 (Others-to-All results): the headline outperformance over models with hundreds of billions more parameters is reported without error bars, multiple random seeds, or statistical significance tests; given the potential for error propagation through the Python pivot, these omissions make it impossible to assess whether the gains are robust or merely artifacts of the evaluation distribution.

    Authors: We thank the referee for highlighting this important omission in statistical reporting. The results in Table 3 were obtained from single-run evaluations owing to substantial computational requirements for training the 7B model and evaluating against large baselines. To address potential error propagation in the pivot-based approach and to demonstrate robustness, we will revise §4.3 and Table 3 to include multiple random seeds (at least three), report means with standard deviations as error bars, and add paired statistical significance tests against the compared models. These changes will enable a clearer assessment of whether the observed outperformance is consistent. revision: yes
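The paired significance testing promised in this response needs no extra tooling; a paired sign-flip permutation test on per-task score differences suffices. The function and the numbers below are an illustrative sketch, not the authors' planned analysis.

```python
import random

def paired_permutation_pvalue(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided p-value for the hypothesis that two models' mean per-task
    scores differ, via a paired sign-flip permutation test."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / len(diffs)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference is symmetric around zero,
        # so its sign can be flipped at random.
        flipped = [d * rng.choice((-1, 1)) for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= abs(observed):
            extreme += 1
    return extreme / n_resamples

# Per-task pass@1 scores for two models on the same tasks (made-up numbers):
model_a = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9]
model_b = [0.6, 0.7, 0.80, 0.5, 0.65, 0.7]
print(paired_permutation_pvalue(model_a, model_b))
```

Pairing by task matters here: it controls for per-task difficulty, which dominates variance in transpilation benchmarks.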

Circularity Check

0 steps flagged

No circularity: empirical RL training result on external benchmarks

full rationale

The paper presents an empirical training framework (Python pivot + Aggressive-Partial-Functional reward) whose central claim is measured performance gains on held-out transpilation benchmarks across 10 languages. No equations, fitted parameters, or self-citations are shown to reduce the reported improvements to the training inputs by construction. The reward mechanism and evaluation rest on external test cases rather than re-using the same fitted quantities or self-referential definitions. This is the normal non-circular outcome for an applied RL paper whose success is falsifiable on independent benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unstated assumption that Python is a sufficiently expressive and neutral intermediate representation for all ten languages tested, and that the new RL reward provides a reliable training signal without parallel data. No free parameters are explicitly quantified in the abstract; the Aggressive-Partial-Functional reward is the one invented entity.

axioms (1)
  • domain assumption: Python serves as an effective universal intermediate representation for transpilation across the tested languages
    Invoked by the choice to train only on Python-to-Others and evaluate on Others-to-All
invented entities (1)
  • Aggressive-Partial-Functional reward (no independent evidence)
    purpose: To provide a training signal for RL that encourages partial functional correctness without requiring exact matches or parallel corpora
    New reward formulation introduced to address suboptimal RL rewards in prior work

pith-pipeline@v0.9.0 · 5657 in / 1348 out tokens · 28111 ms · 2026-05-10T04:55:01.419206+00:00 · methodology


Reference graph

Works this paper leans on

93 extracted references · 23 canonical work pages · 7 internal anchors

  1. [1]

    Avatar: A parallel corpus for java-python program translation

    Wasi Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang. Avatar: A parallel corpus for java-python program translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2268–2281, 2023

  2. [2]

    Integrating large language models and reinforcement learning for non-linear reasoning

    Yoav Alon and Cristina David. Integrating large language models and reinforcement learning for non-linear reasoning. Proc. ACM Softw. Eng., 2(FSE), June 2025

  3. [3]

    Smollm-corpus, 2024

    Loubna Ben Allal, Anton Lozhkov, Guilherme Penedo, Thomas Wolf, and Leandro von Werra. Smollm-corpus, 2024

  4. [4]

    Verified code transpilation with llms

    Sahil Bhatia, Jie Qiu, Niranjan Hasabnis, Sanjit A Seshia, and Alvin Cheung. Verified code transpilation with llms. Advances in Neural Information Processing Systems, 37:41394–41424, 2024

  5. [5]

    Enhancing code translation in language models with few-shot learning via retrieval-augmented generation

    Manish Bhattarai, Javier E Santos, Shawn Jones, Ayan Biswas, Boian Alexandrov, and Daniel O’Malley. Enhancing code translation in language models with few-shot learning via retrieval-augmented generation. In 2024 IEEE High Performance Extreme Computing Conference (HPEC), pages 1–8. IEEE, 2024

  6. [6]

    Bigcode models leaderboard

    BigCode. Bigcode models leaderboard. https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard, 2023. Accessed: 2024-05-24

  7. [7]

    C2rust, January 2026

    C2Rust. C2rust, January 2026

  8. [8]

    Knowledge transfer from high-resource to low-resource programming languages for code llms

    Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, and Arjun Guha. Knowledge transfer from high-resource to low-resource programming languages for code llms. Proceedings of the ACM on Programming Languages, 8(OOPSLA2):677–708, 2024

  9. [9]

    Multipl-e: A scalable and polyglot approach to benchmarking neural code generation

    Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, and Abhinav Jangda. Multipl-e: A scalable and polyglot approach to benchmarking neural code generation. IEEE Trans. Softw. Eng., 49(7):3675–3691, July 2023

  10. [10]

    Evaluating Large Language Models Trained on Code

    Mark Chen. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021

  11. [11]

    Seal: Towards diverse specification inference for linux interfaces from security patches

    Wei Chen, Bowen Zhang, Chengpeng Wang, Wensheng Tang, and Charles Zhang. Seal: Towards diverse specification inference for linux interfaces from security patches. In Proceedings of the Twentieth European Conference on Computer Systems, pages 1246–1262, 2025

  12. [12]

    Cxgo, January 2026

    Cxgo. Cxgo, January 2026

  13. [13]

    The design and application of a retargetable peephole optimizer

    Jack W Davidson and Christopher W Fraser. The design and application of a retargetable peephole optimizer. ACM Transactions on Programming Languages and Systems (TOPLAS), 2(2):191–202, 1980

  14. [14]

    Translating c to safer rust

    Mehmet Emre, Ryan Schroeder, Kyle Dewey, and Ben Hardekopf. Translating c to safer rust. Proceedings of the ACM on Programming Languages, 5(OOPSLA):1–29, 2021

  15. [15]

    Large language models for code translation: An in-depth analysis of code smells and functional correctness

    Christof Feischl and Roman Kern. Large language models for code translation: An in-depth analysis of code smells and functional correctness. ACM Transactions on Software Engineering and Methodology, 2025

  16. [16]

    Rlef: Grounding code llms in execution feedback with reinforcement learning

    Jonas Gehring, Kunhao Zheng, Jade Copet, Vegard Mella, Quentin Carbonneaux, Taco Cohen, and Gabriel Synnaeve. Rlef: Grounding code llms in execution feedback with reinforcement learning. arXiv preprint arXiv:2410.02089, 2024

  17. [17]

    DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

    Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Yu Wu, YK Li, et al. Deepseek-coder: When the large language model meets programming – the rise of code intelligence. arXiv preprint arXiv:2401.14196, 2024

  18. [18]

    Don’t stop pretraining: Adapt language models to domains and tasks

    Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. Don’t stop pretraining: Adapt language models to domains and tasks. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, ...

  19. [19]

    Automated testing of cobol to java transformation

    Sandeep Hans, Atul Kumar, Toshikai Yasue, Kouichi Ono, Saravanan Krishnan, Devika Sondhi, Fumiko Satoh, Gerald Mitchell, Sachin Kumar, and Diptikalyan Saha. Automated testing of cobol to java transformation. arXiv preprint arXiv:2504.10548, 2025

  20. [20]

    To tag, or not to tag: Translating c’s unions to rust’s tagged unions

    Jaemin Hong and Sukyoung Ryu. To tag, or not to tag: Translating c’s unions to rust’s tagged unions. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 40–52, 2024

  21. [21]

    Type-migrating c-to-rust translation using a large language model

    Jaemin Hong and Sukyoung Ryu. Type-migrating c-to-rust translation using a large language model. Empirical Software Engineering, 30(1):3, 2025

  22. [22]

    Neliac—dialect of algol

    Harry D Huskey, Maurice H Halstead, and R McArthur. Neliac—dialect of algol. Communications of the ACM, 3(8):463–468, 1960

  23. [23]

    Alphatrans: A neuro-symbolic compositional approach for repository-level code translation and validation

    Ali Reza Ibrahimzada, Kaiyao Ke, Mrigank Pawagi, Muhammad Salman Abid, Rangeet Pan, Saurabh Sinha, and Reyhaneh Jabbarvand. Alphatrans: A neuro-symbolic compositional approach for repository-level code translation and validation. Proceedings of the ACM on Software Engineering, 2(FSE):2454–2476, 2025

  24. [24]

    J2cstranslator, January 2026

    J2cstranslator. J2cstranslator, January 2026

  25. [25]

    Cotran: An llm-based code translator using reinforcement learning with feedback from compiler and symbolic execution

    Prithwish Jana, Piyush Jha, Haoyang Ju, Gautham Kishore, Aryan Mahajan, and Vijay Ganesh. Cotran: An llm-based code translator using reinforcement learning with feedback from compiler and symbolic execution. In ECAI 2024, pages 4011–4018. IOS Press, 2024

  26. [26]

    Google’s multilingual neural machine translation system: Enabling zero-shot translation

    Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351, 2017

  27. [27]

    Efficient memory management for large language model serving with pagedattention

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th Symposium on Operating Systems Principles, pages 611–626, 2023

  28. [28]

    Llvm: A compilation framework for lifelong program analysis & transformation

    Chris Lattner and Vikram Adve. Llvm: A compilation framework for lifelong program analysis & transformation. In International Symposium on Code Generation and Optimization, CGO 2004, pages 75–86. IEEE, 2004

  29. [29]

    Codei/o: Condensing reasoning patterns via code input-output prediction

    Junlong Li, Daya Guo, Dejian Yang, Runxin Xu, Yu Wu, and Junxian He. Codei/o: Condensing reasoning patterns via code input-output prediction. arXiv preprint arXiv:2502.07316, 2025

  30. [30]

    StarCoder: may the source be with you!

    Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et al. Starcoder: may the source be with you! arXiv preprint arXiv:2305.06161, 2023

  31. [31]

    Osvbench: Benchmarking llms on specification generation tasks for operating system verification

    Shangyu Li, Juyong Jiang, Tiancheng Zhao, and Jiasi Shen. Osvbench: Benchmarking llms on specification generation tasks for operating system verification. arXiv preprint arXiv:2504.20964, 2025

  32. [32]

    A sound static analysis approach to i/o api migration

    Shangyu Li, Zhaoyang Zhang, Sizhe Zhong, Diyu Zhou, and Jiasi Shen. A sound static analysis approach to i/o api migration. Proceedings of the ACM on Programming Languages, 9(OOPSLA2):584–614, 2025

  33. [33]

    Competition-level code generation with alphacode

    Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al. Competition-level code generation with alphacode. Science, 378(6624):1092–1097, 2022

  34. [34]

    In rust we trust: a transpiler from unsafe c to safer rust

    Michael Ling, Yijun Yu, Haitao Wu, Yuan Wang, James R Cordy, and Ahmed E Hassan. In rust we trust: a transpiler from unsafe c to safer rust. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings, pages 354–355, 2022

  35. [35]

    Alive2: bounded translation validation for llvm

    Nuno P Lopes, Juneyoung Lee, Chung-Kil Hur, Zhengyang Liu, and John Regehr. Alive2: bounded translation validation for llvm. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pages 65–79, 2021

  36. [36]

    Provably correct peephole optimizations with alive

    Nuno P Lopes, David Menendez, Santosh Nagarakatte, and John Regehr. Provably correct peephole optimizations with alive. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 22–32, 2015

  37. [37]

    StarCoder 2 and The Stack v2: The Next Generation

    Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, et al. Starcoder 2 and the stack v2: The next generation. arXiv preprint arXiv:2402.19173, 2024

  38. [38]

    InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-Based Code Translation

    Marcos Macedo, Yuan Tian, Pengyu Nie, Filipe R. Cogo, and Bram Adams. InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-Based Code Translation, pages 1153–1164. IEEE Press, 2025

  39. [39]

    GitHut 2.0: GitHub Language Discovery

    madnight. GitHut 2.0: GitHub Language Discovery. https://madnight.github.io/githut/#/, 2026. Accessed: 2026-1-24

  40. [40]

    Automated transpilation of imperative to functional code using neural-guided program synthesis

    Benjamin Mariano, Yanju Chen, Yu Feng, Greg Durrett, and Işil Dillig. Automated transpilation of imperative to functional code using neural-guided program synthesis. Proceedings of the ACM on Programming Languages, 6(OOPSLA1):1–27, 2022

  41. [41]

    The rust language

    Nicholas D Matsakis and Felix S Klock. The rust language. In Proceedings of the 2014 ACM SIGAda Annual Conference on High Integrity Language Technology, pages 103–104, 2014

  42. [42]

    Generic and gimple: A new tree representation for entire functions

    Jason Merrill. Generic and gimple: A new tree representation for entire functions. In Proceedings of the 2003 GCC Summit, pages 171–180, 2003

  43. [43]

    GitHub Copilot – Your AI pair programmer

    Microsoft. GitHub Copilot – Your AI pair programmer. GitHub repository, 2026

  44. [44]

    Ai-assisted code authoring at scale: Fine-tuning, deploying, and mixed methods evaluation

    Vijayaraghavan Murali, Chandra Maddila, Imad Ahmad, Michael Bolin, Daniel Cheng, Negar Ghorbani, Renuka Fernandez, Nachiappan Nagappan, and Peter C Rigby. Ai-assisted code authoring at scale: Fine-tuning, deploying, and mixed methods evaluation. Proceedings of the ACM on Software Engineering, 1(FSE):1066–1085, 2024

  45. [45]

    Hyperkernel: Push-button verification of an os kernel

    Luke Nelson, Helgi Sigurbjarnarson, Kaiyuan Zhang, Dylan Johnson, James Bornholt, Emina Torlak, and Xi Wang. Hyperkernel: Push-button verification of an os kernel. In Proceedings of the 26th Symposium on Operating Systems Principles, pages 252–269, 2017

  46. [46]

    C2SaferRust: Transforming C projects into safer Rust with neurosymbolic techniques

    Vikram Nitin, Rahul Krishna, Luiz Lemos do Valle, and Baishakhi Ray. C2saferrust: Transforming c projects into safer rust with neurosymbolic techniques. arXiv preprint arXiv:2501.14257, 2025

  47. [47]

    Repository-level code translation benchmark targeting rust

    Guangsheng Ou, Mingwei Liu, Yuxuan Chen, Xin Peng, and Zibin Zheng. Repository-level code translation benchmark targeting rust. arXiv preprint arXiv:2411.13990, 2024

  48. [48]

    Lost in translation: A study of bugs introduced by large language models while translating code

    Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, and Reyhaneh Jabbarvand. Lost in translation: A study of bugs introduced by large language models while translating code. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, p...

  49. [49]

    Towards inclusive source code readability based on the preferences of programmers with visual impairments

    Maulishree Pandey, Steve Oney, and Andrew Begel. Towards inclusive source code readability based on the preferences of programmers with visual impairments. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–18, 2024

  50. [50]

    IRCoder: Intermediate representations make language models robust multilingual code generators

    Indraneil Paul, Goran Glavaš, and Iryna Gurevych. IRCoder: Intermediate representations make language models robust multilingual code generators. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15023–15041, Bangkok, Thailand, A...

  51. [51]

    arXiv preprint arXiv:2105.12655 (2021)

    Ruchir Puri, David S Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, et al. Codenet: A large-scale ai for code dataset for learning a diversity of coding tasks.arXiv preprint arXiv:2105.12655, 2021

  52. [52]

    Batfix: Repairing language model-based transpilation.ACM Transactions on Software Engineering and Methodology, 33(6):1–29, 2024

    Daniel Ramos, Inês Lynce, Vasco Manquinho, Ruben Martins, and Claire Le Goues. Batfix: Repairing language model-based transpilation.ACM Transactions on Software Engineering and Methodology, 33(6):1–29, 2024

  53. [53]

    Unsu- pervised translation of programming languages

    Baptiste Roziere, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. Unsu- pervised translation of programming languages. InProceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY , USA, 2020. Curran Associates Inc

  54. [54]

    Approximating KL divergence, 2020

    John Schulman. Approximating KL divergence, 2020. Accessed: 2025-02-22

  55. [55]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

  56. [56]

    Sharpen, January 2026

    Sharpen. Sharpen, January 2026

  57. [57]

    Hybridflow: A flexible and efficient rlhf framework

    Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and efficient rlhf framework. In Proceedings of the Twentieth European Conference on Computer Systems, pages 1279–1297, 2025

  58. [58]

    Pinpoint: Fast and precise sparse value flow analysis for million lines of code

    Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. Pinpoint: Fast and precise sparse value flow analysis for million lines of code. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 693–706, 2018

  59. [59]

    Execution-based code generation using deep reinforcement learning

    Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, and Chandan K Reddy. Execution-based code generation using deep reinforcement learning. arXiv preprint arXiv:2301.13816, 2023

  60. [60]

    Migrating from cobol to java

    Harry M Sneed. Migrating from cobol to java. In 2010 IEEE International Conference on Software Maintenance, pages 1–7. IEEE, 2010

  61. [61]

    The problem of programming communication with changing machines: a proposed solution

    J Strong, J Wegstein, A Tritter, J Olsztyn, O Mock, and T Steel. The problem of programming communication with changing machines: a proposed solution. Communications of the ACM, 1(8):12–18, 1958

  62. [62]

    On-demand strong update analysis via value-flow refinement

    Yulei Sui and Jingling Xue. On-demand strong update analysis via value-flow refinement. In Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering, pages 460–473, 2016

  63. [63]

    Svf: interprocedural static value-flow analysis in llvm

    Yulei Sui and Jingling Xue. Svf: interprocedural static value-flow analysis in llvm. In Proceedings of the 25th international conference on compiler construction, pages 265–266, 2016

  64. [64]

    Static memory leak detection using full-sparse value-flow analysis

    Yulei Sui, Ding Ye, and Jingling Xue. Static memory leak detection using full-sparse value-flow analysis. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 254–264, 2012

  65. [65]

    Detecting memory leaks statically with full-sparse value-flow analysis

    Yulei Sui, Ding Ye, and Jingling Xue. Detecting memory leaks statically with full-sparse value-flow analysis. IEEE Transactions on Software Engineering, 40(2):107–122, 2014

  66. [66]

    Code translation with compiler representations

    Marc Szafraniec, Baptiste Roziere, Hugh Leather, Francois Charton, Patrick Labatut, and Gabriel Synnaeve. Code translation with compiler representations. arXiv preprint arXiv:2207.03578, 2022

  67. [67]

    Profix: Improving profile-guided optimization in compilers with graph neural networks

    Huiri Tan, Juyong Jiang, and Jiasi Shen. Profix: Improving profile-guided optimization in compilers with graph neural networks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems

  68. [68]

    Octopus: Scaling value-flow analysis via parallel collection of realizable path conditions

    Wensheng Tang, Dejun Dong, Shijie Li, Chengpeng Wang, Peisen Yao, Jinguo Zhou, and Charles Zhang. Octopus: Scaling value-flow analysis via parallel collection of realizable path conditions. ACM Transactions on Software Engineering and Methodology, 33(3):1–33, 2024

  69. [69]

    The realities of language conversions

    Andrey A. Terekhov and Chris Verhoef. The realities of language conversions. IEEE Softw., 17(6):111–124, November 2000

  70. [70]

    Tiobe, January 2026

    TIOBE. Tiobe, January 2026

  71. [71]

    User-customizable transpilation of scripting languages

    Bo Wang, Aashish Kolluri, Ivica Nikolić, Teodora Baluta, and Prateek Saxena. User-customizable transpilation of scripting languages. Proceedings of the ACM on Programming Languages, 7(OOPSLA1):201–229, 2023

  72. [72]

    Translating to a low-resource language with compiler feedback: A case study on cangjie

    Jun Wang, Chenghao Su, Yijie Ou, Yanhui Li, Jialiang Tan, Lin Chen, and Yuming Zhou. Translating to a low-resource language with compiler feedback: A case study on cangjie. IEEE Transactions on Software Engineering, 2025

  73. [73]

    Enhancing code llms with reinforcement learning in code generation: A survey

    Junqiao Wang, Zeng Zhang, Yangfan He, Zihao Zhang, Xinyuan Song, Yuyang Song, Tianyu Shi, Yuchen Li, Hengyuan Xu, Kunyu Wu, et al. Enhancing code llms with reinforcement learning in code generation: A survey. arXiv preprint arXiv:2412.20367, 2024

  74. [74]

    Reinforcement-learning-guided source code summarization using hierarchical attention

    Wenhua Wang, Yuqun Zhang, Yulei Sui, Yao Wan, Zhou Zhao, Jian Wu, Philip S Yu, and Guandong Xu. Reinforcement-learning-guided source code summarization using hierarchical attention. IEEE Transactions on Software Engineering, 48(1):102–119, 2020

  75. [75]

    Repotransbench: A real-world multilingual benchmark for repository-level code translation

    Yanli Wang, Yanlin Wang, Suiquan Wang, Daya Guo, Jiachi Chen, John Grundy, Xilin Liu, Yuchi Ma, Mingzhi Mao, Hongyu Zhang, et al. Repotransbench: A real-world multilingual benchmark for repository-level code translation. IEEE Transactions on Software Engineering, 2025

  76. [76]

    Effireasontrans: Rl-optimized reasoning for code translation

    Yanlin Wang, Rongyi Ou, Yanli Wang, Mingwei Liu, Jiachi Chen, Ensheng Shi, Xilin Liu, Yuchi Ma, and Zibin Zheng. Effireasontrans: Rl-optimized reasoning for code translation. arXiv preprint arXiv:2510.18863, 2025

  77. [77]

    Swe-rl: Advancing llm reasoning via reinforcement learning on open software evolution

    Yuxiang Wei, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, Lingming Zhang, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, and Sida I Wang. Swe-rl: Advancing llm reasoning via reinforcement learning on open software evolution. arXiv preprint arXiv:2502.18449, 2025

  78. [78]

    Classeval-t: Evaluating large language models in class-level code translation

    Pengyu Xue, Linhao Wu, Zhen Yang, Chengyi Wang, Xiang Li, Yuxiang Zhang, Jia Li, Ruikai Jin, Yifei Pei, Zhaoyan Shen, et al. Classeval-t: Evaluating large language models in class-level code translation. Proceedings of the ACM on Software Engineering, 2(ISSTA):1421–1444, 2025

  79. [79]

    Vert: Verified equivalent rust transpilation with large language models as few-shot learners

    Aidan ZH Yang, Yoshiki Takashima, Brandon Paulsen, Josiah Dodds, and Daniel Kroening. Vert: Verified equivalent rust transpilation with large language models as few-shot learners. arXiv preprint arXiv:2404.18852, 2024

  80. [80]

    Scaling laws for code: Every programming language matters

    Jian Yang, Shawn Guo, Lin Jing, Wei Zhang, Aishan Liu, Chuan Hao, Zhoujun Li, Wayne Xin Zhao, Xianglong Liu, Weifeng Lv, et al. Scaling laws for code: Every programming language matters. arXiv preprint arXiv:2512.13472, 2025
