Recognition: no theorem link
Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair
Pith reviewed 2026-05-11 01:56 UTC · model grok-4.3
The pith
A multi-stage training framework lets LLMs translate Java to Cangjie more effectively with scarce data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A multi-stage LLM training framework, built on syntactic knowledge datasets and monolingual instruction data and followed by error repair that combines a dedicated Cangjie repair repository with compiler feedback, achieves semantic alignment and structure awareness in Java-to-Cangjie translation, yielding a 6.06% improvement in functional equivalence over state-of-the-art methods.
What carries the argument
The multi-stage training framework with iterative error repair using compiler feedback and case retrieval.
If this is right
- Each of the training stages contributes positively to the final translation performance.
- The combination of compiler feedback and error repair case retrieval effectively fixes incorrect Cangjie code.
- The method works with limited parallel data where standard fine-tuning approaches struggle.
- Functional equivalence and compilability improve compared to existing translation techniques.
Where Pith is reading between the lines
- This staged knowledge injection approach could apply to code translation involving other emerging programming languages.
- Building larger error repair repositories from real usage data might yield further gains in repair success.
- The reliance on monolingual data suggests a path for improving LLM performance on tasks with asymmetric data availability.
Load-bearing premise
The specially built syntactic knowledge datasets, monolingual instruction data, and error repair repository contain enough relevant information to teach the LLM reliable Cangjie syntax and semantics despite the absence of large parallel corpora.
What would settle it
Applying the full approach and a single-stage baseline to a fresh collection of Java programs and finding that the functional equivalence improvement falls below 3% or disappears entirely.
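That test can be read concretely, assuming the 6.06% figure means absolute percentage points of functional equivalence (the abstract does not say which); the outcome counts below are invented for illustration.

```python
def functional_equivalence(outcomes: list[bool]) -> float:
    """Fraction of translated programs that compile and pass their tests."""
    return sum(outcomes) / len(outcomes)

# invented outcomes over a fresh set of 100 Java programs
full_pipeline = [True] * 70 + [False] * 30
single_stage  = [True] * 64 + [False] * 36

gain_points = 100 * (functional_equivalence(full_pipeline)
                     - functional_equivalence(single_stage))
refuted = gain_points < 3.0  # the threshold proposed above
```

Under these invented counts the gain is 6 points and the claim survives; a fresh corpus where `gain_points` drops below 3 would count against it.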

Original abstract
With the rapid evolution of emerging programming language ecosystems, the demand for code translation to low-resource languages continues to grow. As Cangjie emerges as a new programming language, its ecosystem and development toolchains are rapidly expanding. Automated translation from popular programming languages to Cangjie is therefore valuable for practical development. However, constrained by both insufficient Cangjie knowledge and scarce parallel code corpora, general Large Language Models (LLMs) are prone to syntactic errors and semantic as well as structural misalignment in code translation. Existing approaches typically rely on fine-tuning with large-scale parallel data, but they cannot reliably improve compilability or semantic consistency for low-resource Cangjie languages. To tackle these challenges, we propose a multi-stage training framework of LLMs that employs the iterative error repair technique to translate Java code into Cangjie code. This training framework performs training on LLMs, gradually integrating knowledge and achieving semantic alignment as well as structure awareness. During the code translation, we also combine the compiler feedback and error repair case retrieval to repair the incorrect Cangjie code. We construct syntactic knowledge and monolingual instruction datasets to train the LLM. In addition, we also build a Cangjie error repair repository to support error repair in our approach. Experimental results show that, with limited parallel data, our approach improves functional equivalence by 6.06% compared to the state-of-the-art approaches. Meanwhile, ablation studies confirm that each training stage positively contributes to the final performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a multi-stage LLM training framework for Java-to-Cangjie code translation that integrates syntactic knowledge and monolingual instruction datasets during training, followed by iterative error repair that combines compiler feedback with retrieval from a constructed Cangjie error repair repository. The central empirical claim is that this approach achieves a 6.06% improvement in functional equivalence over state-of-the-art methods when only limited parallel data is available, with ablation studies indicating positive contributions from each training stage.
Significance. If the reported gains can be isolated from differences in training data and evaluation conditions, the work would provide a practical template for bootstrapping code translation support for emerging low-resource languages. The emphasis on staged knowledge injection plus retrieval-augmented repair is a reasonable response to the scarcity of parallel corpora and language-specific knowledge.
major comments (2)
- [Abstract / Experimental results] Abstract and experimental results section: the 6.06% functional-equivalence gain is presented as evidence that the multi-stage framework outperforms SOTA under limited parallel data, yet the manuscript does not explicitly state that the cited baselines were retrained or re-evaluated on the identical limited corpus together with the syntactic-knowledge, monolingual-instruction, and error-repair datasets constructed for this work. Without that confirmation, the improvement cannot be attributed to the proposed method rather than to an uneven data regime.
- [Experimental results] Experimental setup (implied by the abstract's quantitative claim): no information is supplied on baseline implementations, data splits, number of runs, statistical significance tests, or the precise definition and measurement protocol for 'functional equivalence.' These omissions make it impossible to assess whether the reported margin is robust or reproducible.
minor comments (1)
- [Abstract] The abstract refers to 'each training stage' contributing positively but provides no quantitative deltas or intermediate metrics; a table or figure summarizing stage-wise performance would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below with clarifications and commit to revisions that strengthen the manuscript's transparency and reproducibility.
Point-by-point responses
- Referee: [Abstract / Experimental results] Abstract and experimental results section: the 6.06% functional-equivalence gain is presented as evidence that the multi-stage framework outperforms SOTA under limited parallel data, yet the manuscript does not explicitly state that the cited baselines were retrained or re-evaluated on the identical limited corpus together with the syntactic-knowledge, monolingual-instruction, and error-repair datasets constructed for this work. Without that confirmation, the improvement cannot be attributed to the proposed method rather than to an uneven data regime.
Authors: We agree that explicit confirmation is required to isolate the contribution of our method. In the experiments, the SOTA baselines were re-evaluated on the identical limited parallel corpus; the syntactic-knowledge and monolingual-instruction datasets were incorporated into baseline fine-tuning where they could be applied without altering the original methods, while the Cangjie-specific error-repair repository was used only by our approach. We will revise the abstract and experimental-results section to state this re-evaluation protocol explicitly, including how each baseline was adapted to the low-resource setting. revision: yes
- Referee: [Experimental results] Experimental setup (implied by the abstract's quantitative claim): no information is supplied on baseline implementations, data splits, number of runs, statistical significance tests, or the precise definition and measurement protocol for 'functional equivalence.' These omissions make it impossible to assess whether the reported margin is robust or reproducible.
Authors: We acknowledge the need for these details. In the revised manuscript we will add: (1) baseline implementation descriptions (models, fine-tuning hyperparameters, and any adaptations), (2) data-split ratios for the limited parallel corpus, (3) number of runs (five independent runs with reported means and standard deviations), (4) statistical significance testing (paired t-tests on functional-equivalence scores), and (5) the functional-equivalence protocol (compilation success plus passage of manually verified unit tests that check semantic equivalence). These additions will appear in a new or expanded experimental-setup subsection. revision: yes
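The commitments in points (3) and (4) — five independent runs with paired t-tests on functional-equivalence scores — can be made concrete with a stdlib-only sketch. The per-run scores below are invented, not taken from the paper.

```python
import math
from statistics import mean, stdev

def paired_t(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Paired t statistic on per-run score differences (df = n - 1), plus the
    mean gain; compare the statistic against a t-table for significance."""
    diffs = [x - y for x, y in zip(xs, ys)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se, mean(diffs)

# invented functional-equivalence scores from five independent runs
ours     = [0.712, 0.705, 0.698, 0.721, 0.709]
baseline = [0.651, 0.648, 0.640, 0.655, 0.646]

t_stat, gain = paired_t(ours, baseline)
# a large t with df = 4 would indicate the margin is unlikely to be noise
```

Reporting the mean gain alongside the t statistic (rather than a single-run delta) is what makes the 6.06% margin auditable.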
Circularity Check
No circularity: empirical results from constructed datasets and external compiler feedback
Full rationale
The paper presents an empirical multi-stage LLM training framework for Java-to-Cangjie translation. It constructs syntactic knowledge datasets, monolingual instruction data, and a Cangjie error repair repository, then trains LLMs iteratively while using compiler feedback for error repair during inference. The central claim of a 6.06% functional equivalence improvement is an experimental outcome from ablation studies and comparisons to SOTA baselines, not a mathematical derivation, fitted parameter, or self-referential definition. No equations, uniqueness theorems, or load-bearing self-citations appear in the provided text. The approach depends on external elements (the compiler and the constructed resources) rather than reducing any result to its own inputs by construction, so the central claim is not circular.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption LLMs can progressively integrate syntactic, semantic, and structural knowledge through multi-stage training on constructed datasets
- domain assumption Compiler error messages combined with retrieval of prior repair cases can reliably correct syntactic and semantic errors in generated Cangjie code