pith. machine review for the scientific record.

arxiv: 2605.07403 · v1 · submitted 2026-05-08 · 💻 cs.SE

Recognition: no theorem link

Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair


Pith reviewed 2026-05-11 01:56 UTC · model grok-4.3

classification 💻 cs.SE
keywords code translation · Java to Cangjie · LLM training · error repair · low-resource language · multi-stage learning · functional equivalence · compiler feedback

The pith

A multi-stage training framework lets LLMs translate Java to Cangjie more effectively with scarce data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that general large language models can be trained in stages to handle translation from Java into the low-resource Cangjie language by first absorbing syntactic knowledge and monolingual instructions before applying iterative error repair. This addresses the problems of missing language-specific knowledge and few parallel code examples that cause standard models to produce invalid or misaligned code. If the claim holds, it would mean developers can more readily migrate codebases to new programming languages without needing massive amounts of translated examples for training. The reported experiments support this by showing gains in functional equivalence even when parallel data is limited.

Core claim

Through a multi-stage LLM training framework that uses syntactic knowledge datasets and monolingual instruction data followed by error repair with a dedicated Cangjie repository and compiler feedback, the approach achieves semantic alignment and structure awareness in Java-to-Cangjie translations, yielding a 6.06% improvement in functional equivalence over state-of-the-art methods.

What carries the argument

The multi-stage training framework with iterative error repair using compiler feedback and case retrieval.
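The repair stage described above can be sketched as a loop: compile the generated Cangjie code, and on failure retrieve similar past repair cases to guide the next LLM repair attempt. This is a minimal illustration, not the authors' implementation; every name here (`compile_fn`, `llm_repair`, `RepairCase`, the token-overlap retrieval) is a hypothetical stand-in.

```python
# Hypothetical sketch of iterative error repair with compiler feedback and
# case retrieval. All names are illustrative placeholders, not the paper's API.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RepairCase:
    error_message: str   # compiler error seen in a past failure
    bad_code: str        # the code that triggered it
    fixed_code: str      # the verified fix

def retrieve_cases(error: str, repo: list, k: int = 3) -> list:
    """Rank stored repair cases by naive token overlap with the new error."""
    def score(case: RepairCase) -> int:
        return len(set(error.split()) & set(case.error_message.split()))
    return sorted(repo, key=score, reverse=True)[:k]

def repair_loop(code: str,
                compile_fn: Callable[[str], Optional[str]],   # None = success, else error text
                llm_repair: Callable[[str, str, list], str],
                repo: list,
                max_rounds: int = 5):
    """Iterate: compile, retrieve similar cases, ask the LLM for a repair."""
    for _ in range(max_rounds):
        error = compile_fn(code)
        if error is None:
            return code, True          # compiles: accept the translation
        examples = retrieve_cases(error, repo)
        code = llm_repair(code, error, examples)
    return code, False                 # give up after max_rounds
```

The loop terminates either on a successful compile or after a fixed budget of repair rounds, which matches the "iterative" framing in the claim.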

If this is right

  • Each of the training stages contributes positively to the final translation performance.
  • The combination of compiler feedback and error repair case retrieval effectively fixes incorrect Cangjie code.
  • The method works with limited parallel data where standard fine-tuning approaches struggle.
  • Functional equivalence and compilability improve compared to existing translation techniques.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This staged knowledge injection approach could apply to code translation involving other emerging programming languages.
  • Building larger error repair repositories from real usage data might yield further gains in repair success.
  • The reliance on monolingual data suggests a path for improving LLM performance on tasks with asymmetric data availability.

Load-bearing premise

The specially built syntactic knowledge datasets, monolingual instruction data, and error repair repository contain enough relevant information to teach the LLM reliable Cangjie syntax and semantics despite the absence of large parallel corpora.

What would settle it

Applying the full approach and a single-stage baseline to a fresh collection of Java programs and finding that the functional equivalence improvement falls below 3% or disappears entirely.
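The settling experiment above reduces to a simple computation: score each translated program as functionally equivalent iff it compiles and passes its unit tests, then compare the full approach against a single-stage baseline. The sketch below assumes that definition of functional equivalence; the outcome counts are illustrative placeholders, not the paper's data.

```python
# Hypothetical sketch of the settling experiment. A translation counts as
# functionally equivalent iff it compiles and passes its unit tests.
from dataclasses import dataclass

@dataclass
class Outcome:
    compiles: bool
    passes_tests: bool

def equivalence_rate(outcomes: list) -> float:
    """Fraction of translations that compile and pass all unit tests."""
    ok = sum(1 for o in outcomes if o.compiles and o.passes_tests)
    return ok / len(outcomes)

# Illustrative outcomes over 20 fresh Java programs (not the paper's numbers).
full_stage   = [Outcome(True, True)] * 13 + [Outcome(True, False)] * 4 + [Outcome(False, False)] * 3
single_stage = [Outcome(True, True)] * 11 + [Outcome(True, False)] * 5 + [Outcome(False, False)] * 4

delta = equivalence_rate(full_stage) - equivalence_rate(single_stage)
# Under the stated criterion, the claim survives only if delta stays at or above 3 points.
claim_survives = delta >= 0.03
```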

Figures

Figures reproduced from arXiv: 2605.07403 by Jingxuan Zhang, Junhao Chen, Jun Zhang, Lin Li, Xinyue Liang.

Figure 1. Comparisons Between LLM-Generated and Refer… [caption truncated at source]
Figure 2. Overall Framework of the Proposed Java-to-Cangjie Translation Pipeline.
Figure 3. Cangjie Documentation Reconstruction Prompt.
Figure 4. AST-Conditioned Prompt Template for Structure… [caption truncated at source]
Figure 5. Prompt Templates for LLM-based Self-Analytic Re… [caption truncated at source]
Figure 6. An Example of the error repair case in the error… [caption truncated at source]
Figure 7. Code Translation Performance across Backbone… [caption truncated at source]
Original abstract

With the rapid evolution of emerging programming language ecosystems, the demand for code translation to low-resource languages continues to grow. As Cangjie emerges as a new programming language, its ecosystem and development toolchains are rapidly expanding. Automated translation from popular programming languages to Cangjie is therefore valuable for practical development. However, constrained by both insufficient Cangjie knowledge and scarce parallel code corpora, general Large Language Models (LLMs) are prone to syntactic errors and semantic as well as structural misalignment in code translation. Existing approaches typically rely on fine-tuning with large-scale parallel data, but they cannot reliably improve compilability or semantic consistency for low-resource Cangjie languages. To tackle these challenges, we propose a multi-stage training framework of LLMs that employs the iterative error repair technique to translate Java code into Cangjie code. This training framework performs training on LLMs, gradually integrating knowledge and achieving semantic alignment as well as structure awareness. During the code translation, we also combine the compiler feedback and error repair case retrieval to repair the incorrect Cangjie code. We construct syntactic knowledge and monolingual instruction datasets to train the LLM. In addition, we also build a Cangjie error repair repository to support error repair in our approach. Experimental results show that, with limited parallel data, our approach improves functional equivalence by 6.06% compared to the state-of-the-art approaches. Meanwhile, ablation studies confirm that each training stage positively contributes to the final performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a multi-stage LLM training framework for Java-to-Cangjie code translation that integrates syntactic knowledge and monolingual instruction datasets during training, followed by iterative error repair that combines compiler feedback with retrieval from a constructed Cangjie error repair repository. The central empirical claim is that this approach achieves a 6.06% improvement in functional equivalence over state-of-the-art methods when only limited parallel data is available, with ablation studies indicating positive contributions from each training stage.

Significance. If the reported gains can be isolated from differences in training data and evaluation conditions, the work would provide a practical template for bootstrapping code translation support for emerging low-resource languages. The emphasis on staged knowledge injection plus retrieval-augmented repair is a reasonable response to the scarcity of parallel corpora and language-specific knowledge.

major comments (2)
  1. [Abstract / Experimental results] Abstract and experimental results section: the 6.06% functional-equivalence gain is presented as evidence that the multi-stage framework outperforms SOTA under limited parallel data, yet the manuscript does not explicitly state that the cited baselines were retrained or re-evaluated on the identical limited corpus together with the syntactic-knowledge, monolingual-instruction, and error-repair datasets constructed for this work. Without that confirmation, the improvement cannot be attributed to the proposed method rather than to an uneven data regime.
  2. [Experimental results] Experimental setup (implied by the abstract's quantitative claim): no information is supplied on baseline implementations, data splits, number of runs, statistical significance tests, or the precise definition and measurement protocol for 'functional equivalence.' These omissions make it impossible to assess whether the reported margin is robust or reproducible.
minor comments (1)
  1. [Abstract] The abstract refers to 'each training stage' contributing positively but provides no quantitative deltas or intermediate metrics; a table or figure summarizing stage-wise performance would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below with clarifications and commit to revisions that strengthen the manuscript's transparency and reproducibility.

Point-by-point responses
  1. Referee: [Abstract / Experimental results] Abstract and experimental results section: the 6.06% functional-equivalence gain is presented as evidence that the multi-stage framework outperforms SOTA under limited parallel data, yet the manuscript does not explicitly state that the cited baselines were retrained or re-evaluated on the identical limited corpus together with the syntactic-knowledge, monolingual-instruction, and error-repair datasets constructed for this work. Without that confirmation, the improvement cannot be attributed to the proposed method rather than to an uneven data regime.

    Authors: We agree that explicit confirmation is required to isolate the contribution of our method. In the experiments, the SOTA baselines were re-evaluated on the identical limited parallel corpus; the syntactic-knowledge and monolingual-instruction datasets were incorporated into baseline fine-tuning where they could be applied without altering the original methods, while the Cangjie-specific error-repair repository was used only by our approach. We will revise the abstract and experimental-results section to state this re-evaluation protocol explicitly, including how each baseline was adapted to the low-resource setting. revision: yes

  2. Referee: [Experimental results] Experimental setup (implied by the abstract's quantitative claim): no information is supplied on baseline implementations, data splits, number of runs, statistical significance tests, or the precise definition and measurement protocol for 'functional equivalence.' These omissions make it impossible to assess whether the reported margin is robust or reproducible.

    Authors: We acknowledge the need for these details. In the revised manuscript we will add: (1) baseline implementation descriptions (models, fine-tuning hyperparameters, and any adaptations), (2) data-split ratios for the limited parallel corpus, (3) number of runs (five independent runs with reported means and standard deviations), (4) statistical significance testing (paired t-tests on functional-equivalence scores), and (5) the functional-equivalence protocol (compilation success plus passage of manually verified unit tests that check semantic equivalence). These additions will appear in a new or expanded experimental-setup subsection. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results from constructed datasets and external compiler feedback

Full rationale

The paper presents an empirical multi-stage LLM training framework for Java-to-Cangjie translation. It constructs syntactic knowledge datasets, monolingual instruction data, and a Cangjie error repair repository, then trains LLMs iteratively while using compiler feedback for error repair during inference. The central claim of a 6.06% functional equivalence improvement is an experimental outcome from ablation studies and comparisons to SOTA baselines, not a mathematical derivation, fitted parameter, or self-referential definition. No equations, uniqueness theorems, or load-bearing self-citations appear in the provided text. The approach depends on external elements (the compiler and the constructed resources) rather than reducing any result to its own inputs by construction, so the evaluation is not circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about staged LLM training effectiveness and the utility of compiler feedback for low-resource code translation; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption LLMs can progressively integrate syntactic, semantic, and structural knowledge through multi-stage training on constructed datasets
    Invoked to justify the training framework that gradually achieves alignment.
  • domain assumption Compiler error messages combined with retrieval of prior repair cases can reliably correct syntactic and semantic errors in generated Cangjie code
    Central premise of the iterative error repair component.

pith-pipeline@v0.9.0 · 5570 in / 1310 out tokens · 24470 ms · 2026-05-11T01:56:03.799199+00:00 · methodology

