pith. the verified trust layer for science.

arxiv: 2510.10956 · v3 · submitted 2025-10-13 · 💻 cs.SE · cs.AI

Project-Level C-to-Rust Translation via Pointer Knowledge Graphs

Pith reviewed 2026-05-18 08:22 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords: C-to-Rust translation · pointer knowledge graph · large language models · memory safety · project-level translation · unsafe code reduction · pointer semantics

The pith

A pointer knowledge graph gives LLMs the global view needed to translate entire C projects into safe Rust.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a Pointer Knowledge Graph that records how pointers flow through a full C project and lifts low-level struct interactions to higher-level abstractions. It also records Rust-specific details such as ownership, mutability, nullability, and lifetime constraints. When this graph is supplied to large language models, they can translate the project as a whole instead of function by function, avoiding the pointer errors that arise from missing context. A sympathetic reader would care because, according to the reported results, the output Rust passes more functional tests than earlier LLM pipelines while unsafe usages drop by 99.9%.

Core claim

The authors claim that a Pointer Knowledge Graph, formed by adding pointer usage flows and Rust-oriented annotations to standard dependency graphs, supplies LLMs with enough global semantics to produce project-level C-to-Rust translations that are both functionally correct and almost free of unsafe constructs.

What carries the argument

The Pointer Knowledge Graph, which augments code dependency graphs with points-to flows, lifted struct interactions, and annotations for ownership, mutability, nullability, and lifetime.
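The figure excerpts further down this page describe the graph's knowledge as triples such as <tree, pointsTo, quadtree_t>, <tree, isA, Borrowed>, and <root, isA, Owning>, plus a derivesFrom relation between pointer parameters that triggers refactoring guidance when mutability differs. A minimal sketch of that idea, assuming a plain in-memory triple store: the type names, the `Mutable` label, and the hint wording are hypothetical illustrations, not PtrTrans's actual schema or prompt text, and the sample facts are illustrative only.

```rust
use std::collections::BTreeSet;

/// Relation labels seen in the excerpts (pointsTo, isA, derivesFrom).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Rel {
    PointsTo,
    IsA,
    DerivesFrom,
}

/// A <subject, relation, object> triple; strings keep the sketch small.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Triple {
    s: String,
    r: Rel,
    o: String,
}

/// Hypothetical container; a BTreeSet gives deterministic iteration order.
#[derive(Default)]
struct PointerKg {
    triples: BTreeSet<Triple>,
}

impl PointerKg {
    fn add(&mut self, s: &str, r: Rel, o: &str) {
        self.triples.insert(Triple { s: s.into(), r, o: o.into() });
    }

    /// All objects related to `s` by `r`, in sorted order.
    fn objects(&self, s: &str, r: Rel) -> Vec<&str> {
        self.triples
            .iter()
            .filter(|t| t.s == s && t.r == r)
            .map(|t| t.o.as_str())
            .collect()
    }

    fn has(&self, s: &str, r: Rel, o: &str) -> bool {
        self.objects(s, r).contains(&o)
    }

    /// Hypothetical hint: flag a derived pointer whose mutability differs
    /// from the pointer it derives from.
    fn mutability_hints(&self) -> Vec<String> {
        let mut hints = Vec::new();
        for t in self.triples.iter().filter(|t| t.r == Rel::DerivesFrom) {
            let derived_mut = self.has(&t.s, Rel::IsA, "Mutable");
            let source_mut = self.has(&t.o, Rel::IsA, "Mutable");
            if derived_mut != source_mut {
                hints.push(format!("{} derivesFrom {} but their mutability differs", t.s, t.o));
            }
        }
        hints
    }
}

fn main() {
    let mut kg = PointerKg::default();
    kg.add("tree", Rel::PointsTo, "quadtree_t");
    kg.add("tree", Rel::IsA, "Borrowed");
    kg.add("root", Rel::IsA, "Owning");
    kg.add("root", Rel::IsA, "Mutable");
    kg.add("root", Rel::DerivesFrom, "tree");
    println!("{:?}", kg.objects("tree", Rel::IsA));
    for hint in kg.mutability_hints() {
        println!("{hint}");
    }
}
```

Keeping facts as flat triples mirrors how the excerpts present them to the LLM; a real implementation would key nodes by program entity rather than by bare name.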

If this is right

  • Project-level translations maintain dependencies across the entire codebase rather than isolating functions.
  • Generated Rust contains far fewer unsafe usages than the output of rule-based translators or conventional LLM pipelines (a reported 99.9% reduction).
  • Functional correctness on test suites exceeds that of fuzzing-enhanced LLM baselines (a reported 29.3% improvement).
  • Pointer-related errors are reduced because the model sees usage patterns and lifetime constraints at once.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same graph structure could support translation between other pairs of languages that differ in memory safety rules.
  • Incremental updates to the graph might allow ongoing translation as a C project evolves over time.
  • Similar semantic graphs could address other cross-language issues such as concurrency or error handling.

Load-bearing premise

The pointer knowledge graph can be constructed accurately from the C source and supplies LLMs with enough global pointer information to generate functionally correct Rust without further manual fixes.

What would settle it

Running the method on a collection of C projects and finding that the output Rust still contains high rates of unsafe code or fails functional tests at rates comparable to earlier approaches would show the graph does not deliver the claimed benefits.

Figures

Figures reproduced from arXiv: 2510.10956 by Chong Wang, Wenjun Mao, Xin Peng, Xiyue Shang, Yiling Lou, Zhiqiang Yuan, Zhuo Chen.

Figure 1: Motivating Example.
Excerpt from the surrounding text: PTRMAPPER translation performance. II. RELATED WORK Code translation has been extensively studied in existing literature [28], [29], [30], [31], [32], [33], [34]. Given the substantial differences among programming languages, code translation techniques are typically tailored to specific language pairs (e.g., C-to-Rust, Java-to-Python, Python-to-C). Therefore, in this section, we focus o…
Figure 2: Workflow of PTRMAPPER.
Excerpt from the surrounding text: the structure pointed to by tree (i.e., quadtree_t). After being passed to insert_, tree only reads its key_free member, while root accesses all its members (e.g., point and key). By leveraging this global usage information, LLMs extract key_free from quadtree_t as a separate function parameter, avoiding borrow conflicts and improving both the correctness and safety of the Rust transl…
Figure 3: Definition and Usage Context of string_table_t [59] Struct.
Excerpt from the surrounding text: Downward analysis tracks the parameter’s usage within the current function and in any downstream functions when it is passed as an argument. Upward analysis considers all direct call sites, capturing how the parameter is used after being passed as an actual argument to the current function. • For function return pointers, we combine internal analys…
Figure 4: Code Translation Prompt.
Excerpt from the surrounding text: insert_, the extracted semantic knowledge is expressed as triples such as <tree, pointsTo, quadtree_t> and <tree, isA, Borrowed>. Furthermore, if two pointer parameters within a function exhibit a derivesFrom relationship and different mutability, PTRMAPPER explicitly provides refactoring guidance to LLMs in the following format: Param 1 derivesFrom Param 2, where Param 1 requir…
Figure 5: Translation Order Based on Dependency Graph. Numbers Indicate the Translation Order.
Figure 6: Error Correction Prompt.
Excerpt from the surrounding text: annotations (e.g., <root, isA, Owning>) from the Rust copy of pointer KG. These elements are incorporated into a correction prompt for the LLM. The LLM resolves the type mismatch by replacing tree.root with tree.root.as_deref(), which preserves the ownership and borrowing semantics of the original program design. In contrast, relying solely on LLMs for repair based on error descript…
Figure 7: Error Correction Example.
Excerpt from the surrounding text: tests for studied projects with help from automatic generation tools (e.g., fuzzing and LLMs). To control the manual effort, we focus on C projects with fewer than 4,000 lines of code. As shown in Table II, our constructed tests achieve high coverage, with 81.4% to 97.7% line coverage and 92.9% to 100.0% function coverage, ensuring a comprehensive functional equivalence evaluati…
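The Figure 2 and Figure 6 excerpts describe two concrete fixes on the quadtree project: passing key_free to insert_ as its own parameter so the call does not hold overlapping mutable borrows of the tree, and rewriting tree.root as tree.root.as_deref() so a read-only callee gets a borrowed node while root stays owning. The sketch below is a hypothetical reconstruction of both ideas, not the authors' generated code. Only the names root, key_free, point, key, and insert_ come from the excerpts; the types, `drop_key`, `count`, and the insert logic are assumptions, and the insert is deliberately simplified rather than a working quadtree.

```rust
/// Hypothetical Rust shape for the quadtree example.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

struct Node {
    point: Point,
    key: String,
    children: Vec<Node>,
}

struct Quadtree {
    /// Annotated <root, isA, Owning>: the tree owns its root node.
    root: Option<Box<Node>>,
    /// Stand-in for the C `key_free` callback; fn pointers are Copy.
    key_free: fn(String),
}

fn drop_key(_key: String) {}

/// Takes `key_free` by value instead of `&mut Quadtree`, so the caller can
/// also lend `&mut tree.root` without two overlapping mutable borrows.
/// Simplified: a real quadtree would descend into one of four quadrants.
fn insert_(key_free: fn(String), slot: &mut Option<Box<Node>>, point: Point, key: String) {
    match slot {
        None => *slot = Some(Box::new(Node { point, key, children: Vec::new() })),
        Some(node) => {
            if node.point == point {
                // Same point: swap in the new key and release the old one.
                let old = std::mem::replace(&mut node.key, key);
                key_free(old);
            } else {
                node.children.push(Node { point, key, children: Vec::new() });
            }
        }
    }
}

/// Read-only helper taking Option<&Node>; counts the root and its direct
/// children only.
fn count(node: Option<&Node>) -> usize {
    node.map_or(0, |n| 1 + n.children.len())
}

fn main() {
    let mut tree = Quadtree { root: None, key_free: drop_key };
    let p = Point { x: 1.0, y: 2.0 };
    insert_(tree.key_free, &mut tree.root, p, "a".to_string());
    insert_(tree.key_free, &mut tree.root, Point { x: 3.0, y: 4.0 }, "b".to_string());
    insert_(tree.key_free, &mut tree.root, p, "c".to_string());
    // as_deref turns &Option<Box<Node>> into Option<&Node>; root stays owned.
    println!("nodes: {}", count(tree.root.as_deref()));
    println!("root key: {:?}", tree.root.as_deref().map(|n| n.key.as_str()));
}
```

A call such as `insert_(&mut tree, &mut tree.root, ...)` would be rejected by the borrow checker; copying `tree.key_free` out first leaves only one mutable borrow, which is the effect the Figure 2 excerpt attributes to extracting key_free.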
Original abstract

Translating C code into safe Rust is an effective way to ensure memory safety. Compared to rule-based approaches, which often produce largely unsafe Rust code, LLM-based methods generate more idiomatic and safer Rust by leveraging extensive training on human-written code. Despite their promise, existing LLM-based approaches still struggle with project-level C-to-Rust translation. They typically partition a C project into smaller units (e.g., functions) based on call graphs and translate them in a bottom-up manner to resolve dependencies. However, this unit-by-unit paradigm often fails to handle pointers due to the lack of a global view of their usage. To address this limitation, we propose a novel C-to-Rust Pointer Knowledge Graph (KG) that augments code dependency graphs with two types of pointer semantics: (i) pointer usage information, which captures global behaviors such as points-to flows and lifts low-level struct interactions to higher-level abstractions; and (ii) Rust-oriented annotations, which encode ownership, mutability, nullability, and lifetime. Building on this KG, we further propose PtrTrans, a project-level C-to-Rust translation approach. In PtrTrans, the KG provides LLMs with comprehensive global pointer semantics, guiding them to generate safe and idiomatic Rust code. Experimental results show that PtrTrans reduces unsafe usages in translated Rust by 99.9% compared to both rule-based and conventional LLM-based methods, while achieving 29.3% higher functional correctness than fuzzing-enhanced LLM approaches.
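The abstract says existing pipelines partition a project along its call graph and translate bottom-up, and Figure 5 shows a translation order derived from a dependency graph. A minimal sketch of such an ordering, assuming a plain caller-to-callees map and Kahn's topological sort (our choice of algorithm; this page does not spell out the paper's exact ordering rules): every callee is emitted before its callers, and functions caught in a cycle, such as mutual recursion, are reported so they can be translated together. The function names in `main` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Orders functions so that every callee comes before its callers.
/// `calls` maps a function to the functions it calls. On a cycle it
/// returns Err with the functions it could not place.
fn bottom_up_order<'a>(
    calls: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<&'a str>, Vec<&'a str>> {
    // Every function, including callees that never appear as keys.
    let mut nodes: BTreeSet<&'a str> = calls.keys().copied().collect();
    for callees in calls.values() {
        nodes.extend(callees.iter().copied());
    }
    // pending[f]: distinct callees of f not yet emitted.
    // callers[g]: functions that call g.
    let mut pending: BTreeMap<&'a str, usize> = BTreeMap::new();
    let mut callers: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &f in &nodes {
        let distinct: BTreeSet<&'a str> = calls
            .get(f)
            .map(|v| v.iter().copied().collect())
            .unwrap_or_default();
        pending.insert(f, distinct.len());
        for g in distinct {
            callers.entry(g).or_default().push(f);
        }
    }
    // Leaf functions (no unplaced callees) are ready first.
    let mut ready: VecDeque<&'a str> = pending
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&f, _)| f)
        .collect();
    let mut order = Vec::new();
    while let Some(g) = ready.pop_front() {
        order.push(g);
        for &f in callers.get(g).map(|v| v.as_slice()).unwrap_or(&[]) {
            let n = pending.get_mut(f).expect("every caller has an entry");
            *n -= 1;
            if *n == 0 {
                ready.push_back(f);
            }
        }
    }
    if order.len() == nodes.len() {
        Ok(order)
    } else {
        Err(nodes.into_iter().filter(|f| !order.contains(f)).collect())
    }
}

fn main() {
    // Hypothetical call graph; names are illustrative only.
    let mut calls: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    calls.insert("main", vec!["tree_insert", "tree_free"]);
    calls.insert("tree_insert", vec!["insert_", "node_new"]);
    calls.insert("insert_", vec!["node_new"]);
    calls.insert("tree_free", vec!["node_free"]);
    match bottom_up_order(&calls) {
        Ok(order) => println!("translate in order: {}", order.join(", ")),
        Err(stuck) => println!("cycle among: {:?}", stuck),
    }
}
```

This ordering is exactly the unit-by-unit paradigm the abstract critiques: when a callee is translated first, its signature is fixed before any caller's pointer usage is seen, which is the gap the Pointer Knowledge Graph is meant to close.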

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents PtrTrans, a project-level C-to-Rust translation approach that constructs a Pointer Knowledge Graph (KG) to capture global pointer semantics such as points-to flows, struct abstractions, ownership, mutability, nullability, and lifetimes. This KG is used to guide LLMs in generating safe and functionally correct Rust code from C projects. The authors claim that PtrTrans achieves a 99.9% reduction in unsafe Rust usages compared to rule-based and standard LLM methods, and a 29.3% improvement in functional correctness over fuzzing-enhanced LLM approaches.

Significance. If the experimental claims hold and the KG construction proves robust, this work could significantly advance automated translation of legacy C code to memory-safe Rust by providing LLMs with structured global pointer information at the project level. This addresses a key limitation in existing unit-by-unit translation methods. The approach combines static analysis with LLM prompting in a novel way that may reduce manual fixes needed for safety and correctness.

major comments (3)
  1. [Abstract] The abstract reports a 99.9% reduction in unsafe usages and 29.3% higher functional correctness, but provides no details on the datasets, projects, baseline implementations, or how statistical significance was determined. This information is essential to evaluate the central experimental claims.
  2. [Section 3] Pointer Knowledge Graph construction: The static analysis used to build the KG (extracting points-to flows, ownership, and lifetimes) is not specified in detail, including how it handles constructs that resist precise analysis, such as void*, unions, and function pointers. Since the central claim depends on the KG supplying accurate global semantics to the LLM, this omission is load-bearing.
  3. [Section 5] Experimental results: The evaluation does not describe how functional correctness was measured (e.g., test suites or fuzzing protocols) or how unsafe code usages were quantified across projects, making it impossible to verify the reported gains or rule out benchmark-specific artifacts.
minor comments (2)
  1. [Introduction] The paper would benefit from an explicit definition of 'unsafe usages' and 'functional correctness' early in the text to avoid ambiguity in later sections.
  2. [Figure 2] Figure captions for the KG diagrams could include more detail on the node and edge types to improve readability for readers unfamiliar with the specific abstractions.
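Major comment 2 turns on how the graph's annotations become Rust types and what happens when analysis cannot decide. One possible policy, sketched below under assumed names (`PtrFacts`, `Ownership::Unknown`, `rust_type`), maps owning pointers to `Box<T>`, borrowed ones to `&T` or `&mut T`, wraps nullable ones in `Option`, and falls back to raw pointers when ownership is unknown (as it might be for void*, unions, or function pointers). This illustrates the kind of conservative approximation the referee asks about; it is not the paper's rule set, and lifetimes are omitted.

```rust
/// Hypothetical ownership facet; `Unknown` marks pointers the analysis
/// could not classify.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ownership {
    Owning,
    Borrowed,
    Unknown,
}

/// Hypothetical per-pointer facts mirroring three of the annotation kinds.
#[derive(Clone, Copy, Debug)]
struct PtrFacts {
    ownership: Ownership,
    mutable: bool,
    nullable: bool,
}

/// Picks a Rust type spelling for a pointer to `t` (illustrative policy).
fn rust_type(t: &str, f: PtrFacts) -> String {
    let base = match (f.ownership, f.mutable) {
        (Ownership::Owning, _) => format!("Box<{t}>"),
        (Ownership::Borrowed, false) => format!("&{t}"),
        (Ownership::Borrowed, true) => format!("&mut {t}"),
        // Conservative fallback: keep a raw pointer, which already admits null.
        (Ownership::Unknown, false) => return format!("*const {t}"),
        (Ownership::Unknown, true) => return format!("*mut {t}"),
    };
    if f.nullable {
        format!("Option<{base}>")
    } else {
        base
    }
}

fn main() {
    let root = PtrFacts { ownership: Ownership::Owning, mutable: true, nullable: true };
    let tree = PtrFacts { ownership: Ownership::Borrowed, mutable: false, nullable: false };
    let opaque = PtrFacts { ownership: Ownership::Unknown, mutable: true, nullable: true };
    println!("{}", rust_type("Node", root));
    println!("{}", rust_type("Quadtree", tree));
    println!("{}", rust_type("c_void", opaque));
}
```

Every raw-pointer fallback reintroduces `unsafe` at its use sites, so how often an analysis lands in the `Unknown` branch bears directly on the reported unsafe-reduction figure.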

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We respond to each major comment below and indicate the revisions we will make to enhance clarity and completeness.

Point-by-point responses
  1. Referee: [Abstract] The abstract reports a 99.9% reduction in unsafe usages and 29.3% higher functional correctness, but provides no details on the datasets, projects, baseline implementations, or how statistical significance was determined. This information is essential to evaluate the central experimental claims.

    Authors: We agree that the abstract is concise by design and therefore omits granular experimental details. The datasets, specific C projects, baseline implementations, and evaluation methodology (including how statistical significance was assessed) are fully described in Section 5. To improve reader accessibility, we will revise the abstract to include a brief reference to the evaluation benchmarks and projects. revision: yes

  2. Referee: [Section 3] Pointer Knowledge Graph construction: The static analysis used to build the KG (extracting points-to flows, ownership, and lifetimes) is not specified in detail, including how it handles constructs that resist precise analysis, such as void*, unions, and function pointers. Since the central claim depends on the KG supplying accurate global semantics to the LLM, this omission is load-bearing.

    Authors: Section 3 outlines the static analysis pipeline used to construct the Pointer Knowledge Graph, including extraction of points-to flows and Rust-oriented annotations. We acknowledge that additional detail on the treatment of undecidable constructs would strengthen the presentation. We will expand Section 3 with explicit discussion of our handling of void*, unions, and function pointers, including the conservative approximations employed where precise analysis is undecidable. revision: yes

  3. Referee: [Section 5] Experimental results: The evaluation does not describe how functional correctness was measured (e.g., test suites or fuzzing protocols) or how unsafe code usages were quantified across projects, making it impossible to verify the reported gains or rule out benchmark-specific artifacts.

    Authors: Section 5 describes the functional correctness evaluation, which relies on project-provided test suites supplemented by fuzzing protocols, and quantifies unsafe usages via counts of unsafe blocks and raw pointer operations in the generated Rust code. We will revise Section 5 to make these measurement procedures more explicit, including additional examples of the test suites and quantification process, to facilitate independent verification. revision: yes
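The rebuttal says unsafe usages are quantified by counting unsafe blocks and raw pointer operations, and the abstract reports a 99.9% reduction. The exact counting rules are not given on this page, so the sketch below is only a crude lexical proxy under our own assumptions: it counts whole-word `unsafe` tokens (unsafe blocks, fns, and impls alike) and `*const `/`*mut ` raw-pointer type spellings, and it computes a relative reduction as 100 × (1 − ours / baseline). The helper names, sample snippets, and the 1000-to-1 totals are hypothetical.

```rust
/// Rough lexical counts of unsafe-related syntax in Rust source text.
/// This does not parse Rust: matches inside comments or string literals
/// are counted too, so treat the numbers as estimates.
#[derive(Debug, PartialEq)]
struct UnsafeCounts {
    unsafe_keywords: usize,
    raw_pointer_types: usize,
}

fn count_unsafe(src: &str) -> UnsafeCounts {
    // Split on characters that cannot appear in identifiers, so `unsafe`
    // only matches as a whole word (not inside e.g. `unsafe_fn`).
    let unsafe_keywords = src
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|tok| *tok == "unsafe")
        .count();
    // `*const T` / `*mut T` written with a following space.
    let raw_pointer_types = src.matches("*const ").count() + src.matches("*mut ").count();
    UnsafeCounts { unsafe_keywords, raw_pointer_types }
}

/// Relative reduction in percent; None when the baseline is zero.
fn reduction_pct(baseline: usize, ours: usize) -> Option<f64> {
    if baseline == 0 {
        return None;
    }
    Some(100.0 * (1.0 - ours as f64 / baseline as f64))
}

fn main() {
    let rule_based_like = "unsafe fn f(p: *mut i32) { unsafe { *p += 1 } }";
    let safe_like = "fn f(p: &mut i32) { *p += 1 }";
    println!("{:?}", count_unsafe(rule_based_like));
    println!("{:?}", count_unsafe(safe_like));
    // Hypothetical totals, chosen only to show how a 99.9% figure arises.
    println!("{:.1}%", reduction_pct(1000, 1).unwrap());
}
```

A syntax-aware count (for example, walking a parsed AST) would avoid the comment and string false positives; the lexical version only shows what is being measured.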

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper proposes constructing a Pointer Knowledge Graph to augment dependency graphs with global pointer semantics (points-to flows, ownership, mutability, nullability, lifetimes) and then uses this KG to prompt LLMs for project-level C-to-Rust translation. No equations, fitted parameters, self-citations, or uniqueness theorems are described that would reduce any claimed result or prediction back to the inputs by construction. The experimental claims of 99.9% unsafe reduction and 29.3% correctness gain are presented as evaluation outcomes on benchmarks rather than tautological redefinitions of the method itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the LLM being able to interpret and act on the supplied global pointer semantics; the knowledge graph itself is an invented construct whose accuracy is not independently verified in the abstract.

axioms (1)
  • domain assumption: Large language models can reliably translate C to safe Rust when supplied with explicit global pointer usage and Rust-oriented annotations.
    The method assumes LLMs will correctly apply the graph information to avoid unsafe patterns while preserving functionality.
invented entities (1)
  • Pointer Knowledge Graph (no independent evidence)
    purpose: Augment code dependency graphs with points-to flows, struct abstractions, and Rust annotations for ownership, mutability, nullability, and lifetime.
    This graph is introduced to solve the lack of global pointer view in bottom-up translation.

pith-pipeline@v0.9.0 · 5812 in / 1344 out tokens · 43575 ms · 2026-05-18T08:22:53.598681+00:00 · methodology



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories

    cs.SE 2026-04 unverdicted novelty 7.0

    ReCodeAgent uses a multi-agent system to translate and validate large code repositories across multiple programming languages, achieving 60.8% higher test pass rates than prior neuro-symbolic and agentic methods on 11...

  2. CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora

    cs.SE 2026-04 unverdicted novelty 6.0

    CodePivot uses Python as a pivot language plus an Aggressive-Partial-Functional RL reward to train a 7B model that outperforms much larger LLMs on multilingual code transpilation without parallel corpora.

Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · cited by 2 Pith papers · 7 internal anchors

  1. [1]

    How ISO C became unusable for operating systems development,

    V. Yodaiken, “How ISO C became unusable for operating systems development,” CoRR, vol. abs/2201.07845, 2022. [Online]. Available: https://arxiv.org/abs/2201.07845

  2. [2]

    A Survey of Embedded Software Profiling Methodologies

    R. Patel and A. Rajawat, “A survey of embedded software profiling methodologies,” CoRR, vol. abs/1312.2949, 2013. [Online]. Available: http://arxiv.org/abs/1312.2949

  3. [3]

    A C/C++ code vulnerability dataset with code changes and CVE summaries,

    J. Fan, Y. Li, S. Wang, and T. N. Nguyen, “A C/C++ code vulnerability dataset with code changes and CVE summaries,” in Proceedings of the 17th International Conference on Mining Software Repositories, 2020, pp. 508–512

  4. [4]

    Rust won’t save us: An analysis of 2023’s known exploited vulnerabilities,

    Z. Hanley, “Rust won’t save us: An analysis of 2023’s known exploited vulnerabilities,” 2023

  5. [5]

    AEGIS: towards formalized and practical memory-safe execution of C programs via MSWASM,

    S. Esmaeilsabzali, A. Khalatyan, Z. Mo, S. Venkatanarayanan, and S. Xu, “AEGIS: towards formalized and practical memory-safe execution of C programs via MSWASM,” CoRR, vol. abs/2503.03698,

  6. [6]
  7. [7]

    A closer look at the security risks in the rust ecosystem,

    X. Zheng, Z. Wan, Y. Zhang, R. Chang, and D. Lo, “A closer look at the security risks in the rust ecosystem,” ACM Trans. Softw. Eng. Methodol., vol. 33, no. 2, pp. 34:1–34:30, 2024. [Online]. Available: https://doi.org/10.1145/3624738

  8. [8]

    Memory-safety challenge considered solved? an in-depth study with all rust cves,

    H. Xu, Z. Chen, M. Sun, Y. Zhou, and M. R. Lyu, “Memory-safety challenge considered solved? an in-depth study with all rust cves,” ACM Trans. Softw. Eng. Methodol., vol. 31, no. 1, pp. 3:1–3:25, 2022. [Online]. Available: https://doi.org/10.1145/3466642

  9. [9]

    On the dual nature of necessity in use of rust unsafe code,

    Y. Zhang, A. Kundu, G. Portokalidis, and J. Xu, “On the dual nature of necessity in use of rust unsafe code,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023, S. Chandra, K. Blincoe, and P. Tonella, Eds. ACM, 20...

  10. [10]

    The rust language,

    N. D. Matsakis and F. S. Klock, “The rust language,” in Proceedings of the 2014 ACM SIGAda annual conference on High integrity language technology, 2014, pp. 103–104

  11. [11]

    Rust for Linux

    Rust for Linux. [Online]. Available: https://github.com/Rust-for-Linux/linux

  12. [12]

    c2rust

    c2rust, 2025. [Online]. Available: https://github.com/immunant/c2rust

  13. [13]

    Ownership guided C to rust translation,

    H. Zhang, C. David, Y. Yu, and M. Wang, “Ownership guided C to rust translation,” in Computer Aided Verification - 35th International Conference, CAV 2023, Paris, France, July 17-22, 2023, Proceedings, Part III, ser. Lecture Notes in Computer Science, vol. 13966. Springer, 2023, pp. 459–482. [Online]. Available: https://doi.org/10.1007/978-3-031-37709-9_22

  14. [14]

    citrus

    citrus, 2025. [Online]. Available: https://gitlab.com/citrus-rs/citrus#citrus-convert-c-to-rust

  15. [15]

    Translating C to safer rust,

    M. Emre, R. Schroeder, K. Dewey, and B. Hardekopf, “Translating C to safer rust,” Proc. ACM Program. Lang., vol. 5, no. OOPSLA, pp. 1–29, 2021. [Online]. Available: https://doi.org/10.1145/3485498

  16. [16]

    Aliasing limits on translating C to safe rust,

    M. Emre, P. Boyland, A. Parekh, R. Schroeder, K. Dewey, and B. Hardekopf, “Aliasing limits on translating C to safe rust,” Proc. ACM Program. Lang., vol. 7, no. OOPSLA1, pp. 551–579, 2023. [Online]. Available: https://doi.org/10.1145/3586046

  17. [17]

    Safemd: Ownership-based safe memory deallocation for c programs,

    X. Yin, Z. Huang, S. Kan, and G. Shen, “Safemd: Ownership-based safe memory deallocation for c programs,” 2024

  18. [18]

    Raw Pointer Rewriting with LLMs for Translating C to Safer Rust

    Y. Gao, C. Wang, P. Huang, X. Liu, M. Zheng, and X. Zhang, “PR2: peephole raw pointer rewriting with llms for translating C to safer rust,” CoRR, vol. abs/2505.04852, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2505.04852

  19. [19]

    C2saferrust: Transforming C projects into safer rust with neurosymbolic techniques,

    V. Nitin, R. Krishna, L. L. do Valle, and B. Ray, “C2saferrust: Transforming C projects into safer rust with neurosymbolic techniques,” CoRR, vol. abs/2501.14257, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2501.14257

  20. [20]

    Don’t write, but return: Replacing output parameters with algebraic data types in c-to-rust translation,

    J. Hong and S. Ryu, “Don’t write, but return: Replacing output parameters with algebraic data types in c-to-rust translation,” Proc. ACM Program. Lang., vol. 8, no. PLDI, pp. 716–740, 2024. [Online]. Available: https://doi.org/10.1145/3656406

  21. [21]

    Towards translating real-world code with llms: A study of translating to rust,

    H. F. Eniser, H. Zhang, C. David, M. Wang, M. Christakis, B. Paulsen, J. Dodds, and D. Kroening, “Towards translating real-world code with llms: A study of translating to rust,” CoRR, vol. abs/2405.11514, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2405.11514

  22. [22]

    VERT: verified equivalent rust transpilation with few-shot learning,

    A. Z. H. Yang, Y. Takashima, B. Paulsen, J. Dodds, and D. Kroening, “VERT: verified equivalent rust transpilation with few-shot learning,” CoRR, vol. abs/2404.18852, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2404.18852

  23. [23]

    Syzygy: Dual code-test C to (safe) rust translation using llms and dynamic analysis,

    M. Shetty, N. Jain, A. Godbole, S. A. Seshia, and K. Sen, “Syzygy: Dual code-test C to (safe) rust translation using llms and dynamic analysis,” CoRR, vol. abs/2412.14234, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2412.14234

  24. [24]

    Rustmap: Towards project-scale c-to-rust migration via program analysis and LLM,

    X. Cai, J. Liu, X. Huang, Y. Yu, H. Wu, C. Li, B. Wang, I. N. B. Yusuf, and L. Jiang, “Rustmap: Towards project-scale c-to-rust migration via program analysis and LLM,” CoRR, vol. abs/2503.17741, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2503.17741

  25. [25]

    Context-aware code segmentation for c-to-rust translation using large language models,

    M. Shiraishi and T. Shinagawa, “Context-aware code segmentation for c-to-rust translation using large language models,” CoRR, vol. abs/2409.10506, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2409.10506

  26. [26]

    Type-migrating c-to-rust translation using a large language model,

    J. Hong and S. Ryu, “Type-migrating c-to-rust translation using a large language model,” Empir. Softw. Eng., vol. 30, no. 1, p. 3, 2025. [Online]. Available: https://doi.org/10.1007/s10664-024-10573-2

  27. [27]

    Quadtree

    Quadtree. [Online]. Available: https://github.com/thejefflarson/quadtree

  28. [28]

    OpenAI

    OpenAI. [Online]. Available: https://openai.com/

  29. [29]

    Lost in translation: A study of bugs introduced by large language models while translating code,

    R. Pan, A. R. Ibrahimzada, R. Krishna, D. Sankar, L. P. Wassi, M. Merler, B. Sobolev, R. Pavuluri, S. Sinha, and R. Jabbarvand, “Lost in translation: A study of bugs introduced by large language models while translating code,” in Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 202...

  30. [30]

    Exploring and unleashing the power of large language models in automated code translation,

    Z. Yang, F. Liu, Z. Yu, J. W. Keung, J. Li, S. Liu, Y. Hong, X. Ma, Z. J., and G. Li, “Exploring and unleashing the power of large language models in automated code translation,” Proc. ACM Softw. Eng., vol. 1, no. FSE, pp. 1585–1608, 2024. [Online]. Available: https://doi.org/10.1145/3660778

  31. [31]

    Alphatrans: A neuro-symbolic compositional approach for repository-level code translation and validation,

    A. R. Ibrahimzada, K. Ke, M. Pawagi, M. S. Abid, R. Pan, S. Sinha, and R. Jabbarvand, “Alphatrans: A neuro-symbolic compositional approach for repository-level code translation and validation,” Proc. ACM Softw. Eng., vol. 2, no. FSE, pp. 2454–2476, 2025. [Online]. Available: https://doi.org/10.1145/3729379

  32. [32]

    Repotransagent: Multi-agent llm framework for repository-aware code translation,

    Z. Guan, X. Yin, Z. Peng, and C. Ni, “Repotransagent: Multi-agent llm framework for repository-aware code translation,” 2025. [Online]. Available: https://arxiv.org/abs/2508.17720

  33. [33]

    Scalable, validated code translation of entire projects using large language models,

    H. Zhang, C. David, M. Wang, B. Paulsen, and D. Kroening, “Scalable, validated code translation of entire projects using large language models,” CoRR, vol. abs/2412.08035, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2412.08035

  34. [34]

    Unsupervised translation of programming languages,

    B. Rozière, M. Lachaux, L. Chanussot, and G. Lample, “Unsupervised translation of programming languages,” in Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020

  35. [35]

    Enhancing code translation in language models with few-shot learning via retrieval-augmented generation,

    M. Bhattarai, J. E. Santos, S. Jones, A. Biswas, B. Alexandrov, and D. O’Malley, “Enhancing code translation in language models with few-shot learning via retrieval-augmented generation,” in 2024 IEEE High Performance Extreme Computing Conference (HPEC), 2024, pp. 1–8

  36. [36]

    Compiling C to safe rust, formalized,

    A. Fromherz and J. Protzenko, “Compiling C to safe rust, formalized,” CoRR, vol. abs/2412.15042, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2412.15042

  37. [37]

    Migrating C to rust for memory safety,

    P. Larsen, “Migrating C to rust for memory safety,” IEEE Secur. Priv., vol. 22, no. 4, pp. 22–29, 2024. [Online]. Available: https://doi.org/10.1109/MSEC.2024.3385357

  38. [38]

    Corrode

    Corrode, 2017. [Online]. Available: https://github.com/jameysharp/corrode

  39. [39]

    In rust we trust - A transpiler from unsafe C to safer rust,

    M. Ling, Y. Yu, H. Wu, Y. Wang, J. R. Cordy, and A. E. Hassan, “In rust we trust - A transpiler from unsafe C to safer rust,” in 44th IEEE/ACM International Conference on Software Engineering: Companion Proceedings, ICSE Companion 2022, Pittsburgh, PA, USA, May 22-24, 2022. ACM/IEEE, 2022, pp. 354–355. [Online]. Available: https://doi.org/10.1145/351045...

  40. [40]

    To tag, or not to tag: Translating c’s unions to rust’s tagged unions,

    J. Hong and S. Ryu, “To tag, or not to tag: Translating c’s unions to rust’s tagged unions,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE 2024, Sacramento, CA, USA, October 27 - November 1, 2024, V. Filkov, B. Ray, and M. Zhou, Eds. ACM, 2024, pp. 40–52. [Online]. Available: https://doi.org/10.1145/36...

  41. [41]

    Forcrat: Automatic I/O API translation from C to rust via origin and capability analysis,

    J. Hong and S. Ryu, “Forcrat: Automatic I/O API translation from C to rust via origin and capability analysis,” CoRR, vol. abs/2506.01427, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2506.01427

  42. [42]

    Crust-bench: A comprehensive benchmark for c-to-safe-rust transpilation,

    A. Khatry, R. Zhang, J. Pan, Z. Wang, Q. Chen, G. Durrett, and I. Dillig, “Crust-bench: A comprehensive benchmark for c-to-safe-rust transpilation,” CoRR, vol. abs/2504.15254, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2504.15254

  43. [43]

    Translating C to rust: Lessons from a user study,

    R. Li, B. Wang, T. Li, P. Saxena, and A. Kundu, “Translating C to rust: Lessons from a user study,” in 32nd Annual Network and Distributed System Security Symposium, NDSS 2025, San Diego, California, USA, February 24-28, 2025. The Internet Society,

  44. [44]

    [Online]. Available: https://www.ndss-symposium.org/ndss-paper/translating-c-to-rust-lessons-from-a-user-study/

  45. [45]

    C2rusttv: An llm-based framework for c to rust translation and validation,

    H. Zhou, Y. Luo, M. Zhang, and D. Xu, “C2rusttv: An llm-based framework for c to rust translation and validation,” in 2025 IEEE 49th Annual Computers, Software, and Applications Conference (COMPSAC), 2025, pp. 1254–1259

  46. [46]

    Llm-driven multi-step translation from C to rust using static analysis,

    T. Zhou, H. Lin, S. Jha, M. Christodorescu, K. Levchenko, and V. Chandrasekaran, “Llm-driven multi-step translation from C to rust using static analysis,” CoRR, vol. abs/2503.12511, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2503.12511

  47. [47]

    Search-Based Multi-Trajectory Refinement for Safe C-to-Rust Translation with Large Language Models

    H. Sim, H. Cho, Y. Go, Z. Fu, A. Shokri, and B. Ravindran, “Large language model-powered agent for C to rust code translation,” CoRR, vol. abs/2505.15858, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2505.15858

  48. [48]

    Toward llm-based large-scale c-to-rust code translation

    M. Shiraishi and T. Shinagawa, “Toward llm-based large-scale c-to-rust code translation.”

  49. [49]

    Evaluating instruction-tuned large language models on code comprehension and generation,

    Z. Yuan, J. Liu, Q. Zi, M. Liu, X. Peng, and Y. Lou, “Evaluating instruction-tuned large language models on code comprehension and generation,” CoRR, vol. abs/2308.01240, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2308.01240

  50. [50]

    Self-collaboration code generation via chatgpt,

    Y. Dong, X. Jiang, Z. Jin, and G. Li, “Self-collaboration code generation via chatgpt,” ACM Trans. Softw. Eng. Methodol., vol. 33, no. 7, pp. 189:1–189:38, 2024. [Online]. Available: https://doi.org/10.1145/3672459

  51. [51]

    Evaluating and improving chatgpt for unit test generation,

    Z. Yuan, M. Liu, S. Ding, K. Wang, Y. Chen, X. Peng, and Y. Lou, “Evaluating and improving chatgpt for unit test generation,” Proc. ACM Softw. Eng., vol. 1, no. FSE, pp. 1703–1726, 2024. [Online]. Available: https://doi.org/10.1145/3660783

  52. [52]

    Automated repair of programs from large language models

    Z. Fan, X. Gao, M. Mirchev, A. Roychoudhury, and S. H. Tan, “Automated repair of programs from large language models,” in 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023. IEEE, 2023, pp. 1469–1481. [Online]. Available: https://doi.org/10.1109/ICSE48619.2023.00128

  53. [53]

    Automated program repair in the era of large pre-trained language models

    C. S. Xia, Y. Wei, and L. Zhang, “Automated program repair in the era of large pre-trained language models,” in 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023. IEEE, 2023, pp. 1482–1494. [Online]. Available: https://doi.org/10.1109/ICSE48619.2023.00129

  54. [54]

    RepairAgent: An autonomous, LLM-based agent for program repair

    I. Bouzenia, P. T. Devanbu, and M. Pradel, “Repairagent: An autonomous, llm-based agent for program repair,” in 47th IEEE/ACM International Conference on Software Engineering, ICSE 2025, Ottawa, ON, Canada, April 26 - May 6, 2025. IEEE, 2025, pp. 2188–2200. [Online]. Available: https://doi.org/10.1109/ICSE55347.2025.00157

  55. [55]

    TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment

    Z. Yuan, W. Chen, H. Wang, K. Yu, X. Peng, and Y. Lou, “TRANSAGENT: an llm-based multi-agent system for code translation,” CoRR, vol. abs/2409.19894, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2409.19894

  56. [56]

    Enhancing llm-based code translation in repository context via triple knowledge-augmented

    G. Ou, M. Liu, Y. Chen, X. Du, S. Wang, Z. Zhang, X. Peng, and Z. Zheng, “Enhancing llm-based code translation in repository context via triple knowledge-augmented,” CoRR, vol. abs/2503.18305, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2503.18305

  57. [57]

    Repository-level code translation benchmark targeting rust

    G. Ou, M. Liu, Y. Chen, X. Peng, and Z. Zheng, “Repository-level code translation benchmark targeting rust,” CoRR, vol. abs/2411.13990, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2411.13990

  59. [59]

    SafeTrans: LLM-assisted Transpilation from C to Rust

    M. Farrukh, S. Shah, B. Coskun, and M. Polychronakis, “Safetrans: Llm-assisted transpilation from C to rust,” CoRR, vol. abs/2505.10708, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2505.10708

  61. [61]

    Spectra: Enhancing the code translation ability of language models by generating multi-modal specifications

    V. Nitin and B. Ray, “Spectra: Enhancing the code translation ability of language models by generating multi-modal specifications,” CoRR, vol. abs/2405.18574, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2405.18574

  62. [62]

    The protection of information in computer systems

    J. H. Saltzer and M. D. Schroeder, “The protection of information in computer systems,” Proc. IEEE, vol. 63, no. 9, pp. 1278–1308, 1975. [Online]. Available: https://doi.org/10.1109/PROC.1975.9939

  63. [63]

    libtree. [Online]. Available: https://github.com/haampie/libtree/blob/master/libtree.c#L191

  64. [64]

    Depth-first search and linear graph algorithms

    R. E. Tarjan, “Depth-first search and linear graph algorithms,” SIAM J. Comput., vol. 1, no. 2, pp. 146–160, 1972. [Online]. Available: https://doi.org/10.1137/0201010

  65. [65]

    quadtree point free. [Online]. Available: https://github.com/thejefflarson/quadtree/blob/master/src/point.c#L14

  66. [66]

    bzip2. [Online]. Available: https://github.com/commontk/bzip2/blob/master

  67. [67]

    libcsv. [Online]. Available: https://github.com/rgamble/libcsv/tree/master

  68. [68]

    robotfindskitten. [Online]. Available: https://github.com/robotfindskitten/robotfindskitten

  69. [69]

    Clippy. [Online]. Available: https://github.com/rust-lang/rust-clippy

  70. [70]

    cargo geiger. [Online]. Available: https://docs.rs/crate/cargo-geiger/latest

  71. [71]

    Doxygen. [Online]. Available: https://doxygen.nl/

  72. [72]

    SVF. [Online]. Available: https://github.com/SVF-tools/SVF

  73. [73]

    Analysis of chatgpt-generated codes across multiple programming languages

    S. Almanasra and K. Suwais, “Analysis of chatgpt-generated codes across multiple programming languages,” IEEE Access, vol. 13, pp. 23580–23596, 2025. [Online]. Available: https://doi.org/10.1109/ACCESS.2025.3538050

  74. [74]

    On iterative evaluation and enhancement of code quality using gpt-4o

    R. Liu, A. Frade, A. Vaidya, M. Labonne, M. Kaiser, B. Chakrabarti, J. Budd, and S. J. Moran, “On iterative evaluation and enhancement of code quality using gpt-4o,” CoRR, vol. abs/2502.07399, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2502.07399

  75. [75]

    Evaluation of gpt 4o for mobile applications code conversion

    A. Mashaal, O. Helmy, O. Ashor, A. T. Mahmoud, and W. Medhat, “Evaluation of gpt 4o for mobile applications code conversion,” in 2024 12th International Japan-Africa Conference on Electronics, Communications, and Computations (JAC-ECC), 2024, pp. 219–224.

  76. [76]

    Repotransbench: A real-world benchmark for repository-level code translation

    Y. Wang, Y. Wang, S. Wang, D. Guo, J. Chen, J. C. Grundy, X. Liu, Y. Ma, M. Mao, H. Zhang, and Z. Zheng, “Repotransbench: A real-world benchmark for repository-level code translation,” CoRR, vol. abs/2412.17744, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2412.17744

  77. [77]

    Claude. [Online]. Available: https://www.anthropic.com/index/introducing-claude

  78. [78]

    Gemini. [Online]. Available: https://blog.google/technology/ai/google-gemini-ai/

  79. [79]

    CodeTransOcean: A comprehensive multilingual benchmark for code translation

    W. Yan, Y. Tian, Y. Li, Q. Chen, and W. Wang, “Codetransocean: A comprehensive multilingual benchmark for code translation,” in Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, H. Bouamor, J. Pino, and K. Bali, Eds. Association for Computational Linguistics, 2023, pp. 5067–5089. [Online]. Available: ...

  80. [80]

    CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

    S. Ren, D. Guo, S. Lu, L. Zhou, S. Liu, D. Tang, N. Sundaresan, M. Zhou, A. Blanco, and S. Ma, “Codebleu: a method for automatic evaluation of code synthesis,” CoRR, vol. abs/2009.10297, 2020. [Online]. Available: https://arxiv.org/abs/2009.10297