Recognition: no theorem link
Finding a Crab in the C: Assured Translation via Comparative Symbolic Execution
Pith reviewed 2026-05-14 19:51 UTC · model grok-4.3
The pith
cozy performs comparative symbolic execution on original and translated binaries to flag differences for developer review while proving equivalence elsewhere.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Outside of the flagged differences, the binaries are formally verified to be equivalent. Consequently, the review process guarantees equivalence modulo changes approved by the developer.
Load-bearing premise
That symbolic execution on the compiled binaries can exhaustively detect all semantic differences without missing behaviors or generating unmanageable numbers of spurious differences, and that the binaries faithfully represent the source-level semantics.
Figures
read the original abstract
Modern high-assurance software systems development favors memory safe languages such as SPARK (ADA) or Rust. However, developers often encounter non-memory safe code (e.g., C) in legacy systems and libraries which would be prohibitively expensive or risky to re-write. In response, developers have begun turning to machine learning/AI systems and other automated code translators. Automated translation comes with its own risks, however. The original and ported code are not precisely the same, semantically - otherwise there would be no point in performing the translation. To reduce these risks, we have developed cozy, a comparative binary analysis tool that simultaneously analyzes a binary compiled from "unsafe" source code and a binary compiled from a translation of the source code to a memory safe language. cozy walks the developer through differences in the behavior of the two binaries, presenting each difference and asking the user to assess whether the difference is intentional (good) or erroneous. Outside of the flagged differences, the binaries are formally verified to be equivalent. Consequently, the review process guarantees equivalence modulo changes approved by the developer. cozy has applications to automated translation, bug correction, code reviews, operation authorization, and automatic translation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces cozy, a comparative symbolic execution tool that simultaneously analyzes binaries compiled from original C source and from its automated translation to a memory-safe language. It presents behavioral differences to the developer for approval or rejection as intentional changes; outside the flagged differences, the binaries are claimed to be formally verified equivalent, thereby guaranteeing equivalence modulo developer-approved modifications. The approach targets applications including automated translation assurance, bug correction, and code review.
Significance. If the central claim holds, the work would offer a practical method for reducing semantic risks in porting legacy unsafe code to safe languages, which is a pressing need in high-assurance systems. It extends standard symbolic execution to a comparative setting and could support reproducible review processes. However, the absence of any completeness argument, coverage metrics, or handling for path explosion substantially weakens the assessed significance, as the equivalence guarantee rests on unproven exhaustiveness assumptions.
major comments (3)
- [Abstract and §3] Abstract and §3 (comparative symbolic execution description): the claim that binaries are 'formally verified to be equivalent' outside flagged differences lacks any algorithm outline, termination condition, or completeness argument for handling loops, recursion, pointer aliasing, and input-dependent control flow; standard symbolic execution on binaries is known to require bounds that can miss behaviors, directly undermining the verification guarantee.
- [§5] §5 (evaluation): no coverage metrics, path counts, or data on spurious differences are reported, leaving the weakest assumption—that symbolic execution exhaustively detects all semantic divergences without unmanageable false positives—unsupported for realistic programs.
- [§2] §2 (background and assumptions): the load-bearing premise that compiled binaries faithfully preserve source-level semantics is not addressed with respect to compiler optimizations or undefined behavior, which can introduce discrepancies invisible at the source level and outside the scope of the comparative analysis.
minor comments (2)
- [Abstract] The abstract and introduction use 'cozy' without an initial definition or expansion; add a parenthetical on first use.
- [§3] Notation for symbolic states and difference flags is introduced without a summary table; a small notation table would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major point below and indicate planned revisions to the manuscript.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (comparative symbolic execution description): the claim that binaries are 'formally verified to be equivalent' outside flagged differences lacks any algorithm outline, termination condition, or completeness argument for handling loops, recursion, pointer aliasing, and input-dependent control flow; standard symbolic execution on binaries is known to require bounds that can miss behaviors, directly undermining the verification guarantee.
Authors: We agree that §3 would benefit from greater detail. In the revision we will expand the algorithm description to explicitly outline the steps of comparative symbolic execution, state the configurable bounds on loop iterations and recursion depth used to guarantee termination, describe the alias analysis employed for pointers, and clarify that equivalence is formally established only over the explored paths. The 'formal verification' claim will be qualified to apply within these practical bounds rather than claiming unbounded exhaustiveness. revision: yes
-
Referee: [§5] §5 (evaluation): no coverage metrics, path counts, or data on spurious differences are reported, leaving the weakest assumption—that symbolic execution exhaustively detects all semantic divergences without unmanageable false positives—unsupported for realistic programs.
Authors: We accept that the evaluation section is currently light on quantitative data. We will revise §5 to include path counts, statement and branch coverage figures for each benchmark, and statistics on the number of differences surfaced together with how many were classified as spurious by the human reviewers. These additions will directly support the practicality claims. revision: yes
-
Referee: [§2] §2 (background and assumptions): the load-bearing premise that compiled binaries faithfully preserve source-level semantics is not addressed with respect to compiler optimizations or undefined behavior, which can introduce discrepancies invisible at the source level and outside the scope of the comparative analysis.
Authors: The tool operates directly on the compiled binaries; therefore any optimization or undefined-behavior effects present in the binaries are part of the semantics being compared and will be detected as differences if they diverge between the two binaries. We will add a short clarifying paragraph in §2 making this scope explicit and noting that the approach can surface compilation-induced discrepancies. revision: partial
- A general, unbounded completeness argument for symbolic execution; the implementation relies on user-configurable bounds for termination and scalability, and no such argument can be supplied without additional program restrictions.
Circularity Check
No circularity: equivalence claim rests on standard symbolic execution without self-referential definitions or fitted predictions
full rationale
The paper introduces the cozy tool for comparative symbolic execution on binaries from original C and translated safe-language code. It identifies differences for developer review and claims formal equivalence outside flagged cases. This rests on applying established symbolic execution to detect behavioral divergences rather than deriving results from self-defined parameters, renaming known patterns, or load-bearing self-citations. No equations reduce the verification result to its own inputs by construction, and the method does not invoke uniqueness theorems or ansatzes from prior author work. The derivation chain is self-contained against external symbolic execution techniques.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Compiled binaries accurately reflect the semantics of their source code.
- domain assumption Symbolic execution can be applied to the binaries to explore and compare their behaviors.
invented entities (1)
-
cozy
no independent evidence
Reference graph
Works this paper leans on
-
[1]
“Software memory safety,” National Security Agency, Tech. Rep., Apr. 2023. [Online]. Available: https : / / media . defense . gov / 2022 / Nov / 10 / 2003112742/ - 1/ - 1/0/CSI SOFTW ARE MEMORY SAFETY .PDF
work page 2023
-
[2]
Memory safe languages: Reducing vulnerabilities in modern software development,
“Memory safe languages: Reducing vulnerabilities in modern software development,” National Security Agency, Tech. Rep., Jun. 2025. [Online]. Available: https://media.defense.gov/2025/Jun/23/2003742198/- 1/ - 1 / 0 / CSI MEMORY SAFE LANGUAGES REDUCING VULNERABILITIES IN MODERN SOFTW ARE DEVELOPMENT.PDF
work page 2025
-
[3]
“The case for memory safe roadmaps: Why both c-suite executives and technical experts need to take memory safe coding seriously,” Cybersecurity and Infrastructure Security Agency, Tech. Rep., Dec. 2023. [Online]. Available: https://www.cisa.gov/sites/default/files/2023- 12/The-Case-for-Memory-Safe-Roadmaps-508c.pdf
work page 2023
-
[4]
Back to building blocks: A path towards secure and measurable software,
“Back to building blocks: A path towards secure and measurable software,” Office of the National Cyber Director, Tech. Rep., Feb. 2024. [Online]. Available: https://web.archive.org/web/20240227002908/https: //www.whitehouse.gov/wp- content/uploads/2024/02/ Final-ONCD-Technical-Report.pdf
- [5]
-
[6]
SafeTrans: LLM-assisted transpilation from C to rust,
M. Farrukh, S. Shah, B. Coskun, and M. Polychronakis, “SafeTrans: LLM-assisted transpilation from C to rust,” arXiv [cs.CR], May 2025
work page 2025
-
[7]
M. Weiss, J. Hecking-Harbusch, J. Quante, and M. Woehrle, “Feedback loops and code perturbations in LLM-based software engineering: A case study on a C-to-rust translation system,”arXiv [cs.SE], Dec. 2025
work page 2025
-
[8]
Integrating rules and semantics for LLM- based C-to-rust translation,
F. Luo et al., “Integrating rules and semantics for LLM- based C-to-rust translation,” in2025 IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE, Sep. 2025, pp. 685–696
work page 2025
-
[9]
T. Zhou et al., “SACTOR: LLM-driven correct and idiomatic C to rust translation with static analysis and FFI-based verification,”arXiv [cs.SE], Dec. 2025
work page 2025
-
[10]
Cozy: Comparative symbolic execution for binary programs,
C. Helbling, G. Leach-Krouse, S. Lasser, and G. Sulli- van, “Cozy: Comparative symbolic execution for binary programs,” inProceedings of the 2025 Workshop on Binary Analysis Research, Reston, V A: Internet Soci- ety, 2025.DOI: 10 . 14722 / bar. 2025 . 23004 [Online]. Available: https://doi.org/10.14722/bar.2025.23004
-
[11]
SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis,
Y . Shoshitaishvili et al., “SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis,” inIEEE Symposium on Security and Privacy, 2016.DOI: 10 . 1109/SP.2016.17 [Online]. Available: https://doi.org/10. 1109/SP.2016.17
work page 2016
-
[12]
Differential Symbolic Execution,
S. Person, M. B. Dwyer, S. Elbaum, and C. S. P˘as˘areanu, “Differential Symbolic Execution,” inFoun- dations of Software Engineering (FSE), 2008,ISBN: 9781595939951.DOI: 10 . 1145 / 1453101 . 1453131 [Online]. Available: https://doi.org/10.1145/1453101. 1453131
-
[13]
Relational Symbolic Execution,
G. P. Farina, S. Chong, and M. Gaboardi, “Relational Symbolic Execution,” inPrinciples and Practice of Programming Languages (PPDP), 2019.DOI: 10.1145/ 3354166.3354175 [Online]. Available: https://doi.org/ 10.1145/3354166.3354175
-
[14]
Division by in- variant integers using multiplication,
T. Granlund and P. L. Montgomery, “Division by in- variant integers using multiplication,” inProceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, ser. PLDI ’94, Orlando, Florida, USA: Association for Computing Machinery, 1994, 61–72,ISBN: 089791662X.DOI: 10. 1145/178243.178249 [Online]. Available: https://doi. ...
-
[15]
Verifying dynamic trait objects in rust,
A. VanHattum, D. Schwartz-Narbonne, N. Chong, and A. Sampson, “Verifying dynamic trait objects in rust,” inProceedings of the 44th International Conference on Software Engineering: Software Engineering in Prac- tice, ser. ICSE-SEIP ’22, Pittsburgh, Pennsylvania: As- sociation for Computing Machinery, 2022, 321–330, ISBN: 9781450392266.DOI: 10.1145/3510457...
-
[16]
Vert: Verified equivalent rust transpilation with large language models as few-shot learners,
A. Z. Yang, Y . Takashima, B. Paulsen, J. Dodds, and D. Kroening, “Vert: Verified equivalent rust transpilation with large language models as few-shot learners,”arXiv preprint arXiv:2404.18852, 2024
-
[17]
RustAssure: Differential sym- bolic testing for LLM-transpiled C-to-rust code,
Y . Bai and T. Palit, “RustAssure: Differential sym- bolic testing for LLM-transpiled C-to-rust code,”arXiv [cs.SE], Oct. 2025
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.