pith. machine review for the scientific record.

arxiv: 2604.19628 · v1 · submitted 2026-04-21 · 💻 cs.CR · cs.PL

Recognition: unknown

Adding Compilation Metadata To Binaries To Make Disassembly Decidable

Binoy Ravindran, Daniel Engel, Freek Verbeek, Pranav Kumar


Pith reviewed 2026-05-10 02:15 UTC · model grok-4.3

classification 💻 cs.CR cs.PL
keywords binary metadata · disassembly · compiler intent · binary lifting · recompilation · binary analysis · software security

The pith

Metadata capturing compiler intent makes binary disassembly decidable and enables reliable lifting to recompilable code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes adding metadata to binary executables that records the compiler's decisions on which parts are executable instructions and how memory regions should be bounded. Standard binaries are opaque and prone to disassembly errors because the same byte sequence can be interpreted in multiple ways. The metadata provides the missing context so that analysis tools can produce a higher-level representation that recompiles correctly and preserves original behavior. If this holds, distributed software could be analyzed, instrumented, or verified more reliably without requiring source code access. The approach adds no runtime overhead and keeps the metadata compact relative to existing debug formats.

Core claim

The paper claims that a tool can extract the compiler's internal decisions about code versus data and about memory bounds, and embed them as metadata directly in the binary. This augmented format sits between fully stripped binaries and open-source releases. It allows a lifting process to recover a correct, recompilable intermediate representation. Evaluation on real-world C and C++ programs demonstrates that the resulting lifted binaries can be instrumented and recompiled without changing observable behavior. The metadata is roughly 17 percent of the size of the corresponding DWARF information and introduces no measurable performance cost at runtime.

What carries the argument

The compilation metadata that explicitly marks intended executable instruction regions and memory bounds, generated by a tool that processes compiler output and inserts it into the binary.
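
To make the idea concrete, here is a minimal sketch of how a lifter could consult such metadata instead of guessing. It assumes, purely for illustration, that the metadata boils down to a sorted list of non-overlapping address ranges marked as intended instructions; the `CodeRange` type and `is_intended_code` helper are hypothetical and are not the paper's actual format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeRange:
    """Hypothetical metadata record: a half-open range [start, end) of intended instructions."""
    start: int
    end: int


def is_intended_code(ranges, addr):
    """Return True if addr lies inside a range the compiler marked as code.

    Assumes `ranges` is sorted by start address and non-overlapping.
    """
    starts = [r.start for r in ranges]
    i = bisect_right(starts, addr) - 1  # last range starting at or before addr
    return i >= 0 and ranges[i].start <= addr < ranges[i].end


# Illustrative addresses only: two code regions with a data gap between them.
ranges = [CodeRange(0x1000, 0x1040), CodeRange(0x1080, 0x10C0)]
print(is_intended_code(ranges, 0x1010))  # inside the first code region
print(is_intended_code(ranges, 0x1050))  # in the gap, so treated as data
```

With a lookup like this, the code-versus-data question for any byte is answered by the metadata rather than by heuristics, which is the sense in which disassembly becomes a decidable lookup.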

If this is right

  • Disassembly of augmented binaries becomes unambiguous and produces a representation that can be recompiled identically.
  • Binary analysis and instrumentation tools gain reliability because they operate on compiler-intended semantics rather than guesses.
  • Software can be distributed as binaries while still supporting downstream lifting, modification, and verification steps.
  • The added metadata imposes no runtime performance penalty and remains far smaller than full debug information.
  • A comprehensive set of C and C++ programs can be processed end-to-end without behavioral changes after lifting and recompilation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This metadata could support automated verification of binary integrity in software supply chains without needing source code.
  • Compilers could be modified to emit the metadata by default, improving compatibility for all downstream analysis tools.
  • Similar intent-capturing metadata might resolve decidability problems in other binary tasks such as control-flow recovery.
  • Standardization of the metadata format across compilers would be needed for widespread adoption and tool interoperability.

Load-bearing premise

The metadata accurately reflects the compiler's decisions about code and data regions, and this metadata remains unchanged and trustworthy in the final distributed binary.

What would settle it

Compile a program, add the metadata, lift the binary to higher-level form, recompile it, and run both the original and the new version to check whether any observable behavior differs.
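
The comparison step of that experiment can be sketched as a small differential test. This is an illustration under assumptions, not the paper's harness: `observe` and `find_divergences` are hypothetical helpers, "observable behavior" is narrowed to exit code, stdout, and stderr, and two Python one-liners stand in for the original and recompiled binaries.

```python
import subprocess
import sys


def observe(cmd, stdin_data):
    """Run one program on one input and capture what an outside observer sees."""
    proc = subprocess.run(cmd, input=stdin_data, capture_output=True, timeout=30)
    return proc.returncode, proc.stdout, proc.stderr


def find_divergences(original_cmd, recompiled_cmd, inputs):
    """Return the inputs on which the two programs behave differently."""
    return [
        data
        for data in inputs
        if observe(original_cmd, data) != observe(recompiled_cmd, data)
    ]


# Stand-ins for the original and the lifted-then-recompiled binary (assumption:
# in a real run these would be paths to the two executables).
program = "import sys; sys.stdout.write(sys.stdin.read().upper())"
original = [sys.executable, "-c", program]
recompiled = [sys.executable, "-c", program]

print(find_divergences(original, recompiled, [b"abc", b""]))
```

An empty result on a given input set only shows agreement on those inputs; it does not prove equivalence, so the choice of test inputs matters as much as the comparison itself.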

Figures

Figures reproduced from arXiv: 2604.19628 by Binoy Ravindran, Daniel Engel, Freek Verbeek, Pranav Kumar.

Figure 1: Disassembly comparison. … this context and can be considered undesirable in the final binary, such as mappings from binary locations to their original source code locations. There is no format that is as close as possible to a stripped ELF, but that does provide crucial and security-relevant information on, e.g., the intended jump targets of indirections, or on which addresses in the binary are intended to be…
Figure 2: Overview of the lifting pipeline. … • A tool to collect said metadata at build time, pack it into an efficient representation, and insert it as a section into the binary, producing ELLF files. • A lifter that consumes an ELLF file and lifts it to correct fully symbolized assembly code. Section 2 introduces the structure of the metadata we embed into binaries and explains how it enables precise disassembling …
Figure 3: Example of stack frame structure. Having fine-grained symbolized stack frames provides analysis tools with information necessary to do their work. By inserting the stack frame structure into the binary, one can, e.g., shuffle the objects on the stack. This is a form of binary diversification [9], [8], a practical technique to prevent bugs from being exploitable. One can also take a binary with shuffled st…
Figure 4: Example of lifting, with examples of the data provided
Figure 5: Example of suffix merging. On the top: Suffix merging
Original abstract

The binary executable format is the standard method for distributing and executing software. Yet, it is also as opaque a representation of software as can be. If the binary format were augmented with metadata that provides security-relevant information, such as which data is intended by the compiler to be executable instructions, or how memory regions are expected to be bounded, that would dramatically improve the safety and maintainability of software. In this paper, we propose a binary format that is a middle ground between a stripped black-box binary and open source. We provide a tool that generates metadata capturing the compiler's intent and inserts it into the binary. This metadata enables lifting to a correct and recompilable higher-level representation and makes analysis and instrumentation more reliable. Our evaluation shows that adding metadata does not affect runtime behavior or performance. Compared to DWARF, our metadata is roughly 17% of its size. We validate correctness by compiling a comprehensive set of real-world C and C++ binaries and demonstrating that they can be lifted, instrumented, and recompiled without altering their behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes augmenting binary executables with a new compilation metadata format that captures the compiler's intent regarding executable instructions and memory region bounds. This metadata is claimed to make disassembly decidable, enable lifting to correct and recompilable higher-level representations, and improve analysis and instrumentation reliability. The authors provide a tool to generate and insert this metadata, show that it has no runtime performance impact, is about 17% the size of DWARF, and validate it on a comprehensive set of real-world C and C++ binaries by demonstrating successful lifting, instrumentation, and recompilation without behavior changes.

Significance. If the claims hold, this work could have substantial impact on software security and binary analysis by providing a practical middle ground between opaque stripped binaries and full source availability. It directly tackles the undecidability of disassembly through embedded compiler metadata, which is lighter than debug information. The evaluation on real-world binaries and the size/performance claims are strengths that, if substantiated with details, would support adoption in security tools and compilers.

major comments (2)
  1. [Evaluation] The abstract and evaluation claim successful validation on real-world binaries with no runtime impact and successful lifting/recompilation, but the provided description lacks specific quantitative results (e.g., success rates, exact size measurements beyond the 17% figure), error analysis, or detailed methodology on how correctness was verified. This undermines the ability to fully assess the central claim.
  2. [Proposed metadata format and tool] The central claim that the metadata makes disassembly decidable and enables reliable downstream uses assumes the metadata remains trustworthy in distributed binaries. However, no mechanisms for integrity protection (such as cryptographic signatures, hashes, or loader verification) are described. An adversary could tamper with the metadata section to misrepresent executable regions without affecting runtime behavior, directly invalidating the decidability and lifting guarantees.
minor comments (1)
  1. [Abstract] The abstract states the metadata is 'roughly 17% of its size' compared to DWARF but does not provide the exact methodology or baseline for this comparison, which could be clarified for precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and indicate where revisions will be made to improve the manuscript.

Point-by-point responses
  1. Referee: [Evaluation] The abstract and evaluation claim successful validation on real-world binaries with no runtime impact and successful lifting/recompilation, but the provided description lacks specific quantitative results (e.g., success rates, exact size measurements beyond the 17% figure), error analysis, or detailed methodology on how correctness was verified. This undermines the ability to fully assess the central claim.

    Authors: We agree that additional quantitative details would strengthen the evaluation. The manuscript reports validation across a comprehensive set of real-world C and C++ binaries with successful lifting, instrumentation, and recompilation, plus the 17% size comparison to DWARF and no runtime impact. We will revise the evaluation section to include specific success rates, more precise size breakdowns, error analysis, and an expanded description of the verification methodology (e.g., how behavior equivalence was checked post-recompilation). revision: yes

  2. Referee: [Proposed metadata format and tool] The central claim that the metadata makes disassembly decidable and enables reliable downstream uses assumes the metadata remains trustworthy in distributed binaries. However, no mechanisms for integrity protection (such as cryptographic signatures, hashes, or loader verification) are described. An adversary could tamper with the metadata section to misrepresent executable regions without affecting runtime behavior, directly invalidating the decidability and lifting guarantees.

    Authors: The referee correctly identifies that the work assumes trusted metadata generated by the compiler. We did not describe cryptographic integrity mechanisms because the contribution centers on the metadata format and its utility for decidable disassembly and lifting when the metadata is present and accurate, analogous to other compiler-generated sections. Tampering is possible in principle, but this is outside the paper's scope of defining the format itself. We will add a new subsection on trust assumptions and note that external integrity protections (e.g., signatures) can be layered on top without altering the core approach. revision: partial
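
As an illustration of what "layered on top" could look like, the sketch below binds a metadata section to the code it describes with a keyed digest. Everything here is an assumption for illustration: the paper describes no such mechanism, `seal` and `verify` are hypothetical names, and an HMAC stands in for the asymmetric signature a real distribution channel would more likely use.

```python
import hashlib
import hmac


def seal(key: bytes, code: bytes, metadata: bytes) -> bytes:
    """Compute a tag covering both the code bytes and the metadata bytes."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in (code, metadata):
        # Length-prefix each part so bytes cannot be shifted between code and metadata.
        mac.update(len(part).to_bytes(8, "little"))
        mac.update(part)
    return mac.digest()


def verify(key: bytes, code: bytes, metadata: bytes, tag: bytes) -> bool:
    """Check the tag in constant time before trusting the metadata."""
    return hmac.compare_digest(seal(key, code, metadata), tag)


# Placeholder bytes; a real tool would read these from the binary's sections.
key = b"demo-key-not-for-production"
code = b"\x55\x48\x89\xe5\xc3"
metadata = b"code:0x1000-0x1005"
tag = seal(key, code, metadata)

print(verify(key, code, metadata, tag))
print(verify(key, code, b"code:0x1000-0x1003", tag))
```

A lifter that refuses to proceed when `verify` fails would address the referee's tampering scenario, because altered metadata no longer matches the tag even though the program still runs unchanged.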

Circularity Check

0 steps flagged

No circularity; proposal and evaluation are independent of inputs

Full rationale

The paper describes a tool for inserting compiler-derived metadata into binaries to improve disassembly decidability, lifting, and analysis. No equations, fitted parameters, predictions, or mathematical derivations appear in the abstract or description. The evaluation is framed as independent testing on real-world C/C++ binaries (compiling, lifting, instrumenting, recompiling) rather than any self-referential fit or renaming. No self-citations are invoked as load-bearing for uniqueness or ansatz; the central argument rests on the tool's design and empirical checks, which do not reduce to the inputs by construction. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The proposal rests on the domain assumption that compiler intent can be reliably extracted and represented without side effects; no free parameters or invented physical entities are described.

axioms (1)
  • domain assumption: Compiler intent regarding executable instructions and memory bounds can be accurately captured by the tool and preserved in the binary.
    This assumption underpins the claim that lifting and analysis become reliable; it is invoked implicitly throughout the abstract.
invented entities (1)
  • Compilation metadata format (no independent evidence)
    purpose: To embed security-relevant compiler intent into binaries for decidable disassembly and reliable lifting.
    The paper introduces this as a new middle-ground representation between stripped binaries and open source.

pith-pipeline@v0.9.0 · 5487 in / 1343 out tokens · 42624 ms · 2026-05-10T02:15:37.280632+00:00 · methodology


Reference graph

Works this paper leans on

45 extracted references · 12 canonical work pages · 1 internal anchor

  [1] Alves-Foss, J., Venugopal, V.: The inconvenient truths of ground truth for binary analysis. CoRR abs/2210.15079 (2022). https://doi.org/10.48550/arXiv.2210.15079
  [2] Andriesse, D., Chen, X., Van Der Veen, V., Slowinska, A., Bos, H.: An In-Depth analysis of disassembly on Full-Scale x86/x64 binaries. In: 25th USENIX Security Symposium (USENIX Security 16). pp. 583–600 (2016)
  [3] Assaiante, C., D’Elia, D.C., Di Luna, G.A., Querzoni, L.: Where did my variable go? poking holes in incomplete debug information. In: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume. p. 935–947. ASPLOS 2023, Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3575693.3575720
  [4] Balakrishnan, G., Reps, T.: Analyzing memory accesses in x86 executables. In: International conference on compiler construction. pp. 5–23. Springer (2004)
  [5] Balakrishnan, G., Reps, T.: Recovery of variables and heap structure in x86 executables. Tech. rep., University of Wisconsin-Madison Department of Computer Sciences (2005)
  [6] Basque, Z.L., Bajaj, A.P., Gibbs, W., O’Kain, J., Miao, D., Bao, T., Doupé, A., Shoshitaishvili, Y., Wang, R.: Ahoy SAILR! there is no need to DREAM of C: A Compiler-Aware structuring algorithm for binary decompilation. In: 33rd USENIX Security Symposium (USENIX Security 24). pp. 361–378 (2024)
  [7] Ben Khadra, M.A., Stoffel, D., Kunz, W.: Efficient binary-level coverage analysis. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 1153–1164 (2020)
  [8] Bhatkar, S., DuVarney, D.C., Sekar, R.: Efficient techniques for comprehensive protection from memory error exploits. In: USENIX Security Symposium. vol. 10 (2005)
  [9] Bhatkar, S., DuVarney, D.C., Sekar, R.: Address obfuscation: An efficient approach to combat a broad range of memory error exploits. In: 12th USENIX Security Symposium (USENIX Security 03) (2003)
  [10] Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: A binary analysis platform. In: International Conference on Computer Aided Verification. pp. 463–469. Springer (2011)
  [11] Brumley, D., Lee, J., Schwartz, E.J., Woo, M.: Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring. In: 22nd USENIX Security Symposium (USENIX Security 13). pp. 353–368 (2013)
  [12] Buck, B., Hollingsworth, J.K.: An API for runtime code patching. Int. J. High Perform. Comput. Appl. 14(4), 317–329 (Nov 2000). https://doi.org/10.1177/109434200001400404
  [13] Di Luna, G.A., Italiano, D., Massarelli, L., Österlund, S., Giuffrida, C., Querzoni, L.: Who’s debugging the debuggers? exposing debug information bugs in optimized binaries. In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. p. 1034–1045. ASPLOS ’21, Association for Comp...
  [14] Dinesh, S., Burow, N., Xu, D., Payer, M.: Retrowrite: Statically instrumenting COTS binaries for fuzzing and sanitization. In: 2020 IEEE Symposium on Security and Privacy (SP). pp. 1497–1511. IEEE (2020)
  [15] Djoudi, A., Bardin, S.: Binsec: Binary code analysis with low-level regions. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 212–217. Springer (2015)
  [16] Djoudi, A., Bardin, S., Goubault, É.: Recovering high-level conditions from binary programs. In: International Symposium on Formal Methods. pp. 235–253. Springer (2016)
  [17] Erosa, A.M., Hendren, L.J.: Taming control flow: A structured approach to eliminating goto statements. In: Proceedings of 1994 IEEE International Conference on Computer Languages (ICCL’94). pp. 229–240. IEEE (1994)
  [18] Harel, D.: On folk theorems. Commun. ACM 23(7), 379–389 (Jul 1980). https://doi.org/10.1145/358886.358892
  [19] Ince, T., Hollingsworth, J.K.: Compiler help for binary manipulation tools. In: Caragiannis, I., Alexander, M., Badia, R.M., Cannataro, M., Costan, A., Danelutto, M., Desprez, F., Krammer, B., Sahuquillo, J., Scott, S.L., Weidendorfer, J. (eds.) Euro-Par 2012: Parallel Processing Workshops. pp. 404–413. Springer Berlin Heidelberg, Berlin, Heidelberg (2013)
  [20] Kinder, J., Veith, H.: Precise static analysis of untrusted driver binaries. In: Formal methods in computer aided design. pp. 43–50. IEEE (2010)
  [21] Lee, J., Avgerinos, T., Brumley, D.: TIE: Principled reverse engineering of types in binary programs (2011)
  [22] Li, K., Woo, M., Jia, L.: On the generation of disassembly ground truth and the evaluation of disassemblers. In: Proceedings of the 2020 ACM Workshop on Forming an Ecosystem Around Software Transformation. p. 9–14. FEAST’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3411502.3418429, https://doi.org/10.1145/34...
  [23] Li, Y., Ding, S., Zhang, Q., Italiano, D.: Debug information validation for optimized code. In: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. p. 1052–1065. PLDI 2020, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3385412.3386020
  [24] Lin, Z., Li, J., Li, B., Ma, H., Gao, D., Ma, J.: Typesqueezer: When static recovery of function signatures for binary executables meets dynamic analysis. In: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security. p. 2725–2739. CCS ’23, Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.114...
  [25] Liu, Z., Wang, S.: How far we have come: testing decompilation correctness of C decompilers. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. p. 475–487. ISSTA 2020, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3395363.3397370
  [26] Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.: Signedness-agnostic program analysis: Precise integer bounds for low-level code. In: Asian Symposium on Programming Languages and Systems. pp. 115–130. Springer (2012)
  [27] Noonan, M., Loginov, A., Cok, D.: Polymorphic type inference for machine code. In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. pp. 27–41 (2016)
  [28] Panchenko, M., Auler, R., Nell, B., Ottoni, G.: Bolt: a practical binary optimizer for data centers and beyond. In: Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization. p. 2–14. CGO 2019, IEEE Press (2019)
  [29] Pang, C., Yu, R., Chen, Y., Koskinen, E., Portokalidis, G., Mao, B., Xu, J.: SoK: All you ever wanted to know about x86/x64 binary disassembly but were afraid to ask. In: 2021 IEEE Symposium on Security and Privacy (SP). pp. 833–851 (2021). https://doi.org/10.1109/SP40001.2021.00012
  [30] Pang, C., Zhang, T., Yu, R., Mao, B., Xu, J.: Ground truth for binary disassembly is not easy. In: 31st USENIX Security Symposium (USENIX Security 22). pp. 2479–2495. USENIX Association, Boston, MA (Aug 2022)
  [31] Rice, H.G.: Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society 74(2), 358–366 (1953)
  [32] Rose, A., Bansal, S.: Modeling dynamic (de)allocations of local memory for translation validation. Proceedings of the ACM on Programming Languages 8(OOPSLA1), 1463–1492 (2024)
  [33] Schwarz, B., Debray, S., Andrews, G.: Disassembly of executable code revisited. In: Ninth Working Conference on Reverse Engineering, 2002. Proceedings. pp. 45–54. IEEE (2002)
  [34] Smithson, M., ElWazeer, K., Anand, K., Kotha, A., Barua, R.: Static binary rewriting without supplemental information: Overcoming the tradeoff between coverage and correctness. In: 2013 20th Working Conference on Reverse Engineering (WCRE). pp. 52–61. IEEE (2013)
  [35] Song, D., Brumley, D., Yin, H., Caballero, J., Jager, I., Kang, M.G., Liang, Z., Newsome, J., Poosankam, P., Saxena, P.: BitBlaze: A new approach to computer security via binary analysis. In: Information Systems Security: 4th International Conference, ICISS 2008, Hyderabad, India, December 16-20, 2008. Proceedings 4. pp. 1–25. Springer (2008)
  [36] Verbeek, F., Naus, N., Ravindran, B.: Verifiably correct lifting of position-independent x86-64 binaries to symbolized assembly. In: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. pp. 2786–2798 (2024)
  [37] Verbeek, F., Olivier, P., Ravindran, B.: Sound C code decompilation for a subset of x86-64 binaries. In: Software Engineering and Formal Methods: 18th International Conference, SEFM 2020, Amsterdam, The Netherlands, September 14–18, 2020, Proceedings. p. 247–264. Springer-Verlag, Berlin, Heidelberg (2020). https://doi.org/10.1007/978-3-030-58768-0_14, ...
  [38] Wang, R., Shoshitaishvili, Y., Bianchi, A., Machiry, A., Grosen, J., Grosen, P., Kruegel, C., Vigna, G.: Ramblr: Making reassembly great again. In: NDSS (2017)
  [39] Wang, T., Wei, T., Lin, Z., Zou, W.: Intscope: Automatically detecting integer overflow vulnerability in x86 binary using symbolic execution. In: NDSS. pp. 1–14 (2009)
  [40] Wartell, R., Zhou, Y., Hamlen, K.W., Kantarcioglu, M.: Shingled graph disassembly: Finding the undecideable path. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. pp. 273–285. Springer (2014)
  [41] Wartell, R., Zhou, Y., Hamlen, K.W., Kantarcioglu, M., Thuraisingham, B.: Differentiating code from data in x86 binaries. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 522–536. Springer (2011)
  [42] Williams-King, D., Kobayashi, H., Williams-King, K., Patterson, G., Spano, F., Wu, Y.J., Yang, J., Kemerlis, V.P.: Egalito: Layout-agnostic binary recompilation. In: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. p. 133–147. ASPLOS ’20, Association for Computing Ma...
  [43] Xie, D., Zhang, Z., Jiang, N., Xu, X., Tan, L., Zhang, X.: Resym: Harnessing LLMs to recover variable and data structure symbols from stripped binaries. In: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. pp. 4554–4568 (2024)
  [44] Zhang, Z., Ye, Y., You, W., Tao, G., Lee, W.c., Kwon, Y., Aafer, Y., Zhang, X.: Osprey: Recovery of variable and data structure via probabilistic analysis for stripped binary. In: 2021 IEEE Symposium on Security and Privacy (SP). pp. 813–832. IEEE (2021)

APPENDIX: The artifacts underlying this work are available online. We provide a Docker-based environ...