SBridge: Identifying Source-to-Binary Function Similarity via Cross-Domain Control Block Matching

Hajin Yun; Heedong Yang; Jeongwoo Lee; Seunghoon Woo

arxiv: 2606.28058 · v1 · pith:7UE7MOCLnew · submitted 2026-06-26 · 💻 cs.SE

SBridge: Identifying Source-to-Binary Function Similarity via Cross-Domain Control Block Matching

Heedong Yang , Jeongwoo Lee , Hajin Yun , Seunghoon Woo This is my paper

Pith reviewed 2026-06-29 03:37 UTC · model grok-4.3

classification 💻 cs.SE

keywords source-to-binary matchingfunction similaritycontrol blocksbinary analysiscode reuse detectionvulnerability propagationstripped binariesfunction inlining

0 comments

The pith

SBridge segments functions into control blocks to match source code to binaries despite inlining and stripping.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SBridge to identify which binary functions correspond to given source functions by dividing both into control blocks such as conditionals and loops. This segmentation creates a shared representation that survives compilation changes like inlining, where roughly 40 percent of functions disappear into callers. Existing methods using string literals or whole-function structures produce many mismatches; the control-block approach measures similarity at a finer grain. A reader would care because source code is easier to obtain and analyze than binaries, making vulnerability tracking in deployed software more feasible. The evaluation on thousands of real C/C++ binaries shows the method recovers the correct binary function for most source inputs.

Core claim

SBridge treats control blocks as the cross-domain unit for similarity measurement, allowing functions to be compared even when inlining merges them or when binaries lack symbols.

What carries the argument

Control block segmentation, which breaks functions into conditionals, loops and similar structures to serve as the matching representation between source and binary domains.

If this is right

Reused vulnerable code can be located in binaries by direct reference to the original source rather than compiled artifacts.
Detection remains possible on stripped binaries that lack debug information or symbol tables.
Fewer false matches occur compared with methods that compare entire functions or rely on embedded strings.
The same block-level representation supports ranking multiple candidate binaries for a single source function.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same block segmentation might be tested on languages beyond C/C++ to check whether control-flow units transfer across different compiler pipelines.
If control blocks prove stable, the method could be applied to partial binaries or to code that has undergone heavy optimization passes not covered in the current evaluation.
Control-block matching could be combined with data-flow features to handle cases where control structure is altered but behavior is preserved.

Load-bearing premise

Segmenting functions into control blocks yields units that remain identifiable and comparable after compilation even when many functions are inlined.

What would settle it

A collection of source-binary pairs in which control-block sequences differ substantially after compilation yet the functions perform identical work, or pairs in which blocks match but the functions are unrelated.

Figures

Figures reproduced from arXiv: 2606.28058 by Hajin Yun, Heedong Yang, Jeongwoo Lee, Seunghoon Woo.

**Figure 3.** Figure 3: Overview of SBridge. Scope and assumption. SBridge operates regardless of whether the binary is stripped. Because of function inlining, a single source function may correspond to multiple binary functions (1-to-N matching), and conversely, multiple source functions may map to a single binary function (N-to-1 matching). Instead of identifying only the most similar binary function for a given source function… view at source ↗

**Figure 4.** Figure 4: Flow of internal branching block vector extraction and similarity comparison for the example code. [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

**Figure 5.** Figure 5: Recall@1 measurement results by architecture, compiler, optimization, and symbol management. 0.22 0.30 0.29 0.31 0.50 0.50 0.50 0.51 0.78 0.83 0.79 0.84 0 0.2 0.4 0.6 0.8 1 Recall @ 5 ARM32 ARM64 x86 x64 (a) By architecture. 0.29 0.27 0.51 0.49 0.81 0.81 0 0.2 0.4 0.6 0.8 1 Recall @ 5 GCC Clang (b) By compiler. 0.36 0.20 0.59 0.41 0.88 0.74 0 0.2 0.4 0.6 0.8 1 Recall @ 5 -O0 -O2 (c) By optimization. 0.85 0… view at source ↗

**Figure 6.** Figure 6: Recall@5 measurement results by architecture, compiler, optimization, and symbol management. is used to evaluate how highly the correct results are ranked for a given query. In our setting, the query is an input source function, and the correct result is its corresponding binary function. Result overview. Experimental results show that SBridge outperforms both the MRT-OAST and BinaryAI across most configur… view at source ↗

**Figure 7.** Figure 7: Threshold experiment and performance evaluation results. [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

read the original abstract

We present SBridge, a precise approach for identifying functions in binaries that are similar to the given source code functions. Identifying reused code in binaries is critical for security, particularly for detecting propagated vulnerabilities. Although binary-to-binary comparison is feasible, leveraging source code as the reference is more practical because source code is easier to collect and analyze directly without compilation. However, significant gaps between source and binary representations, including function inlining, create challenges in cross-domain function detection. Existing approaches primarily rely on string literals or structural similarities between entire functions, failing to capture detailed code behavior and generating many false alarms. SBridge addresses these limitations through a key innovation: control block-based function matching, which encapsulates essential functional features by segmenting functions into meaningful units such as conditionals and loops. Leveraging control blocks as a cross-domain representation, SBridge enables precise measurement of function similarity between source and binary code, effectively overcoming challenges posed by function inlining and stripped binaries. For evaluation, we collected 3,904 real-world C/C++ binaries from BinKit. In experiments identifying binary functions identical to input source functions, despite approximately 40% of binary functions being inlined, SBridge achieved 75.13% recall@1 and 80.98% recall@5, outperforming existing approaches, which achieved up to 43.31% recall@1 and 50.2% recall@

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SBridge gets 75% recall@1 with control-block matching on real binaries but leaves the inlining solution underspecified.

read the letter

Hi colleague,

The key points on SBridge are that it uses control blocks like conditionals and loops to match source functions to their binary counterparts and achieves 75.13% recall@1 and 80.98% recall@5 on 3,904 BinKit binaries despite 40% inlining, beating other methods.

The paper introduces this cross-domain representation as the main innovation over string-literal or whole-function methods. It does a good job of testing on real C/C++ binaries and providing concrete recall numbers that indicate the method can be useful for detecting reused code in security contexts.

Where it is softer is in the justification for the inlining claim. The abstract states that the control block approach overcomes inlining challenges, but it does not describe the binary block extraction process or any mechanism for handling merged blocks from inlining. The stress-test note is on point here; if the matching treats blocks as flat sequences without alignment or partial matching, the inlining rate would likely reduce the effectiveness. The central performance numbers depend on successful block correspondence that is not shown in the text. Minor issues include missing methodology details like how baselines were run and any variance in the results.

This work is aimed at people in software security and binary reverse engineering who need better source-to-binary matching. Readers interested in new representations for function similarity would get value from the idea and the evaluation setup.

It has enough of a concrete claim and data to deserve a serious referee, though it would likely need more on the matching algorithm.

I would recommend sending it to peer review.

Referee Report

2 major / 1 minor

Summary. The paper presents SBridge, a technique for source-to-binary function similarity detection that segments functions into control blocks (conditionals, loops) as a cross-domain representation. It claims this overcomes function inlining (~40% of functions) and stripping, evaluated on 3,904 real-world C/C++ binaries from BinKit, achieving 75.13% recall@1 and 80.98% recall@5 while outperforming baselines (up to 43.31% recall@1).

Significance. If the control-block matching is shown to preserve correspondence under inlining, the work could meaningfully advance practical binary vulnerability detection by enabling direct use of source references. The scale of the BinKit evaluation is a positive factor, but the absence of verifiable methodology details limits assessment of whether the reported gains are robust.

major comments (2)

[Abstract] Abstract: the central claim that 'control block-based function matching... effectively overcoming challenges posed by function inlining' is load-bearing for the recall numbers, yet the text supplies no description of binary CFG block extraction, no mechanism for merged or flattened blocks after inlining, and no partial-match or alignment logic. Inlining changes block count and nesting, so the representation's robustness is asserted without supporting detail or evidence.
[Abstract] Abstract: the reported recall@1 (75.13%) and recall@5 (80.98%) are presented without methodology, baseline definitions, data exclusion rules, error bars, or statistical tests, rendering the performance claims unverifiable from the given text and undermining the comparison to the 43.31% baseline.

minor comments (1)

[Abstract] Abstract: the final sentence is truncated ('50.2% recall@').

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications from the full paper and indicating where revisions to the abstract will strengthen the presentation.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'control block-based function matching... effectively overcoming challenges posed by function inlining' is load-bearing for the recall numbers, yet the text supplies no description of binary CFG block extraction, no mechanism for merged or flattened blocks after inlining, and no partial-match or alignment logic. Inlining changes block count and nesting, so the representation's robustness is asserted without supporting detail or evidence.

Authors: The abstract summarizes the key innovation at a high level, as is conventional. The full manuscript provides the requested details: binary CFG block extraction is described in Section 3.1 (parsing source ASTs and binary disassembly to identify control blocks for conditionals/loops), while the handling of merged/flattened blocks and changes in nesting/count due to inlining is addressed via the cross-domain alignment algorithm in Section 3.3, which performs partial sequence matching on control block features to tolerate inlining (noted as affecting ~40% of functions). We will revise the abstract to briefly note the use of alignment-based partial matching for robustness under inlining. revision: partial
Referee: [Abstract] Abstract: the reported recall@1 (75.13%) and recall@5 (80.98%) are presented without methodology, baseline definitions, data exclusion rules, error bars, or statistical tests, rendering the performance claims unverifiable from the given text and undermining the comparison to the 43.31% baseline.

Authors: The metrics are the primary results from the evaluation on the 3,904 BinKit binaries (detailed in Section 5), with baselines explicitly compared (the 43.31% recall@1 from the strongest prior method), data processing rules (including inlined/stripped functions), and experimental protocol described in that section. The abstract reports headline figures due to length limits. We will revise the abstract to reference the evaluation dataset scale, inlining rate, and baseline comparisons more explicitly; error bars and statistical tests can be incorporated if space allows in a revised version. revision: partial

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper describes an empirical method for cross-domain function similarity detection via control-block segmentation and reports recall metrics from evaluation on the BinKit dataset. No equations, first-principles derivations, fitted parameters presented as predictions, or self-citation chains appear in the abstract or description. The central claim is justified by experimental results rather than by construction or tautological reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input provides no equations, parameters, or modeling details; no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.1-grok · 5791 in / 1066 out tokens · 38386 ms · 2026-06-29T03:37:49.611257+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

56 extracted references · 35 canonical work pages

[1]

Vector 35. 2024. Binary Ninja. https://binary.ninja/

2024
[2]

National Security Agency. 2024. Ghidra. https://ghidra-sre.org

2024
[3]

Sunwoo Ahn, Seonggwan Ahn, Hyungjoon Koo, and Yunheung Paek. 2022. Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning. InProceedings of the 38th Annual Computer Security Applications Conference. 361–374. https://doi.org/10.1145/3564625.3567975

work page doi:10.1145/3564625.3567975 2022
[4]

Gu Ban, Lili Xu, Yang Xiao, Xinhua Li, Zimu Yuan, and Wei Huo. 2021. B2SMatcher: fine-Grained version identification of open-Source software in binary files.Cybersecurity4 (2021), 1–21. https://doi.org/10.1186/s42400-021-00085-7

work page doi:10.1186/s42400-021-00085-7 2021
[5]

Martial Bourquin, Andy King, and Edward Robbins. 2013. BinSlayer: Accurate Comparison of Binary Executables. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop. 1–10. https://doi.org/10. 1145/2430553.2430557

arXiv 2013
[6]

Ctags. 2024. Universal Ctags. https://github.com/universal-ctags/ctags

2024
[7]

Yaniv David, Nimrod Partush, and Eran Yahav. 2017. Similarity of binaries through re-optimization. InProceedings of the 38th ACM SIGPLAN conference on programming language design and implementation. 79–94. https://doi.org/10. 1145/3140587.3062387

arXiv 2017
[8]

Alessandro Di Federico, Mathias Payer, and Giovanni Agosta. 2017. rev.ng: a unified binary analysis framework to recover CFGs and function boundaries. InProceedings of the 26th International Conference on Compiler Construction. 131–141

2017
[9]

Chaopeng Dong, Siyuan Li, Shougou Yang, Yang Xiao, Yongpan Wang, Hong Li, Zhi Li, and Limin Sun. 2024. LibvDiff: Library Version Difference Guided OSS Version Identification in Binaries. InProceedings of the 46th International Conference on Software Engineering (ICSE). 791–802. https://doi.org/10.1145/3597503.3623336

work page doi:10.1145/3597503.3623336 2024
[10]

Ruian Duan, Ashish Bijlani, Meng Xu, Taesoo Kim, and Wenke Lee. 2017. Identifying Open-Source License Violation and 1-day Security Risk at Large Scale. InProceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security(Dallas, Texas, USA)(CCS ’17). Association for Computing Machinery, New York, NY, USA, 2169–2185. https://doi.org/10.1...

work page doi:10.1145/3133956.3134048 2017
[11]

Muyue Feng, Zimu Yuan, Feng Li, Gu Ban, Yang Xiao, Shiyang Wang, Qian Tang, He Su, Chendong Yu, Jiahuan Xu, Aihua Piao, Jingling Xue, and Wei Huo. 2020. B2SFinder: Detecting Open-Source Software Reuse in COTS Software. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering(San Diego, California) (ASE ’19). IEEE Pres...

work page doi:10.1109/ase.2019.00100 2020
[12]

Debin Gao, Michael K Reiter, and Dawn Song. 2008. BinHunt: Automatically Finding Semantic Differences in Binary Programs. InInternational Conference on Information and Communications Security. Springer, 238–255. https: //doi.org/10.1007/978-3-540-88625-9_16 Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE062. Publication date: July 2026. SBridge: Ident...

work page doi:10.1007/978-3-540-88625-9_16 2008
[13]

Haojie He, Xingwei Lin, Ziang Weng, Ruijie Zhao, Shuitao Gan, Libo Chen, Yuede Ji, Jiashui Wang, and Zhi Xue. 2024. Code is not natural language: unlock the power of semantics-oriented graph representation for binary code similarity detection. InProceedings of the 33rd USENIX Conference on Security Symposium(Philadelphia, PA, USA)(SEC ’24). USENIX Associa...

2024
[14]

Xu He, Shu Wang, Pengbin Feng, Xinda Wang, Shiyu Sun, Qi Li, and Kun Sun. 2024. BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning. InProceedings of the 19th ACM Asia Conference on Computer and Communications Security. 1186–1199. https://doi.org/10.1145/3634737.3637666

work page doi:10.1145/3634737.3637666 2024
[15]

Hex-Rays. 2024. IDA Pro. https://hex-rays.com/ida-pro/

2024
[16]

IBM. 2025. Standard C Library Functions Table, By Name. https://www.ibm.com/docs/en/i/7.6.0?topic=extensions- standard-c-library-functions-table-by-name

2025
[17]

Ang Jia, Ming Fan, Wuxia Jin, Xi Xu, Zhaohui Zhou, Qiyi Tang, Sen Nie, Shi Wu, and Ting Liu. 2023. 1-to-1 or 1-to-n? Investigating the Effect of Function Inlining on Binary Similarity Analysis.ACM Transactions on Software Engineering and Methodology32, 4 (2023), 1–26. https://doi.org/10.1145/3561385

work page doi:10.1145/3561385 2023
[18]

Ang Jia, Ming Fan, Xi Xu, Wuxia Jin, Haijun Wang, and Ting Liu. 2024. Cross-Inlining Binary Function Similarity Detection. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering(Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 223, 13 pages. https://doi.org/10.1145/ 3597503.3639080

arXiv 2024
[19]

Lichen Jia, Chenggang Wu, Peihua Zhang, and Zhe Wang. 2024. CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques. InProceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems(Copenhagen, Denmark)(LCTES 2024). Association for Computing Machinery, New York, N...

work page doi:10.1145/3652032.3657572 2024
[20]

Ling Jiang, Junwen An, Huihui Huang, Qiyi Tang, Sen Nie, Shi Wu, and Yuqun Zhang. 2024. BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering(Lisbon, Portugal)(ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article ...

work page doi:10.1145/3597503.3639100 2024
[21]

Ling Jiang, Hengchen Yuan, Qiyi Tang, Sen Nie, Shi Wu, and Yuqun Zhang. 2023. Third-party library dependency for large-scale sca in the c/c++ ecosystem: How far are we?. InProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1383–1395. https://doi.org/10.1145/3597926.3598143

work page doi:10.1145/3597926.3598143 2023
[22]

Dongkwan Kim, Eunsoo Kim, Sang Kil Cha, Sooel Son, and Yongdae Kim. 2023. Revisiting Binary Code Similarity Analysis Using Interpretable Feature Engineering and Lessons Learned.IEEE Transactions on Software Engineering49, 4 (2023), 1661–1682. https://doi.org/10.1109/TSE.2022.3187689

work page doi:10.1109/tse.2022.3187689 2023
[23]

Seulbae Kim, Seunghoon Woo, Heejo Lee, and Hakjoo Oh. 2017. VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery. InProceedings of the 38th IEEE Symposium on Security and Privacy (SP). 595–614. https: //doi.org/10.1109/SP.2017.62

work page doi:10.1109/sp.2017.62 2017
[24]

Siyuan Li, Yongpan Wang, Chaopeng Dong, Shouguo Yang, Hong Li, Hao Sun, Zhe Lang, Zuxin Chen, Weijie Wang, Hongsong Zhu, and Limin Sun. 2023. LibAM: An Area Matching Framework for Detecting Third-Party Libraries in Binaries.ACM Trans. Softw. Eng. Methodol.(sep 2023). https://doi.org/10.1145/3625294

work page doi:10.1145/3625294 2023
[25]

Bingchang Liu, Wei Huo, Chao Zhang, Wenchao Li, Feng Li, Aihua Piao, and Wei Zou. 2018. 𝛼Diff: cross-version binary code similarity detection with DNN. InProceedings of the 33rd ACM/IEEE international conference on automated software engineering. 667–678. https://doi.org/10.1145/3238147.3238199

work page doi:10.1145/3238147.3238199 2018
[26]

2025.LLVM Project Doxygen Documentation

LLVM Project. 2025.LLVM Project Doxygen Documentation. LLVM Foundation. https://llvm.org/doxygen/

2025
[27]

Stallman, Roland McGrath, Andrew Oram, and Ulrich Drepper

Sandra Loosemore, Richard M. Stallman, Roland McGrath, Andrew Oram, and Ulrich Drepper. 2025.The GNU C Library Reference Manual, for version 2.42. https://sourceware.org/glibc/manual/2.42/pdf/libc.pdf

2025
[28]

Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Roberto Baldoni, and Leonardo Querzoni. 2019. SAFE: Self-Attentive Function Embeddings for Binary Similarity. InDetection of Intrusions and Malware, and Vulnerability Assessment: 16th International Conference, DIMV A 2019, Gothenburg, Sweden, June 19–20, 2019, Proceedings 16. Springer, 309–329. htt...

work page doi:10.1007/978-3-030-22038-9_15 2019
[29]

Jiang Ming, Dongpeng Xu, Yufei Jiang, and Dinghao Wu. 2017. {BinSim}: Trace-based semantic binary diffing via system call sliced segment equivalence checking. In26th USENIX Security Symposium (USENIX Security 17). Vancouver, BC, 253–270

2017
[30]

Yoonjong Na, Seunghoon Woo, Joomyeong Lee, and Heejo Lee. 2024. CNEPS: A Precise Approach for Examining Dependencies Among Third-Party C/C++ Open-Source Components. InProceedings of the 46th International Conference on Software Engineering (ICSE). 2918–2929. https://doi.org/10.1145/3597503.3639209

work page doi:10.1145/3597503.3639209 2024
[31]

Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K Roy, and Cristina V Lopes. 2016. SourcererCC: Scaling Code Clone Detection to Big-Code. InProceedings of the 38th International Conference on Software Engineering (ICSE). 1157–1168. https://doi.org/10.1145/2884781.2884877

work page doi:10.1145/2884781.2884877 2016
[32]

Synopsys

Synopsys 2025.2025 Open Source Security and Risk Analysis Report. Synopsys. Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE062. Publication date: July 2026. FSE062:22 Heedong Yang, Jeongwoo Lee, Hajin Yun, and Seunghoon Woo

arXiv 2025
[33]

Wei Tang, Ping Luo, Jialiang Fu, and Dan Zhang. 2020. LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code. In2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). 104–115. https://doi.org/10.1109/SANER48275.2020.9054845

work page doi:10.1109/saner48275.2020.9054845 2020
[34]

Wei Tang, Yanlin Wang, Hongyu Zhang, Shi Han, Ping Luo, and Dongmei Zhang. 2022. LibDB: an effective and efficient framework for detecting third-party libraries in binaries. InProceedings of the 19th International Conference on Mining Software Repositories(Pittsburgh, Pennsylvania)(MSR ’22). Association for Computing Machinery, New York, NY, USA, 423–434....

work page doi:10.1145/3524842.3528442 2022
[35]

Hao Wang, Wenjie Qu, Gilad Katz, Wenyu Zhu, Zeyu Gao, Han Qiu, Jianwei Zhuge, and Chao Zhang. 2022. jTrans: Jump-Aware Transformer for Binary Code Similarity. InProceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 1–13. https://doi.org/10.1145/3533767.3534367

work page doi:10.1145/3533767.3534367 2022
[36]

Pengcheng Wang, Jeffrey Svajlenko, Yanzhao Wu, Yun Xu, and Chanchal K Roy. 2018. CCAligner: A Token Based Large-Gap Clone Detector. InProceedings of the 40th International Conference on Software Engineering (ICSE). 1066–1077. https://doi.org/10.1145/3180155.3180179

work page doi:10.1145/3180155.3180179 2018
[37]

Seunghoon Woo, Eunjin Choi, and Heejo Lee. 2025. A large-scale analysis of the effectiveness of publicly reported security patches.Computers & Security148 (2025), 104181. https://doi.org/10.1016/j.cose.2024.104181

work page doi:10.1016/j.cose.2024.104181 2025
[38]

Seunghoon Woo, Eunjin Choi, Heejo Lee, and Hakjoo Oh. 2023. V1SCAN: Discovering 1-day Vulnerabilities in Reused C/C++ Open-source Software Components Using Code Classification Techniques. InProceedings of the 32nd USENIX Security Symposium (Security). 6541–6556

2023
[39]

Seunghoon Woo, Hyunji Hong, Eunjin Choi, and Heejo Lee. 2022. MOVERY: A Precise Approach for Modified Vulnerable Code Clone Discovery from Modified Open-Source Software Components. InProceedings of the 31st USENIX Security Symposium (Security). 3037–3053

2022
[40]

Seunghoon Woo, Dongwook Lee, Sunghan Park, Heejo Lee, and Sven Dietrich. 2021. V0Finder: Discovering the Correct Origin of Publicly Reported Software Vulnerabilities. InProceedings of the 30th USENIX Security Symposium (Security). 3041–3058

2021
[41]

Seunghoon Woo, Sunghan Park, Seulbae Kim, Heejo Lee, and Hakjoo Oh. 2021. CENTRIS: A Precise and Scalable Approach for Identifying Modified Open-Source Software Reuse. InProceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE). 860–872. https://doi.org/10.1109/ICSE43902.2021.00083

work page doi:10.1109/icse43902.2021.00083 2021
[42]

Yang Xiao, Bihuan Chen, Chendong Yu, Zhengzi Xu, Zimu Yuan, Feng Li, Binghong Liu, Yang Liu, Wei Huo, Wei Zou, and Wenchang Shi. 2020. MVP: detecting vulnerabilities using patch-enhanced vulnerability signatures. InProceedings of the 29th USENIX Security Symposium (Security). 1165–1182

2020
[43]

Yang Xiao, Zhengzi Xu, Weiwei Zhang, Chendong Yu, Longquan Liu, Wei Zou, Zimu Yuan, Yang Liu, Aihua Piao, and Wei Huo. 2021. VIVA: Binary Level Vulnerability Identification via Partial Signature. In2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 213–224. https://doi.org/10.1109/SANER50967.2021.00028

work page doi:10.1109/saner50967.2021.00028 2021
[44]

Xiangzhe Xu, Shiwei Feng, Yapeng Ye, Guangyu Shen, Zian Su, Siyuan Cheng, Guanhong Tao, Qingkai Shi, Zhuo Zhang, and Xiangyu Zhang. 2023. Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis. InProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1106–1118. https://doi.org/...

work page doi:10.1145/3597926.3598121 2023
[45]

Xiangzhe Xu, Zhou Xuan, Shiwei Feng, Siyuan Cheng, Yapeng Ye, Qingkai Shi, Guanhong Tao, Le Yu, Zhuo Zhang, and Xiangyu Zhang. 2023. PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Softwar...

work page doi:10.1145/3611643.3616301 2023
[46]

Xi Xu, Qinghua Zheng, Zheng Yan, Ming Fan, Ang Jia, and Ting Liu. 2021. Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark Model. InProceedings of the 43rd International Conference on Software Engineering (ICSE). https://doi.org/10.1109/ICSE43902.2021.00084

work page doi:10.1109/icse43902.2021.00084 2021
[47]

Yifei Xu, Zhengzi Xu, Bihuan Chen, Fu Song, Yang Liu, and Ting Liu. 2020. Patch Based Vulnerability Matching for Binary Programs. InProceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA). 376–387. https://doi.org/10.1145/3395363.3397361

work page doi:10.1145/3395363.3397361 2020
[48]

Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and Discovering Vulnerabilities with Code Property Graphs. InProceedings of the 35th IEEE Symposium on Security and Privacy (SP). IEEE, 590–604. https://doi.org/10.1109/SP.2014.44

work page doi:10.1109/sp.2014.44 2014
[49]

Can Yang, Zhengzi Xu, Hongxu Chen, Yang Liu, Xiaorui Gong, and Baoxu Liu. 2022. ModX: binary level partially imported third-party library detection via program modularization and semantic matching. InProceedings of the 44th International Conference on Software Engineering. 1393–1405. https://doi.org/10.1145/3510003.3510627

work page doi:10.1145/3510003.3510627 2022
[50]

Gaoqing Yu, Jing An, Jiuyang Lyu, Wei Huang, Wenqing Fan, Yixuan Cheng, and Aina Sui. 2025. CrossCode2Vec: A unified representation across source and binary functions for code similarity detection.Neurocomputing620 (2025), 129238. https://doi.org/10.1016/j.neucom.2024.129238 Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE062. Publication date: July 20...

work page doi:10.1016/j.neucom.2024.129238 2025
[51]

Zeping Yu, Rui Cao, Qiyi Tang, Sen Nie, Junzhou Huang, and Shi Wu. 2020. Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection. InProceedings of the AAAI conference on artificial intelligence, Vol. 34. 1145–1152. https://doi.org/10.1609/aaai.v34i01.5466

work page doi:10.1609/aaai.v34i01.5466 2020
[52]

Zeping Yu, Wenxin Zheng, Jiaqi Wang, Qiyi Tang, Sen Nie, and Shi Wu. 2020. CodeCMR: cross-modal retrieval for function-level binary source code matching. InProceedings of the 34th International Conference on Neural Information Processing Systems(Vancouver, BC, Canada)(NIPS ’20). Curran Associates Inc., Red Hook, NY, USA, Article 326, 12 pages

2020
[53]

Yu, Tianchen and Yuan, Li and Lin, Liannan and He, Hongkui. 2025. A Multiple Representation Transformer with Optimized Abstract Syntax Tree for Efficient Code Clone Detection. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 587–587. https://doi.org/10.1109/ICSE55347.2025.00050

work page doi:10.1109/icse55347.2025.00050 2025
[54]

Qi Zhan, Xing Hu, Zhiyang Li, Xin Xia, David Lo, and Shanping Li. 2024. Ps3: Precise patch presence test based on semantic symbolic signature. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–12. https://doi.org/10.1145/3597503.3639134

work page doi:10.1145/3597503.3639134 2024
[55]

Qi Zhan, Xing Hu, Xin Xia, and Shanping Li. 2024. REACT: IR-Level Patch Presence Test for Binary. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 381–392. https://doi.org/10.1145/ 3691620.3695012

arXiv 2024
[56]

Wenyu Zhu, Hao Wang, Yuchen Zhou, Jiaming Wang, Zihan Sha, Zeyu Gao, and Chao Zhang. 2023. kTrans: Knowledge- Aware Transformer for Binary Code Embedding.arXiv preprint arXiv:2308.12659(2023). Received 2025-09-12; accepted 2025-12-22 Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE062. Publication date: July 2026

arXiv 2023

[1] [1]

Vector 35. 2024. Binary Ninja. https://binary.ninja/

2024

[2] [2]

National Security Agency. 2024. Ghidra. https://ghidra-sre.org

2024

[3] [3]

Sunwoo Ahn, Seonggwan Ahn, Hyungjoon Koo, and Yunheung Paek. 2022. Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning. InProceedings of the 38th Annual Computer Security Applications Conference. 361–374. https://doi.org/10.1145/3564625.3567975

work page doi:10.1145/3564625.3567975 2022

[4] [4]

Gu Ban, Lili Xu, Yang Xiao, Xinhua Li, Zimu Yuan, and Wei Huo. 2021. B2SMatcher: fine-Grained version identification of open-Source software in binary files.Cybersecurity4 (2021), 1–21. https://doi.org/10.1186/s42400-021-00085-7

work page doi:10.1186/s42400-021-00085-7 2021

[5] [5]

Martial Bourquin, Andy King, and Edward Robbins. 2013. BinSlayer: Accurate Comparison of Binary Executables. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop. 1–10. https://doi.org/10. 1145/2430553.2430557

arXiv 2013

[6] [6]

Ctags. 2024. Universal Ctags. https://github.com/universal-ctags/ctags

2024

[7] [7]

Yaniv David, Nimrod Partush, and Eran Yahav. 2017. Similarity of binaries through re-optimization. InProceedings of the 38th ACM SIGPLAN conference on programming language design and implementation. 79–94. https://doi.org/10. 1145/3140587.3062387

arXiv 2017

[8] [8]

Alessandro Di Federico, Mathias Payer, and Giovanni Agosta. 2017. rev.ng: a unified binary analysis framework to recover CFGs and function boundaries. InProceedings of the 26th International Conference on Compiler Construction. 131–141

2017

[9] [9]

Chaopeng Dong, Siyuan Li, Shougou Yang, Yang Xiao, Yongpan Wang, Hong Li, Zhi Li, and Limin Sun. 2024. LibvDiff: Library Version Difference Guided OSS Version Identification in Binaries. InProceedings of the 46th International Conference on Software Engineering (ICSE). 791–802. https://doi.org/10.1145/3597503.3623336

work page doi:10.1145/3597503.3623336 2024

[10] [10]

Ruian Duan, Ashish Bijlani, Meng Xu, Taesoo Kim, and Wenke Lee. 2017. Identifying Open-Source License Violation and 1-day Security Risk at Large Scale. InProceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security(Dallas, Texas, USA)(CCS ’17). Association for Computing Machinery, New York, NY, USA, 2169–2185. https://doi.org/10.1...

work page doi:10.1145/3133956.3134048 2017

[11] [11]

Muyue Feng, Zimu Yuan, Feng Li, Gu Ban, Yang Xiao, Shiyang Wang, Qian Tang, He Su, Chendong Yu, Jiahuan Xu, Aihua Piao, Jingling Xue, and Wei Huo. 2020. B2SFinder: Detecting Open-Source Software Reuse in COTS Software. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering(San Diego, California) (ASE ’19). IEEE Pres...

work page doi:10.1109/ase.2019.00100 2020

[12] [12]

Debin Gao, Michael K Reiter, and Dawn Song. 2008. BinHunt: Automatically Finding Semantic Differences in Binary Programs. InInternational Conference on Information and Communications Security. Springer, 238–255. https: //doi.org/10.1007/978-3-540-88625-9_16 Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE062. Publication date: July 2026. SBridge: Ident...

work page doi:10.1007/978-3-540-88625-9_16 2008

[13] [13]

Haojie He, Xingwei Lin, Ziang Weng, Ruijie Zhao, Shuitao Gan, Libo Chen, Yuede Ji, Jiashui Wang, and Zhi Xue. 2024. Code is not natural language: unlock the power of semantics-oriented graph representation for binary code similarity detection. InProceedings of the 33rd USENIX Conference on Security Symposium(Philadelphia, PA, USA)(SEC ’24). USENIX Associa...

2024

[14] [14]

Xu He, Shu Wang, Pengbin Feng, Xinda Wang, Shiyu Sun, Qi Li, and Kun Sun. 2024. BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning. InProceedings of the 19th ACM Asia Conference on Computer and Communications Security. 1186–1199. https://doi.org/10.1145/3634737.3637666

work page doi:10.1145/3634737.3637666 2024

[15] [15]

Hex-Rays. 2024. IDA Pro. https://hex-rays.com/ida-pro/

2024

[16] [16]

IBM. 2025. Standard C Library Functions Table, By Name. https://www.ibm.com/docs/en/i/7.6.0?topic=extensions- standard-c-library-functions-table-by-name

2025

[17] [17]

Ang Jia, Ming Fan, Wuxia Jin, Xi Xu, Zhaohui Zhou, Qiyi Tang, Sen Nie, Shi Wu, and Ting Liu. 2023. 1-to-1 or 1-to-n? Investigating the Effect of Function Inlining on Binary Similarity Analysis.ACM Transactions on Software Engineering and Methodology32, 4 (2023), 1–26. https://doi.org/10.1145/3561385

work page doi:10.1145/3561385 2023

[18] [18]

Ang Jia, Ming Fan, Xi Xu, Wuxia Jin, Haijun Wang, and Ting Liu. 2024. Cross-Inlining Binary Function Similarity Detection. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering(Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 223, 13 pages. https://doi.org/10.1145/ 3597503.3639080

arXiv 2024

[19] [19]

Lichen Jia, Chenggang Wu, Peihua Zhang, and Zhe Wang. 2024. CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques. InProceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems(Copenhagen, Denmark)(LCTES 2024). Association for Computing Machinery, New York, N...

work page doi:10.1145/3652032.3657572 2024

[20] [20]

Ling Jiang, Junwen An, Huihui Huang, Qiyi Tang, Sen Nie, Shi Wu, and Yuqun Zhang. 2024. BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering(Lisbon, Portugal)(ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article ...

work page doi:10.1145/3597503.3639100 2024

[21] [21]

Ling Jiang, Hengchen Yuan, Qiyi Tang, Sen Nie, Shi Wu, and Yuqun Zhang. 2023. Third-party library dependency for large-scale sca in the c/c++ ecosystem: How far are we?. InProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1383–1395. https://doi.org/10.1145/3597926.3598143

work page doi:10.1145/3597926.3598143 2023

[22] [22]

Dongkwan Kim, Eunsoo Kim, Sang Kil Cha, Sooel Son, and Yongdae Kim. 2023. Revisiting Binary Code Similarity Analysis Using Interpretable Feature Engineering and Lessons Learned.IEEE Transactions on Software Engineering49, 4 (2023), 1661–1682. https://doi.org/10.1109/TSE.2022.3187689

work page doi:10.1109/tse.2022.3187689 2023

[23] [23]

Seulbae Kim, Seunghoon Woo, Heejo Lee, and Hakjoo Oh. 2017. VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery. InProceedings of the 38th IEEE Symposium on Security and Privacy (SP). 595–614. https: //doi.org/10.1109/SP.2017.62

work page doi:10.1109/sp.2017.62 2017

[24] [24]

Siyuan Li, Yongpan Wang, Chaopeng Dong, Shouguo Yang, Hong Li, Hao Sun, Zhe Lang, Zuxin Chen, Weijie Wang, Hongsong Zhu, and Limin Sun. 2023. LibAM: An Area Matching Framework for Detecting Third-Party Libraries in Binaries.ACM Trans. Softw. Eng. Methodol.(sep 2023). https://doi.org/10.1145/3625294

work page doi:10.1145/3625294 2023

[25] [25]

Bingchang Liu, Wei Huo, Chao Zhang, Wenchao Li, Feng Li, Aihua Piao, and Wei Zou. 2018. 𝛼Diff: cross-version binary code similarity detection with DNN. InProceedings of the 33rd ACM/IEEE international conference on automated software engineering. 667–678. https://doi.org/10.1145/3238147.3238199

work page doi:10.1145/3238147.3238199 2018

[26] [26]

2025.LLVM Project Doxygen Documentation

LLVM Project. 2025.LLVM Project Doxygen Documentation. LLVM Foundation. https://llvm.org/doxygen/

2025

[27] [27]

Stallman, Roland McGrath, Andrew Oram, and Ulrich Drepper

Sandra Loosemore, Richard M. Stallman, Roland McGrath, Andrew Oram, and Ulrich Drepper. 2025.The GNU C Library Reference Manual, for version 2.42. https://sourceware.org/glibc/manual/2.42/pdf/libc.pdf

2025

[28] [28]

Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Roberto Baldoni, and Leonardo Querzoni. 2019. SAFE: Self-Attentive Function Embeddings for Binary Similarity. InDetection of Intrusions and Malware, and Vulnerability Assessment: 16th International Conference, DIMV A 2019, Gothenburg, Sweden, June 19–20, 2019, Proceedings 16. Springer, 309–329. htt...

work page doi:10.1007/978-3-030-22038-9_15 2019

[29] [29]

Jiang Ming, Dongpeng Xu, Yufei Jiang, and Dinghao Wu. 2017. {BinSim}: Trace-based semantic binary diffing via system call sliced segment equivalence checking. In26th USENIX Security Symposium (USENIX Security 17). Vancouver, BC, 253–270

2017

[30] [30]

Yoonjong Na, Seunghoon Woo, Joomyeong Lee, and Heejo Lee. 2024. CNEPS: A Precise Approach for Examining Dependencies Among Third-Party C/C++ Open-Source Components. InProceedings of the 46th International Conference on Software Engineering (ICSE). 2918–2929. https://doi.org/10.1145/3597503.3639209

work page doi:10.1145/3597503.3639209 2024

[31] [31]

Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K Roy, and Cristina V Lopes. 2016. SourcererCC: Scaling Code Clone Detection to Big-Code. InProceedings of the 38th International Conference on Software Engineering (ICSE). 1157–1168. https://doi.org/10.1145/2884781.2884877

work page doi:10.1145/2884781.2884877 2016

[32] [32]

Synopsys

Synopsys 2025.2025 Open Source Security and Risk Analysis Report. Synopsys. Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE062. Publication date: July 2026. FSE062:22 Heedong Yang, Jeongwoo Lee, Hajin Yun, and Seunghoon Woo

arXiv 2025

[33] [33]

Wei Tang, Ping Luo, Jialiang Fu, and Dan Zhang. 2020. LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code. In2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). 104–115. https://doi.org/10.1109/SANER48275.2020.9054845

work page doi:10.1109/saner48275.2020.9054845 2020

[34] [34]

Wei Tang, Yanlin Wang, Hongyu Zhang, Shi Han, Ping Luo, and Dongmei Zhang. 2022. LibDB: an effective and efficient framework for detecting third-party libraries in binaries. InProceedings of the 19th International Conference on Mining Software Repositories(Pittsburgh, Pennsylvania)(MSR ’22). Association for Computing Machinery, New York, NY, USA, 423–434....

work page doi:10.1145/3524842.3528442 2022

[35] [35]

Hao Wang, Wenjie Qu, Gilad Katz, Wenyu Zhu, Zeyu Gao, Han Qiu, Jianwei Zhuge, and Chao Zhang. 2022. jTrans: Jump-Aware Transformer for Binary Code Similarity. InProceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 1–13. https://doi.org/10.1145/3533767.3534367

work page doi:10.1145/3533767.3534367 2022

[36] [36]

Pengcheng Wang, Jeffrey Svajlenko, Yanzhao Wu, Yun Xu, and Chanchal K Roy. 2018. CCAligner: A Token Based Large-Gap Clone Detector. InProceedings of the 40th International Conference on Software Engineering (ICSE). 1066–1077. https://doi.org/10.1145/3180155.3180179

work page doi:10.1145/3180155.3180179 2018

[37] [37]

Seunghoon Woo, Eunjin Choi, and Heejo Lee. 2025. A large-scale analysis of the effectiveness of publicly reported security patches.Computers & Security148 (2025), 104181. https://doi.org/10.1016/j.cose.2024.104181

work page doi:10.1016/j.cose.2024.104181 2025

[38] [38]

Seunghoon Woo, Eunjin Choi, Heejo Lee, and Hakjoo Oh. 2023. V1SCAN: Discovering 1-day Vulnerabilities in Reused C/C++ Open-source Software Components Using Code Classification Techniques. InProceedings of the 32nd USENIX Security Symposium (Security). 6541–6556

2023

[39] [39]

Seunghoon Woo, Hyunji Hong, Eunjin Choi, and Heejo Lee. 2022. MOVERY: A Precise Approach for Modified Vulnerable Code Clone Discovery from Modified Open-Source Software Components. InProceedings of the 31st USENIX Security Symposium (Security). 3037–3053

2022

[40] [40]

Seunghoon Woo, Dongwook Lee, Sunghan Park, Heejo Lee, and Sven Dietrich. 2021. V0Finder: Discovering the Correct Origin of Publicly Reported Software Vulnerabilities. InProceedings of the 30th USENIX Security Symposium (Security). 3041–3058

2021

[41] [41]

Seunghoon Woo, Sunghan Park, Seulbae Kim, Heejo Lee, and Hakjoo Oh. 2021. CENTRIS: A Precise and Scalable Approach for Identifying Modified Open-Source Software Reuse. InProceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE). 860–872. https://doi.org/10.1109/ICSE43902.2021.00083

work page doi:10.1109/icse43902.2021.00083 2021

[42] [42]

Yang Xiao, Bihuan Chen, Chendong Yu, Zhengzi Xu, Zimu Yuan, Feng Li, Binghong Liu, Yang Liu, Wei Huo, Wei Zou, and Wenchang Shi. 2020. MVP: detecting vulnerabilities using patch-enhanced vulnerability signatures. InProceedings of the 29th USENIX Security Symposium (Security). 1165–1182

2020

[43] [43]

Yang Xiao, Zhengzi Xu, Weiwei Zhang, Chendong Yu, Longquan Liu, Wei Zou, Zimu Yuan, Yang Liu, Aihua Piao, and Wei Huo. 2021. VIVA: Binary Level Vulnerability Identification via Partial Signature. In2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 213–224. https://doi.org/10.1109/SANER50967.2021.00028

work page doi:10.1109/saner50967.2021.00028 2021

[44] [44]

Xiangzhe Xu, Shiwei Feng, Yapeng Ye, Guangyu Shen, Zian Su, Siyuan Cheng, Guanhong Tao, Qingkai Shi, Zhuo Zhang, and Xiangyu Zhang. 2023. Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis. InProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1106–1118. https://doi.org/...

work page doi:10.1145/3597926.3598121 2023

[45] [45]

Xiangzhe Xu, Zhou Xuan, Shiwei Feng, Siyuan Cheng, Yapeng Ye, Qingkai Shi, Guanhong Tao, Le Yu, Zhuo Zhang, and Xiangyu Zhang. 2023. PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Softwar...

work page doi:10.1145/3611643.3616301 2023

[46] [46]

Xi Xu, Qinghua Zheng, Zheng Yan, Ming Fan, Ang Jia, and Ting Liu. 2021. Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark Model. InProceedings of the 43rd International Conference on Software Engineering (ICSE). https://doi.org/10.1109/ICSE43902.2021.00084

work page doi:10.1109/icse43902.2021.00084 2021

[47] [47]

Yifei Xu, Zhengzi Xu, Bihuan Chen, Fu Song, Yang Liu, and Ting Liu. 2020. Patch Based Vulnerability Matching for Binary Programs. InProceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA). 376–387. https://doi.org/10.1145/3395363.3397361

work page doi:10.1145/3395363.3397361 2020

[48] [48]

Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and Discovering Vulnerabilities with Code Property Graphs. InProceedings of the 35th IEEE Symposium on Security and Privacy (SP). IEEE, 590–604. https://doi.org/10.1109/SP.2014.44

work page doi:10.1109/sp.2014.44 2014

[49] [49]

Can Yang, Zhengzi Xu, Hongxu Chen, Yang Liu, Xiaorui Gong, and Baoxu Liu. 2022. ModX: binary level partially imported third-party library detection via program modularization and semantic matching. InProceedings of the 44th International Conference on Software Engineering. 1393–1405. https://doi.org/10.1145/3510003.3510627

work page doi:10.1145/3510003.3510627 2022

[50] [50]

Gaoqing Yu, Jing An, Jiuyang Lyu, Wei Huang, Wenqing Fan, Yixuan Cheng, and Aina Sui. 2025. CrossCode2Vec: A unified representation across source and binary functions for code similarity detection.Neurocomputing620 (2025), 129238. https://doi.org/10.1016/j.neucom.2024.129238 Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE062. Publication date: July 20...

work page doi:10.1016/j.neucom.2024.129238 2025

[51] [51]

Zeping Yu, Rui Cao, Qiyi Tang, Sen Nie, Junzhou Huang, and Shi Wu. 2020. Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection. InProceedings of the AAAI conference on artificial intelligence, Vol. 34. 1145–1152. https://doi.org/10.1609/aaai.v34i01.5466

work page doi:10.1609/aaai.v34i01.5466 2020

[52] [52]

Zeping Yu, Wenxin Zheng, Jiaqi Wang, Qiyi Tang, Sen Nie, and Shi Wu. 2020. CodeCMR: cross-modal retrieval for function-level binary source code matching. InProceedings of the 34th International Conference on Neural Information Processing Systems(Vancouver, BC, Canada)(NIPS ’20). Curran Associates Inc., Red Hook, NY, USA, Article 326, 12 pages

2020

[53] [53]

Yu, Tianchen and Yuan, Li and Lin, Liannan and He, Hongkui. 2025. A Multiple Representation Transformer with Optimized Abstract Syntax Tree for Efficient Code Clone Detection. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 587–587. https://doi.org/10.1109/ICSE55347.2025.00050

work page doi:10.1109/icse55347.2025.00050 2025

[54] [54]

Qi Zhan, Xing Hu, Zhiyang Li, Xin Xia, David Lo, and Shanping Li. 2024. Ps3: Precise patch presence test based on semantic symbolic signature. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–12. https://doi.org/10.1145/3597503.3639134

work page doi:10.1145/3597503.3639134 2024

[55] [55]

Qi Zhan, Xing Hu, Xin Xia, and Shanping Li. 2024. REACT: IR-Level Patch Presence Test for Binary. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 381–392. https://doi.org/10.1145/ 3691620.3695012

arXiv 2024

[56] [56]

Wenyu Zhu, Hao Wang, Yuchen Zhou, Jiaming Wang, Zihan Sha, Zeyu Gao, and Chao Zhang. 2023. kTrans: Knowledge- Aware Transformer for Binary Code Embedding.arXiv preprint arXiv:2308.12659(2023). Received 2025-09-12; accepted 2025-12-22 Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE062. Publication date: July 2026

arXiv 2023