Binary code summarization: Benchmarking ChatGPT/GPT-4 and other large language models

Xin Jin, Jonathan Larson, Weiwei Yang, Zhiqiang Lin · 2023 · arXiv 2312.09601

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation

cs.SE · 2026-04-09 · unverdicted · novelty 7.0

LLM deobfuscation of binaries to pseudocode depends more on reasoning ability and task-specific fine-tuning than on model size, with reasoning models showing robustness across ISAs and obfuscation levels on the new BinDeObfBench.

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

REBench is a new benchmark that consolidates existing datasets into a large collection of binaries with knowledge-base-driven ground truth to enable fair LLM evaluation on stripped-binary type and name recovery.

Beyond the Edge of Function: Unraveling the Patterns of Type Recovery in Binary Code

cs.CR · 2025-03-10 · unverdicted · novelty 6.0

ByteTR recovers variable types in binary code more effectively than prior methods by decoupling unbalanced type sets, mitigating compiler optimization effects via static analysis, and modeling inter-procedural data flows with a gated GNN.

Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis

cs.LG · 2025-11-14 · unverdicted · novelty 5.0

RETROFIT enables continual learning for malware detection and binary summarization by retrospective-free parameter merging with low-rank sparse updates and confidence-guided arbitration, improving retention and generalization without historical data.

Context-Guided Decompilation: A Step Towards Re-executability

cs.SE · 2025-11-03 · unverdicted · novelty 5.0

ICL4Decomp applies in-context learning to guide LLMs in generating re-executable decompiled code from binaries, reporting roughly 40% higher re-executability than prior methods across datasets and optimization levels.

citing papers explorer

Showing 5 of 5 citing papers.

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation
cs.SE · 2026-04-09 · unverdicted · none · ref 22

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)
cs.CR · 2026-04-30 · unverdicted · none · ref 19

Beyond the Edge of Function: Unraveling the Patterns of Type Recovery in Binary Code
cs.CR · 2025-03-10 · unverdicted · none · ref 52

Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis
cs.LG · 2025-11-14 · unverdicted · none · ref 71

Context-Guided Decompilation: A Step Towards Re-executability
cs.SE · 2025-11-03 · unverdicted · none · ref 29
