Acoda: Adversarial Code Obfuscation for Defending against LLM-based Analysis

Haodong Li; Haoyu Wang; Hongzhou Rao; Yanjie Zhao; Zikan Dong

arxiv: 2606.11755 · v1 · pith:OIVUGLOZnew · submitted 2026-06-10 · 💻 cs.SE

Acoda: Adversarial Code Obfuscation for Defending against LLM-based Analysis

Hongzhou Rao , Zikan Dong , Yanjie Zhao , Haodong Li , Haoyu Wang This is my paper

Pith reviewed 2026-06-27 09:24 UTC · model grok-4.3

classification 💻 cs.SE

keywords adversarial code obfuscationLLM defensecode protectiongenetic algorithmsoftware securityintellectual propertycode analysis

0 comments

The pith

Acoda uses a genetic algorithm to generate semantics-preserving obfuscations that make LLMs refuse or misinterpret code analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that code can be systematically altered so large language models fail at their usual tasks of understanding, debugging, or detecting issues in it. This matters because LLMs are now routinely applied to code in ways that could expose proprietary logic or intellectual property. Acoda creates eight specific obfuscation transformations that respect how LLMs align for safety and break text into tokens, then lets a genetic algorithm search for the most effective combinations. Tests across seven current LLMs show the transformed code triggers refusal or error rates up to 70 percent while leaving the program's observable behavior identical to the original.

Core claim

Acoda is a genetic algorithm-based adversarial code obfuscation framework that leverages LLMs' safety alignment and token-based information processing to generate semantics-preserving obfuscated code. By iteratively optimizing obfuscation strategies, it produces adversarial samples that maximize the rate at which target LLMs refuse or misinterpret code analysis tasks. On seven state-of-the-art LLMs, this yields an attack success rate of up to 70 percent with cross-model transferability and negligible runtime cost.

What carries the argument

Genetic algorithm that iteratively selects and combines eight semantics-preserving obfuscation methods designed to exploit LLMs' safety alignment and token-level processing.

If this is right

Target LLMs refuse to analyze the obfuscated code or produce incorrect interpretations of its logic.
The transformed code executes with identical observable behavior to the original source.
Obfuscation strategies transfer effectively from one LLM to others.
The generation process adds only minimal extra computation time.
A new defense option exists for protecting code against automated LLM-driven reverse engineering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar search-based obfuscation could be adapted to shield code from other semantic AI tools beyond LLMs.
An ongoing competition may develop between improved obfuscators and stronger LLM analyzers.
The technique offers a practical way for developers to limit exposure when sharing code with AI-assisted services.
Layering this method with conventional obfuscators might raise the bar for any automated code inspection.

Load-bearing premise

An auxiliary LLM plus four response metrics can accurately and without bias determine when a target LLM has refused or misinterpreted the obfuscated code.

What would settle it

Run the original program and its Acoda-obfuscated version on the same set of inputs and check whether every output matches exactly.

Figures

Figures reproduced from arXiv: 2606.11755 by Haodong Li, Haoyu Wang, Hongzhou Rao, Yanjie Zhao, Zikan Dong.

**Figure 1.** Figure 1: An example of vulnerability code. and let the LLM just return the original code. After embedding this block into ten selected samples and prompting both DS-Coder6.7B and CodeLlama-7B to analyze each sample three times, we observed that both LLMs produced refusal responses among their 30 outputs per LLM. Although the refusal ratio was below 50%, this result supports the feasibility of triggering the safety… view at source ↗

**Figure 2.** Figure 2: An example of using special tokens to affect LLM [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: An example of Vulnerability Code Injection. then embed the misleading description into the code, aiming to make the LLM misinterpret the code’s behavior during analysis. 1 # Original code: 2 def add(a, b): 3 return a + b 4 # Adding Misleading Summarization: 5 def add(a, b): 6 # This code calculates the product of a and b. 7 return a + b [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: An example of Misleading Summarization. String Obfuscation: As shown in [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: An example of String Obfuscation. Try–Except Wrapping: As shown in [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗

**Figure 6.** Figure 6: An example of Try–Except Wrapping. EOS Token Insertion: As shown in [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗

**Figure 7.** Figure 7: An example of EOS Token Insertion. 3.2 The Acoda Framework We now present our adversarial framework, Acoda, whose overall workflow is illustrated in [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗

**Figure 8.** Figure 8: The workflow of Acoda. generation time. Therefore, we select 3 target LLMs as a balance between effectiveness and efficiency. 3.2.3 Quantitative Analysis. After obtaining the deobfuscation results, we conduct a quantitative analysis. To achieve this, we adopt four metrics proposed in [3], with modifications in their computation. The definitions of these metrics are as follows: • Syntax Score (SyS): This … view at source ↗

**Figure 9.** Figure 9: The workflow of benchmark construction Next, we construct the benchmark based on CodeNet. As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

**Figure 10.** Figure 10: Transferability results of adversarial samples. [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗

**Figure 11.** Figure 11: Case study: the DS-R1 identifies obfuscated SQL [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗

**Figure 12.** Figure 12: Ablation study of obfuscation method effective [PITH_FULL_IMAGE:figures/full_fig_p011_12.png] view at source ↗

read the original abstract

With the widespread adoption of Large Language Models (LLMs) in software engineering (SE) tasks such as code understanding, debugging, and vulnerability detection, their powerful semantic reasoning ability has also introduced new security and privacy risks. LLMs can analyze, reconstruct, or even reverse-engineer source code logic, potentially leading to the leakage of intellectual property. To address this issue, we propose Acoda, a genetic algorithm-based adversarial code obfuscation framework that defends against LLM-based code analysis. Acoda leverages two key mechanisms of LLMs, namely safety alignment and token-based information processing, to design 8 semantics-preserving obfuscation methods. It iteratively optimizes obfuscation strategies through a genetic algorithm to generate adversarial samples that maximize defensive effectiveness. In addition, we propose a quantitative evaluation framework based on LLM responses, which combines an auxiliary LLM and four evaluation metrics to assess how target LLMs analyze obfuscated code comprehensively. Experimental results show that Acoda can effectively induce LLMs to refuse or misinterpret code analysis. On 7 state-of-the-art LLMs, including GPT-4o, DeepSeek, Qwen, Llama, and Gemma, Acoda achieves an attack success rate (ASR) of up to 70%, with strong cross-model transferability and minimal runtime overhead, while ensuring that the semantics of the original code remain unchanged. Overall, this study provides a new perspective for code protection and LLM security defense in the era of LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Acoda applies a genetic algorithm to eight LLM-targeted obfuscations for code protection, but its auxiliary-LLM evaluation framework lacks the controls needed to trust the 70% ASR numbers.

read the letter

Acoda uses a genetic algorithm to combine eight obfuscation methods built around LLM safety alignment and token processing. The goal is to generate adversarial code samples that make LLMs refuse or misread the logic while keeping original semantics intact. It reports up to 70% attack success rate on seven models including GPT-4o and Llama, plus cross-model transfer and low overhead.

The framing that links specific LLM mechanisms to the choice of obfuscations and then optimizes them with a GA is the clearest new element. Earlier obfuscation work in software engineering did not target these LLM internals in the same way.

The evaluation is the main soft spot. The paper scores target-model responses with an auxiliary LLM and four metrics. No human inter-rater checks or swaps to a dissimilar auxiliary model are described, so it is unclear whether the refusal and misinterpretation judgments are independent. The semantics-preservation claim also rests on this same framework.

The work is aimed at researchers in software security and LLM robustness who need practical defenses for code IP. The idea is concrete enough to merit referee time, but the authors will need to add validation for the scoring method and more detail on the eight techniques and fitness function.

I would send it to peer review with requests focused on the evaluation controls.

Referee Report

2 major / 1 minor

Summary. The paper proposes Acoda, a genetic algorithm-based framework for generating adversarial code obfuscations. It designs eight semantics-preserving obfuscation methods that exploit LLM safety alignment and token processing, then uses a GA to optimize combinations that maximize defensive effectiveness against LLM code analysis. A quantitative evaluation framework employing an auxiliary LLM and four metrics is introduced to measure attack success rate (ASR), with experiments on seven LLMs (GPT-4o, DeepSeek, Qwen, Llama, Gemma) reporting up to 70% ASR, cross-model transferability, minimal overhead, and unchanged semantics.

Significance. If the central claims hold under rigorous validation, the work provides a concrete, optimization-driven defense for protecting source code IP against LLM-based reverse engineering in SE tasks. The combination of GA search with multi-metric LLM-as-judge evaluation and explicit cross-model testing represents a practical contribution to adversarial robustness in code analysis.

major comments (2)

[Abstract / Evaluation Framework] Abstract and Evaluation Framework: The reported ASR of up to 70% and cross-model transferability rest entirely on judgments from an auxiliary LLM using four unspecified metrics for refusal/misinterpretation. No human inter-rater agreement, ablation on auxiliary-model choice, or checks for shared training data/alignment bias with target models are described, making the quantitative results vulnerable to circularity and preventing independent verification of the central claim.
[Methods] Methods: Concrete definitions of the eight obfuscation methods, the GA fitness function (including how semantic preservation and defensive effectiveness are quantified), baseline comparisons, and experimental controls (dataset size, number of generations, mutation/crossover rates) are not supplied in sufficient detail to reproduce or assess the optimization process and the 70% ASR result.

minor comments (1)

[Abstract] The abstract states that semantics remain unchanged but provides no explicit metric or verification procedure (e.g., test-suite equivalence or diff-based checks) used to enforce this constraint during GA evolution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We will revise the manuscript to address the concerns about evaluation rigor and methodological detail, thereby improving reproducibility and mitigating risks of bias in the reported results.

read point-by-point responses

Referee: [Abstract / Evaluation Framework] Abstract and Evaluation Framework: The reported ASR of up to 70% and cross-model transferability rest entirely on judgments from an auxiliary LLM using four unspecified metrics for refusal/misinterpretation. No human inter-rater agreement, ablation on auxiliary-model choice, or checks for shared training data/alignment bias with target models are described, making the quantitative results vulnerable to circularity and preventing independent verification of the central claim.

Authors: We agree this is a substantive limitation in the current presentation. In the revised version we will explicitly name and define the four metrics, report human inter-rater agreement on a sampled subset of judgments, add an ablation varying the auxiliary LLM, and include a discussion of steps taken to reduce shared-data/alignment bias (e.g., use of models from distinct providers and families). These changes will strengthen the central claim. revision: yes
Referee: [Methods] Methods: Concrete definitions of the eight obfuscation methods, the GA fitness function (including how semantic preservation and defensive effectiveness are quantified), baseline comparisons, and experimental controls (dataset size, number of generations, mutation/crossover rates) are not supplied in sufficient detail to reproduce or assess the optimization process and the 70% ASR result.

Authors: We concur that the Methods section currently lacks the granularity needed for reproduction. The revision will supply: explicit definitions and pseudocode for each of the eight obfuscation transformations; the exact GA fitness function with formulas quantifying semantic preservation (via automated test-suite equivalence) and defensive effectiveness; comparisons to standard obfuscation baselines; and all experimental hyperparameters (dataset sizes, generations, population size, mutation/crossover rates, and selection criteria). revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain; empirical evaluation framework is self-contained

full rationale

The paper describes an empirical genetic-algorithm obfuscation method and an auxiliary-LLM evaluation protocol with four metrics, but contains no equations, fitted parameters presented as predictions, self-citation load-bearing uniqueness theorems, or ansatzes smuggled via prior work. All reported ASR, transferability, and semantic-preservation numbers are experimental outcomes on external target models rather than quantities derived by construction from the paper's own definitions or inputs. The evaluation framework is an experimental design choice whose validity is open to external critique, yet it does not reduce the central claims to a self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; full paper not available so ledger is limited to claims stated in the abstract.

axioms (2)

domain assumption The eight obfuscation methods preserve original code semantics
Stated directly in the abstract as a requirement for the defense.
domain assumption LLM responses can be reliably scored for refusal or misinterpretation using an auxiliary LLM and four metrics
Core to the proposed quantitative evaluation framework.

pith-pipeline@v0.9.1-grok · 5804 in / 1275 out tokens · 19132 ms · 2026-06-27T09:24:43.597704+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 14 canonical work pages · 2 internal anchors

[1]

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. 2022. Constitutional ai: Harmlessness from ai feedback.arXiv preprint arXiv:2212.08073(2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[2]

David Beste, Grégoire Menguy, Hossein Hajipour, Mario Fritz, Antonio Emanuele Cinà, Sébastien Bardin, Thorsten Holz, Thorsten Eisenhofer, and Lea Schönherr
[3]

InInternational Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Exploring the Potential of LLMs for Code Deobfuscation. InInternational Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 267–286
[4]

Guoqiang Chen, Xin Jin, and Zhiqiang Lin. 2025. JsDeObsBench: Measur- ing and Benchmarking LLMs for JavaScript Deobfuscation.arXiv preprint arXiv:2506.20170(2025)

work page arXiv 2025
[5]

Zimin Chen, Sen Fang, and Martin Monperrus. 2024. Supersonic: Learning to generate source code optimizations in C/C++.IEEE Transactions on Software Engineering(2024)

2024
[6]

Byunggeon Choi, Hongjoo Jin, Dong Hoon Lee, and Wonsuk Choi. 2024. Chat- DEOB: An Effective Deobfuscation Method Based on Large Language Model. In International Conference on Information Security Applications. Springer, 151–163

2024
[7]

Christian Collberg. [n. d.]. Tigress: Transformations for C Programs. https: //tigress.wtf/. Accessed: 2025-01-10

2025
[8]

Xueying Du, Geng Zheng, Kaixin Wang, Yi Zou, Yujia Wang, Wentai Deng, Jiayi Feng, Mingwei Liu, Bihuan Chen, Xin Peng, et al. 2024. Vul-rag: Enhanc- ing llm-based vulnerability detection via knowledge-level rag.arXiv preprint arXiv:2406.11147(2024)

work page arXiv 2024
[9]

Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, and Shu- vendu K Lahiri. 2024. Llm-based test-driven interactive code generation: User study and empirical evaluation.IEEE Transactions on Software Engineering(2024)

2024
[10]

Guardsquare. [n. d.]. ProGuard. https://www .guardsquare.com/proguard. Ac- cessed: 2025-01-10

2025
[11]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. 2025. Deepseek-r1 incen- tivizes reasoning in llms through reinforcement learning.Nature645, 8081 (2025), 633–638

2025
[12]

Yuejun Guo, Constantinos Patsakis, Qiang Hu, Qiang Tang, and Fran Casino. 2024. Outside the comfort zone: Analysing llm capabilities in software vulnerability detection. InEuropean symposium on research in computer security. Springer, 271–289

2024
[13]

Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, and Yossi Adi. 2024. The larger the better? improved llm code-generation via budget reallocation. arXiv preprint arXiv:2404.00725(2024)

work page arXiv 2024
[14]

Jingxuan He and Martin Vechev. 2023. Large language models for code: Secu- rity hardening and adversarial testing. InProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security. 1865–1879

2023
[15]

Jingxuan He, Mark Vero, Gabriela Krasnopolska, and Martin Vechev. 2024. In- struction tuning for secure code generation.arXiv preprint arXiv:2402.09497 (2024)

work page arXiv 2024
[16]

Peiwei Hu, Ruigang Liang, and Kai Chen. 2024. Degpt: Optimizing decompiler output with llm. InProceedings 2024 Network and Distributed System Security Symposium, Vol. 267622140

2024
[17]

JavaScript Obfuscator Contributors. [n. d.]. JavaScript Obfuscator. https: //github.com/javascript-obfuscator/javascript-obfuscator. Accessed: 2025-01-10

2025
[18]

Shan Jiang, Pranoy Kovuri, David Tao, and Zhixun Tan. 2025. Cascade: Llm- powered javascript deobfuscator at google.arXiv preprint arXiv:2507.17691(2025)

work page arXiv 2025
[19]

Sathvik Joel, Jie Wu, and Fatemeh Fard. 2024. A survey on llm-based code generation for low-resource and domain-specific programming languages.ACM Transactions on Software Engineering and Methodology(2024)

2024
[20]

Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec, and Guillaume Lample
[21]

Advances in Neural Information Processing Systems34 (2021), 14967–14979

DOBF: A deobfuscation pre-training objective for programming languages. Advances in Neural Information Processing Systems34 (2021), 14967–14979

2021
[22]

Dong Li, Meng Yan, Yaosheng Zhang, Zhongxin Liu, Chao Liu, Xiaohong Zhang, Ting Chen, and David Lo. 2024. CoSec: On-the-Fly security hardening of code LLMs via supervised co-decoding. InProceedings of the 33rd ACM SIGSOFT Inter- national Symposium on Software Testing and Analysis. 1428–1439

2024
[23]

Yalan Lin, Chengcheng Wan, Yixiong Fang, and Xiaodong Gu. 2024. CodeCipher: Learning to Obfuscate Source Code Against LLMs.arXiv preprint arXiv:2410.05797 (2024)

work page arXiv 2024
[24]

Guilong Lu, Xiaolin Ju, Xiang Chen, Wenlong Pei, and Zhilong Cai. 2024. GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning.Journal of Systems and Software212 (2024), 112031

2024
[25]

Meta Platforms, Inc. [n. d.]. LLaMA Responsible Use Guide. https:// www.llama.com/docs/how-to-guides/responsible-use-guide-resources. Ac- cessed: 2025-01-10

2025
[26]

Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, and Brad Myers. 2024. Using an llm to help with code understanding. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

2024
[27]

Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, and NhatHai Phan. 2024. Promsec: Prompt optimization for secure generation of functional source code with large language models (llms). InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. 2266–2280

2024
[28]

Yu Nong, Richard Fang, Guangbei Yi, Kunsong Zhao, Xiapu Luo, Feng Chen, and Haipeng Cai. 2024. Vgx: Large-scale sample generation for boosting learning- based software vulnerability analyses. InProceedings of the IEEE/ACM 46th Inter- national Conference on Software Engineering. 1–13

2024
[29]

Obfuscator-LLVM Contributors. [n. d.]. Obfuscator-LLVM. https://github .com/ obfuscator-llvm/obfuscator. Accessed: 2025-01-10

2025
[30]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback.Advances in neural information processing systems35 (2022), 27730–27744

2022
[31]

Ruchir Puri, David S Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, et al
[32]

Codenet: A large-scale ai for code dataset for learning a diversity of coding tasks.arXiv preprint arXiv:2105.12655(2021)

work page arXiv 2021
[33]

Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundare- san, Ming Zhou, Ambrosio Blanco, and Shuai Ma. 2020. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. arXiv:2009.10297 [cs.SE]

work page internal anchor Pith review Pith/arXiv arXiv 2020
[34]

CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A Choquette-Choo, Jingyue Shen, Joe Kelley, et al. 2024. Codegemma: Open code models based on gemma.arXiv preprint arXiv:2406.11409(2024)

work page arXiv 2024
[35]

Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, et al. 2024. Debugbench: Evaluating debugging capability of large language models.arXiv preprint arXiv:2401.04621 (2024)

work page arXiv 2024
[36]

Anton Tkachenko, Dmitrij Suskevic, and Benjamin Adolphi. 2025. Deconstructing Obfuscation: A four-dimensional framework for evaluating Large Language Models assembly code deobfuscation capabilities.arXiv preprint arXiv:2505.19887 (2025)

work page arXiv 2025
[37]

Tree-sitter Contributors. [n. d.]. Tree-sitter. https://tree-sitter .github.io/tree- sitter/. Accessed: 2025-01-10

2025
[38]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need.Advances in neural information processing systems30 (2017)

2017
[39]

Nalin Wadhwa, Jui Pradhan, Atharv Sonwane, Surya Prakash Sahu, Nagarajan Natarajan, Aditya Kanade, Suresh Parthasarathy, and Sriram Rajamani. 2024. Core: Resolving code quality issues using llms.Proceedings of the ACM on Software Engineering1, FSE (2024), 789–811

2024
[40]

Alperen Yildiz, Sin G Teo, Yiling Lou, Yebo Feng, Chong Wang, and Dinil M Divakaran. 2025. Benchmarking LLMs and LLM-based Agents in Practical Vul- nerability Detection for Code Repositories.arXiv preprint arXiv:2503.03586(2025)

work page arXiv 2025
[41]

JD Zamfirescu-Pereira, Eunice Jun, Michael Terry, Qian Yang, and Björn Hart- mann. 2025. Beyond code generation: Llm-supported exploration of the program design space. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–17

2025
[42]

Boyu Zhang, Tianyu Du, Junkai Tong, Xuhong Zhang, Kingsum Chow, Sheng Cheng, Xun Wang, and Jianwei Yin. 2024. SecCoder: Towards Generalizable and Robust Secure Code Generation.arXiv preprint arXiv:2410.01488(2024)

work page arXiv 2024
[43]

Huangzhao Zhang, Kechi Zhang, Zhuo Li, Jia Li, Jia Li, Yongmin Li, Yunfei Zhao, Yuqi Zhu, Fang Liu, Ge Li, et al. 2024. Deep learning for code generation: a survey. Science China Information Sciences67, 9 (2024), 191101

2024
[44]

Yichi Zhang. 2024. Detecting code comment inconsistencies using llm and pro- gram analysis. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 683–685

2024

[1] [1]

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. 2022. Constitutional ai: Harmlessness from ai feedback.arXiv preprint arXiv:2212.08073(2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022

[2] [2]

David Beste, Grégoire Menguy, Hossein Hajipour, Mario Fritz, Antonio Emanuele Cinà, Sébastien Bardin, Thorsten Holz, Thorsten Eisenhofer, and Lea Schönherr

[3] [3]

InInternational Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Exploring the Potential of LLMs for Code Deobfuscation. InInternational Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 267–286

[4] [4]

Guoqiang Chen, Xin Jin, and Zhiqiang Lin. 2025. JsDeObsBench: Measur- ing and Benchmarking LLMs for JavaScript Deobfuscation.arXiv preprint arXiv:2506.20170(2025)

work page arXiv 2025

[5] [5]

Zimin Chen, Sen Fang, and Martin Monperrus. 2024. Supersonic: Learning to generate source code optimizations in C/C++.IEEE Transactions on Software Engineering(2024)

2024

[6] [6]

Byunggeon Choi, Hongjoo Jin, Dong Hoon Lee, and Wonsuk Choi. 2024. Chat- DEOB: An Effective Deobfuscation Method Based on Large Language Model. In International Conference on Information Security Applications. Springer, 151–163

2024

[7] [7]

Christian Collberg. [n. d.]. Tigress: Transformations for C Programs. https: //tigress.wtf/. Accessed: 2025-01-10

2025

[8] [8]

Xueying Du, Geng Zheng, Kaixin Wang, Yi Zou, Yujia Wang, Wentai Deng, Jiayi Feng, Mingwei Liu, Bihuan Chen, Xin Peng, et al. 2024. Vul-rag: Enhanc- ing llm-based vulnerability detection via knowledge-level rag.arXiv preprint arXiv:2406.11147(2024)

work page arXiv 2024

[9] [9]

Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, and Shu- vendu K Lahiri. 2024. Llm-based test-driven interactive code generation: User study and empirical evaluation.IEEE Transactions on Software Engineering(2024)

2024

[10] [10]

Guardsquare. [n. d.]. ProGuard. https://www .guardsquare.com/proguard. Ac- cessed: 2025-01-10

2025

[11] [11]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. 2025. Deepseek-r1 incen- tivizes reasoning in llms through reinforcement learning.Nature645, 8081 (2025), 633–638

2025

[12] [12]

Yuejun Guo, Constantinos Patsakis, Qiang Hu, Qiang Tang, and Fran Casino. 2024. Outside the comfort zone: Analysing llm capabilities in software vulnerability detection. InEuropean symposium on research in computer security. Springer, 271–289

2024

[13] [13]

Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, and Yossi Adi. 2024. The larger the better? improved llm code-generation via budget reallocation. arXiv preprint arXiv:2404.00725(2024)

work page arXiv 2024

[14] [14]

Jingxuan He and Martin Vechev. 2023. Large language models for code: Secu- rity hardening and adversarial testing. InProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security. 1865–1879

2023

[15] [15]

Jingxuan He, Mark Vero, Gabriela Krasnopolska, and Martin Vechev. 2024. In- struction tuning for secure code generation.arXiv preprint arXiv:2402.09497 (2024)

work page arXiv 2024

[16] [16]

Peiwei Hu, Ruigang Liang, and Kai Chen. 2024. Degpt: Optimizing decompiler output with llm. InProceedings 2024 Network and Distributed System Security Symposium, Vol. 267622140

2024

[17] [17]

JavaScript Obfuscator Contributors. [n. d.]. JavaScript Obfuscator. https: //github.com/javascript-obfuscator/javascript-obfuscator. Accessed: 2025-01-10

2025

[18] [18]

Shan Jiang, Pranoy Kovuri, David Tao, and Zhixun Tan. 2025. Cascade: Llm- powered javascript deobfuscator at google.arXiv preprint arXiv:2507.17691(2025)

work page arXiv 2025

[19] [19]

Sathvik Joel, Jie Wu, and Fatemeh Fard. 2024. A survey on llm-based code generation for low-resource and domain-specific programming languages.ACM Transactions on Software Engineering and Methodology(2024)

2024

[20] [20]

Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec, and Guillaume Lample

[21] [21]

Advances in Neural Information Processing Systems34 (2021), 14967–14979

DOBF: A deobfuscation pre-training objective for programming languages. Advances in Neural Information Processing Systems34 (2021), 14967–14979

2021

[22] [22]

Dong Li, Meng Yan, Yaosheng Zhang, Zhongxin Liu, Chao Liu, Xiaohong Zhang, Ting Chen, and David Lo. 2024. CoSec: On-the-Fly security hardening of code LLMs via supervised co-decoding. InProceedings of the 33rd ACM SIGSOFT Inter- national Symposium on Software Testing and Analysis. 1428–1439

2024

[23] [23]

Yalan Lin, Chengcheng Wan, Yixiong Fang, and Xiaodong Gu. 2024. CodeCipher: Learning to Obfuscate Source Code Against LLMs.arXiv preprint arXiv:2410.05797 (2024)

work page arXiv 2024

[24] [24]

Guilong Lu, Xiaolin Ju, Xiang Chen, Wenlong Pei, and Zhilong Cai. 2024. GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning.Journal of Systems and Software212 (2024), 112031

2024

[25] [25]

Meta Platforms, Inc. [n. d.]. LLaMA Responsible Use Guide. https:// www.llama.com/docs/how-to-guides/responsible-use-guide-resources. Ac- cessed: 2025-01-10

2025

[26] [26]

Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, and Brad Myers. 2024. Using an llm to help with code understanding. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

2024

[27] [27]

Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, and NhatHai Phan. 2024. Promsec: Prompt optimization for secure generation of functional source code with large language models (llms). InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. 2266–2280

2024

[28] [28]

Yu Nong, Richard Fang, Guangbei Yi, Kunsong Zhao, Xiapu Luo, Feng Chen, and Haipeng Cai. 2024. Vgx: Large-scale sample generation for boosting learning- based software vulnerability analyses. InProceedings of the IEEE/ACM 46th Inter- national Conference on Software Engineering. 1–13

2024

[29] [29]

Obfuscator-LLVM Contributors. [n. d.]. Obfuscator-LLVM. https://github .com/ obfuscator-llvm/obfuscator. Accessed: 2025-01-10

2025

[30] [30]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback.Advances in neural information processing systems35 (2022), 27730–27744

2022

[31] [31]

Ruchir Puri, David S Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, et al

[32] [32]

Codenet: A large-scale ai for code dataset for learning a diversity of coding tasks.arXiv preprint arXiv:2105.12655(2021)

work page arXiv 2021

[33] [33]

Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundare- san, Ming Zhou, Ambrosio Blanco, and Shuai Ma. 2020. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. arXiv:2009.10297 [cs.SE]

work page internal anchor Pith review Pith/arXiv arXiv 2020

[34] [34]

CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A Choquette-Choo, Jingyue Shen, Joe Kelley, et al. 2024. Codegemma: Open code models based on gemma.arXiv preprint arXiv:2406.11409(2024)

work page arXiv 2024

[35] [35]

Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, et al. 2024. Debugbench: Evaluating debugging capability of large language models.arXiv preprint arXiv:2401.04621 (2024)

work page arXiv 2024

[36] [36]

Anton Tkachenko, Dmitrij Suskevic, and Benjamin Adolphi. 2025. Deconstructing Obfuscation: A four-dimensional framework for evaluating Large Language Models assembly code deobfuscation capabilities.arXiv preprint arXiv:2505.19887 (2025)

work page arXiv 2025

[37] [37]

Tree-sitter Contributors. [n. d.]. Tree-sitter. https://tree-sitter .github.io/tree- sitter/. Accessed: 2025-01-10

2025

[38] [38]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need.Advances in neural information processing systems30 (2017)

2017

[39] [39]

Nalin Wadhwa, Jui Pradhan, Atharv Sonwane, Surya Prakash Sahu, Nagarajan Natarajan, Aditya Kanade, Suresh Parthasarathy, and Sriram Rajamani. 2024. Core: Resolving code quality issues using llms.Proceedings of the ACM on Software Engineering1, FSE (2024), 789–811

2024

[40] [40]

Alperen Yildiz, Sin G Teo, Yiling Lou, Yebo Feng, Chong Wang, and Dinil M Divakaran. 2025. Benchmarking LLMs and LLM-based Agents in Practical Vul- nerability Detection for Code Repositories.arXiv preprint arXiv:2503.03586(2025)

work page arXiv 2025

[41] [41]

JD Zamfirescu-Pereira, Eunice Jun, Michael Terry, Qian Yang, and Björn Hart- mann. 2025. Beyond code generation: Llm-supported exploration of the program design space. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–17

2025

[42] [42]

Boyu Zhang, Tianyu Du, Junkai Tong, Xuhong Zhang, Kingsum Chow, Sheng Cheng, Xun Wang, and Jianwei Yin. 2024. SecCoder: Towards Generalizable and Robust Secure Code Generation.arXiv preprint arXiv:2410.01488(2024)

work page arXiv 2024

[43] [43]

Huangzhao Zhang, Kechi Zhang, Zhuo Li, Jia Li, Jia Li, Yongmin Li, Yunfei Zhao, Yuqi Zhu, Fang Liu, Ge Li, et al. 2024. Deep learning for code generation: a survey. Science China Information Sciences67, 9 (2024), 191101

2024

[44] [44]

Yichi Zhang. 2024. Detecting code comment inconsistencies using llm and pro- gram analysis. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 683–685

2024