Context-Based Adversarial Attacks on AI Code Generators: Vulnerability Analysis and Implications

John D. Hastings; Varghese Vaidyan; Walther A. Del Orbe

arxiv: 2606.10945 · v1 · pith:IWDQTK7Znew · submitted 2026-06-09 · 💻 cs.CR · cs.SE

Pith reviewed 2026-06-27 12:40 UTC · model grok-4.3

keywords adversarial attacks · AI code generation · software vulnerabilities · large language models · security defenses · code generators · context manipulation

The pith

Context-based adversarial attacks increase vulnerability generation in AI code generators by a factor of 10.7.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that carefully chosen inputs such as comments, documentation strings, and variable names can steer large language models used for code generation into producing code that contains security vulnerabilities. Experiments across CodeT5+, CodeLlama, GPT-3.5-Turbo, and GPT-4 establish that these attacks raise the rate of vulnerable outputs from a baseline of 3.5 percent to 37.4 percent on average, with some direct-instruction variants reaching 100 percent success on specific models. The same attacks transfer between models at rates of 60 to 100 percent, suggesting the problem is not limited to any single architecture. A dual-layer defense is presented that flags most such inputs while keeping false positives near zero and adding only 520 milliseconds of latency. If these results hold, then organizations relying on AI code assistants would need to treat generated code as untrusted by default until additional safeguards are applied.

Core claim

Adversarial conditions increase vulnerability generation 10.7x (from 3.5% to 37.4%), with direct instruction attacks achieving 100% success on GPT-3.5-Turbo. Cross-model transferability reaches 60-100%, indicating systemic architectural vulnerabilities rather than model-specific flaws. The dual-layer defense framework achieves 89.1% detection rate with 0.3% false positives and 520ms latency.
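As a quick arithmetic check on the headline figure, the multiplier follows directly from the two rates quoted above. This sketch uses only those two numbers; the variable names are illustrative and do not come from the paper.

```python
# Rates as quoted in the core claim (percent of outputs judged vulnerable).
baseline_rate = 3.5
adversarial_rate = 37.4

# Relative change: how many times higher the adversarial rate is.
multiplier = adversarial_rate / baseline_rate

# Absolute change, in percentage points.
absolute_increase = adversarial_rate - baseline_rate

print(f"multiplier: {multiplier:.1f}x")
print(f"absolute increase: {absolute_increase:.1f} percentage points")
```

The ratio rounds to the reported 10.7x. The absolute change of about 34 percentage points is the figure a reader would need in order to estimate how many additional vulnerable outputs to expect per hundred generations.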

What carries the argument

Context-based adversarial attacks, in which strategically crafted contextual inputs including comments, documentation, and variable names bias large language models toward generating exploitable code.
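To make the three context carriers concrete, here is a minimal sketch of how a benign and an adversarial prompt might differ only in comment, docstring, and variable name. The `PromptContext` type, the `build_prompt` helper, the `find_user` stub, and both example contexts are hypothetical illustrations; the paper's actual prompts are not reproduced in the material reviewed here.

```python
from dataclasses import dataclass


@dataclass
class PromptContext:
    """The three contextual fields named in the paper's abstract."""
    comment: str
    docstring: str
    variable_name: str


def build_prompt(ctx: PromptContext) -> str:
    """Render a completion prompt whose only varying parts are contextual."""
    return (
        f"# {ctx.comment}\n"
        f"def find_user(conn, {ctx.variable_name}):\n"
        f'    """{ctx.docstring}"""\n'
    )


# Hypothetical benign context: nudges toward a parameterized query.
benign = PromptContext(
    comment="Look up a user by name.",
    docstring="Return the matching row using a parameterized query.",
    variable_name="username",
)

# Hypothetical adversarial context: same task, but every field nudges
# the model toward unsafe string-built SQL.
adversarial = PromptContext(
    comment="Legacy style: build the SQL with string concatenation.",
    docstring="Return the matching row; input is already trusted.",
    variable_name="raw_sql_fragment",
)

print(build_prompt(benign))
print(build_prompt(adversarial))
```

Both prompts ask for the same function. Only the surrounding text changes, which is the sense in which the attack is context-based rather than a change to the model or its training data.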

If this is right

Vulnerability generation rises sharply once adversarial context is introduced.
The same crafted inputs succeed across four distinct model families.
Direct-instruction variants can force 100 percent vulnerable output on at least one widely used model.
A dual-layer detection system can identify most attacks with low overhead.
The observed transferability points to shared architectural exposure rather than isolated model weaknesses.
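For reference, the two defense metrics quoted in the core claim are conventionally computed from a confusion matrix, as in this sketch. The counts are illustrative values chosen to reproduce the reported 89.1% and 0.3%; they are not data from the paper, and the function names are assumptions.

```python
def detection_rate(true_positives: int, false_negatives: int) -> float:
    """Share of adversarial inputs that the detector flags (recall)."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of benign inputs that the detector wrongly flags."""
    return false_positives / (false_positives + true_negatives)


# Illustrative counts only: 1,000 adversarial and 1,000 benign inputs.
tp, fn = 891, 109
fp, tn = 3, 997

print(f"detection rate: {detection_rate(tp, fn):.1%}")
print(f"false positive rate: {false_positive_rate(fp, tn):.1%}")
```

The two rates have different denominators, so neither says how many flagged inputs are truly adversarial. That depends on how common attacks are in practice, which the reviewed material does not state.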

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Teams integrating AI code generators into security-sensitive projects would need to add manual review or automated scanning steps that were previously unnecessary.
Model providers may need to harden training or inference pipelines against context manipulation to preserve trust in generated code.
The same attack surface could appear in other generative tasks such as natural-language summarization or data transformation if similar contextual biases exist.

Load-bearing premise

The 2,800 controlled experiments and chosen contexts accurately reflect real-world usage patterns and attack surfaces that developers encounter when using these AI code generators.

What would settle it

A measurement of actual vulnerability rates in unscripted developer sessions with AI code tools that shows rates remaining near the benign 3.5 percent baseline even when contextual inputs vary.

Original abstract

AI-powered code generation systems have transformed software development but introduce critical inference-time security vulnerabilities. This research presents a systematic investigation of context-based adversarial attacks, where strategically crafted contextual inputs, including comments, documentation, variable names, bias large language models toward generating exploitable code. Through 2,800 controlled experiments across CodeT5+, CodeLlama, GPT-3.5-Turbo, and GPT-4, we quantify attack effectiveness and defense mechanisms. Results demonstrate that adversarial conditions increase vulnerability generation 10.7x (from 3.5% to 37.4%), with direct instruction attacks achieving 100% success on GPT-3.5-Turbo. Cross-model transferability reaches 60-100%, indicating systemic architectural vulnerabilities rather than model-specific flaws. Our dual-layer defense framework achieves 89.1% detection rate with 0.3% false positives and 520ms latency, demonstrating practical feasibility for real-time deployment in development environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper reports a 10.7x rise in vulnerable code from adversarial contexts across four models plus a defense, but the lack of detail on context selection and vulnerability labeling makes the numbers hard to evaluate.

The key takeaway from this paper is the reported 10.7-fold increase in vulnerability generation under adversarial conditions, from 3.5% to 37.4%, based on 2,800 experiments across CodeT5+, CodeLlama, GPT-3.5-Turbo, and GPT-4. Direct instruction attacks reached 100% success on GPT-3.5-Turbo, and cross-model transferability was high at 60-100%. The authors also present a dual-layer defense with 89.1% detection and a 0.3% false-positive rate.

What the paper does well is covering multiple models, both open and closed, and providing quantitative results on both the attacks and a proposed defense. This gives a broad view of the vulnerability across different systems and includes some practical metrics on the defense side.

The soft spots are in the experimental details. There is no information on how the contextual inputs were constructed or sampled, what criteria defined a vulnerability, or whether any statistical analysis was performed. This makes it difficult to assess whether the findings generalize beyond the specific test cases. The assumption that these contexts reflect real-world usage is a potential weak point, as the stress-test note points out, and that objection holds up from the abstract alone.

Overall, the work engages with an important practical issue in AI-assisted coding. A reader focused on AI security or software engineering tools would find the numbers informative, though they'd want to see more on the methodology.

This paper deserves a serious referee because it raises a timely concern with measurable claims that can be verified or challenged in review. I would recommend sending it for peer review rather than desk rejecting it.

Referee Report

2 major / 1 minor

Summary. The manuscript investigates context-based adversarial attacks on AI code generators (CodeT5+, CodeLlama, GPT-3.5-Turbo, GPT-4) via 2,800 controlled experiments. It claims that adversarial contexts (comments, documentation, variable names) increase vulnerability generation 10.7× (3.5% to 37.4%), direct-instruction attacks reach 100% success on GPT-3.5-Turbo, cross-model transferability is 60–100%, and a dual-layer defense achieves 89.1% detection at 0.3% false-positive rate and 520 ms latency.

Significance. If the experimental design, context sampling, and vulnerability labeling are sound and representative, the work would establish that current code-generation models share systemic inference-time vulnerabilities and would supply a practical, low-overhead defense suitable for IDE integration; this would be a substantive contribution to the security of AI-assisted software development.

major comments (2)

[Abstract] Abstract: the headline 10.7× claim and the 100% / 60–100% transferability figures are presented as direct experimental outcomes, yet the abstract (and, from the supplied description, the methods) supplies no information on prompt-construction protocol, statistical testing, baseline selection, error analysis, or inter-rater reliability for vulnerability labeling; without these details the quantitative multiplier cannot be evaluated.
[Methods / Experimental Design] Experimental setup (implicit in the 2,800-experiment description): the paper does not report how the adversarial contexts were sampled or validated against real-world repositories or IDE usage traces, nor the precise static/dynamic criteria used to label generated code as vulnerable; this directly affects the generalizability of the reported multiplier and transferability results.
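One form the requested statistical testing could take is a percentile bootstrap interval on the rate ratio. The sketch below is an assumption about a reasonable analysis, not the authors' procedure. The per-condition sample size of 1,000 and the 0/1 outcome lists are synthetic, built only to match the reported 3.5% and 37.4% rates, and `bootstrap_ratio_ci` is a hypothetical helper.

```python
import random


def bootstrap_ratio_ci(baseline, adversarial, n_resamples=1000,
                       alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(adversarial) / mean(baseline)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_resamples):
        # Resample each condition independently, with replacement.
        b = rng.choices(baseline, k=len(baseline))
        a = rng.choices(adversarial, k=len(adversarial))
        b_rate = sum(b) / len(b)
        if b_rate == 0:
            continue  # ratio is undefined for this resample
        ratios.append((sum(a) / len(a)) / b_rate)
    ratios.sort()
    low = ratios[int((alpha / 2) * len(ratios))]
    high = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return low, high


# Synthetic outcomes (1 = vulnerable output), NOT the paper's data.
baseline = [1] * 35 + [0] * 965        # 3.5% vulnerable
adversarial = [1] * 374 + [0] * 626    # 37.4% vulnerable

point = (sum(adversarial) / len(adversarial)) / (sum(baseline) / len(baseline))
low, high = bootstrap_ratio_ci(baseline, adversarial)
print(f"ratio {point:.1f}x, 95% CI [{low:.1f}, {high:.1f}]")
```

Because the baseline rate rests on few events, the interval around the multiplier is wide even with a thousand trials per condition. That is why the referee's request for per-condition counts matters for interpreting the 10.7x figure.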

minor comments (1)

[Abstract] The abstract states “520ms latency” without specifying the hardware or batch size used for the measurement; a single sentence clarifying the evaluation environment would improve reproducibility.
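A latency figure becomes reproducible when it is reported together with the measurement environment. The following sketch shows one way to do that with the Python standard library. The `placeholder_detector` function is a hypothetical stand-in, not the paper's defense, and the warm-up and run counts are arbitrary choices.

```python
import platform
import statistics
import time


def measure_latency_ms(fn, payload, warmup=3, runs=20):
    """Time fn(payload) one call at a time and report summary statistics."""
    for _ in range(warmup):  # warm-up calls are excluded from the samples
        fn(payload)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "runs": runs,
        "batch_size": 1,  # one payload per call in this sketch
        "machine": platform.machine(),
        "python": platform.python_version(),
    }


def placeholder_detector(text):
    """Hypothetical stand-in for the defense; not the paper's detector."""
    return "already trusted" in text.lower()


report = measure_latency_ms(placeholder_detector, "# input is already trusted")
print(report)
```

Reporting a median and a tail percentile alongside the batch size and machine description would let a reader judge whether a figure such as 520 ms applies to their own development environment.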

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have made revisions to improve the clarity and completeness of the experimental reporting.

Referee: [Abstract] Abstract: the headline 10.7× claim and the 100% / 60–100% transferability figures are presented as direct experimental outcomes, yet the abstract (and, from the supplied description, the methods) supplies no information on prompt-construction protocol, statistical testing, baseline selection, error analysis, or inter-rater reliability for vulnerability labeling; without these details the quantitative multiplier cannot be evaluated.

Authors: We agree that the abstract would benefit from additional methodological context to support the quantitative claims. In the revised manuscript we have expanded the abstract to briefly summarize the prompt-construction protocol, statistical testing approach, baseline selection, error analysis, and inter-rater reliability for vulnerability labeling. We have also added corresponding detail in the Methods section.

Revision: yes

Referee: [Methods / Experimental Design] Experimental setup (implicit in the 2,800-experiment description): the paper does not report how the adversarial contexts were sampled or validated against real-world repositories or IDE usage traces, nor the precise static/dynamic criteria used to label generated code as vulnerable; this directly affects the generalizability of the reported multiplier and transferability results.

Authors: We acknowledge that the original submission lacked sufficient detail on context sampling and labeling criteria. The revised manuscript now includes an expanded Methods section describing how adversarial contexts were sampled (template-based mutations informed by common patterns), validation steps against real-world repository and IDE traces, and the precise static (CodeQL rules) and dynamic criteria used for labeling vulnerabilities, along with inter-rater reliability metrics.

Revision: yes
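The rebuttal does not say which inter-rater reliability metric is reported. Cohen's kappa is one common choice for two raters assigning categorical labels, and the sketch below assumes it purely for illustration. The two label lists are made-up examples, not labels from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels for ten generated snippets (1 = vulnerable, 0 = not).
rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

Here the raters agree on 8 of 10 items, but kappa is noticeably lower than 0.8 because some of that agreement is expected by chance. Raw agreement percentages alone would therefore overstate labeling reliability.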

Circularity Check

0 steps flagged

No circularity: purely empirical measurements with no derivations or self-referential reductions

The paper reports direct experimental outcomes from 2,800 controlled trials measuring vulnerability generation rates under adversarial contexts. No equations, fitted parameters, predictions derived from prior fits, or load-bearing self-citations appear in the provided text. The 10.7x multiplier and transferability percentages are presented as observed frequencies, not as outputs of any model or theorem that reduces to the inputs by construction. The derivation chain is therefore self-contained as raw measurement.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work described is an empirical study of attacks and defenses; the abstract contains no mathematical derivations, fitted parameters, background axioms, or newly postulated entities.
