Understanding the (In)Security of Vibe-Coded Applications

Junquan Deng; Ruijie Meng; Zhiyu Fan

arxiv: 2606.23130 · v2 · pith:TUCTNHKAnew · submitted 2026-06-22 · 💻 cs.CR · cs.SE

Pith reviewed 2026-06-26 08:04 UTC · model grok-4.3

keywords vibe coding · AI agents · software security · vulnerabilities · LLM · application development · security risks · natural language programming

The pith

Applications created mainly through natural-language chats with AI agents contain recurring vulnerabilities such as placeholder logic, unfiltered inputs, and exposed secrets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vibe coding delegates large parts of software creation to AI agents via everyday language prompts rather than direct code writing. The paper collects real apps built this way and audits them to map how often and why security problems appear. It traces the flaws to AI agents losing track of earlier decisions, optimizing only for the immediate request, and lacking built-in security knowledge. Even stronger models and better prompts cut down on some issues but leave the core risks in place. This points to new security challenges when development responsibility shifts heavily to AI systems.

Core claim

Vibe-coded applications exhibit recurring vulnerability patterns that differ from those in conventional software development, including placeholder logic, unfiltered input, and secret exposure; these arise from systematic limitations of AI agents throughout the lifecycle such as memory loss, locally optimized objectives, and insufficient security knowledge; advances in LLM capabilities and improved prompting reduce incidence but do not eliminate the risks.
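
To make two of the named patterns concrete, here is a minimal sketch of placeholder logic and unfiltered input, with a parameterized query as the safer counterpart. It is not taken from the paper or from any audited application; the table, the function names, and the payload are all hypothetical, and only Python's standard `sqlite3` module is used.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two user rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")
    conn.execute("INSERT INTO users VALUES ('bob', 'user')")
    return conn


# Pattern: placeholder logic. A stub stands in for a real check and is never replaced.
def verify_token_placeholder(token: str) -> bool:
    # TODO: implement real verification
    return True  # every token is accepted


# Pattern: unfiltered input. User text is spliced directly into the SQL string.
def find_user_unfiltered(conn: sqlite3.Connection, name: str) -> list:
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


# Safer counterpart: the value is passed as a bound parameter, not as SQL text.
def find_user_parameterized(conn: sqlite3.Connection, name: str) -> list:
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "x' OR '1'='1"  # classic injection string
leaked = find_user_unfiltered(conn, payload)    # the WHERE clause becomes always true
safe = find_user_parameterized(conn, payload)   # payload is treated as a literal name
```

With the injected payload, the unfiltered variant returns every row in the table, while the parameterized variant returns nothing because no user has that literal name.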

What carries the argument

A vulnerability analysis framework that combines agent-assisted code auditing with human validation, applied to a corpus of real-world applications built with popular AI agents.

If this is right

Vibe-coded apps display vulnerability patterns unlike those in traditional workflows.
AI agent limitations in memory, objective focus, and security knowledge directly produce these patterns.
Stronger LLMs and refined prompts lower but do not remove the security exposure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Teams adopting vibe coding for production systems may need separate manual security gates that the AI process itself does not supply.
The same AI limitations could affect other delegated tasks such as testing or documentation generation.
Agent designs that treat security knowledge as a persistent global objective rather than a local one might shrink the gap.
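
As one illustration of the kind of separate security gate suggested in the first point, the sketch below scans source text for hardcoded secret-like assignments. It is an assumption-laden toy, not the paper's auditing framework: the two regular expressions and the function name `find_hardcoded_secrets` are hypothetical, and real scanners rely on much larger rule sets.

```python
import re

# Hypothetical detection rules; chosen only to illustrate the idea.
SECRET_PATTERNS = [
    # name = "long literal" where the name looks like a credential
    re.compile(
        r"""\b(api[_-]?key|secret|password|token)\b\s*[:=]\s*["'][^"']{8,}["']""",
        re.IGNORECASE,
    ),
    # PEM private key header pasted into source
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
]


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line matching a secret pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            findings.append((lineno, line.strip()))
    return findings


sample = '''
import os
API_KEY = "example-not-a-real-key-123"
db_password = os.environ["DB_PASSWORD"]
'''

findings = find_hardcoded_secrets(sample)
```

In the sample, only the literal `API_KEY` assignment is flagged; the value read from the environment is not. A gate like this could run before each commit or deployment, independently of whatever the agent believes it has already checked.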

Load-bearing premise

The chosen auditing framework accurately captures the true prevalence and root causes of vulnerabilities without missing real issues or adding false ones, and the collected apps represent typical practice.

What would settle it

Finding a sizable collection of vibe-coded applications that lack placeholder logic, unfiltered inputs, and secret exposures would challenge the claim of recurring distinct patterns.

Figures

Figures reproduced from arXiv: 2606.23130 by Junquan Deng, Ruijie Meng, Zhiyu Fan.

**Figure 1.** Application categories, and the most-used development languages and technology stacks in vibe-coded applications.

**Figure 2.** Prevalence of vulnerabilities. Pn means n% of the observed values fall at or below this threshold.

**Figure 3.** Eight failure modes (① to ⑧) for vibe-coded vulnerabilities, grouped by three defects (i.e., memory, objective, and knowledge) and mapped to the vibe-coding lifecycle (from specification and implementation, to iteration and deployment).
…coded applications exhibit substantially higher vulnerability rates than the OWASP baseline, with differences up to 20×. The ranking of the most prevalent categories also div…

**Figure 4.** Number of OWASP Top 10 (2025) vulnerabilities.

**Figure 5.** Figure 5 illustrates the forward variant in FrameOps [37]. A validateApiKey middleware is added for a newly introduced router, but the same protection is not applied to 11 pre-existing handlers. As a result, those handlers remain reachable without the intended authorization check. We identify 185 vulnerabilities (12.6%) caused by this failure mode. Their severity is also substantial. 77.3% are rated Critical or Hig…

**Figure 6.** Forgotten obligations in BoxCostPro.
6.2. Objective Defects. Objective defects occur when immediate functional goals are prioritized over long-term correctness or security during development. In these cases, security is systematically subordinated to short-term objectives such as making the application demo-ready, unblocking a visible error, or restoring functionality as quickly as possible. Objective def…

**Figure 7.** Demo-oriented design in fuyou.
Iteration stage: ④ function-fix side effects. This failure mode arises during debugging or feature iteration, when a visible error is resolved by weakening or bypassing a security control. For example, an authentication error, failed…

**Figure 8.** Function-fix side effects in IPC.
6.3. Knowledge Defects. Knowledge defects refer to cases where both agent and user may fail to recognize the security requirements of the application during its development. In the vibe-coding workflow, many security constraints are not explicitly stated, and agents may not derive them from the surrounding application context. On the user side, developers may overlook necessary sec…

**Figure 9.** Hidden security rules in Vecto-Pilot.
Implementation stage: ⑥ hallucination. This failure mode captures cases where the generated code relies on fabricated technical assumptions in security-sensitive logic. The assumption may concern a library API, a database schema, or the security property of a primitive. Many hallucination-induced mistakes break functionality and are therefore corrected during develo…

**Figure 10.**

**Figure 11.** Insecure instructions in Dune.
…is left to the user. Examples include excluding generated secret files from Git, configuring necessary environment variables, or updating vulnerable dependencies. These steps may be routine for experienced developers, but the vibe-coding workflow does not necessarily make the obligation explicit or enforced. This creates a responsibility gap between what the agent assumes w…

**Figure 12.** User-dependent security in my-board-app.
RQ.3 Findings. Insecurities in vibe-coded applications are symptoms of systemic failures. Throughout the vibe-coding lifecycle, security requirements are often not reliably preserved, not consistently prioritized, or left implicit, which makes them difficult to fully accomplish. We further propose eight common failure modes with respect to memory, objective, and …
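
The forward variant described in the Figure 5 text (a guard added for a new router while older handlers stay unprotected) can be sketched in a few lines. This is a hypothetical reconstruction in Python, not FrameOps code: the route paths, the handler names, the `is_guarded` marker, and the expected key are all invented for illustration.

```python
from typing import Callable

Handler = Callable[[dict], tuple[int, str]]


def validate_api_key(handler: Handler) -> Handler:
    """Wrap a handler so requests without the expected key are rejected with 401."""
    def guarded(request: dict) -> tuple[int, str]:
        if request.get("api_key") != "expected-key":  # hypothetical check
            return 401, "unauthorized"
        return handler(request)

    guarded.is_guarded = True  # marker used only by the audit helper below
    return guarded


def list_frames(request: dict) -> tuple[int, str]:
    return 200, "frames"


def export_report(request: dict) -> tuple[int, str]:
    return 200, "report"


# The newly added route gets the guard; the pre-existing one is never revisited.
routes: dict[str, Handler] = {
    "/frames": list_frames,
    "/reports/export": validate_api_key(export_report),
}


def unguarded_routes(table: dict[str, Handler]) -> list[str]:
    """List route paths whose handler was not wrapped by validate_api_key."""
    return sorted(
        path for path, handler in table.items()
        if not getattr(handler, "is_guarded", False)
    )


missing = unguarded_routes(routes)
```

Here `missing` contains only `/frames`, which still answers unauthenticated requests. A coverage check of this kind treats authorization as a property of the whole route table rather than of the most recent change.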

Original abstract

Recent advances in large language models (LLMs) have enabled vibe coding, an emerging software development paradigm in which users create applications primarily through natural-language interactions with AI agents. Due to its low barrier to entry, vibe coding is rapidly gaining adoption in practice. Unlike conventional AI-assisted programming, where developers remain responsible for implementation and code review, vibe coding delegates a substantial portion of development to AI systems. This shift raises a fundamental question: how (in)secure are applications developed through vibe coding? In this paper, we conduct a systematic study of the security of vibe-coded applications. We collect a large corpus of real-world applications developed using popular AI agents and design a vulnerability analysis framework that combines agent-assisted code auditing with human validation. Using this framework, we examine the prevalence, severity, and root causes of vulnerabilities in the deployed vibe-coded applications. Our study reveals several key findings: (1) vibe-coded applications exhibit recurring vulnerability patterns that differ from those commonly observed in conventional software development workflows, including placeholder logic, unfiltered input, and secret exposure; (2) these vulnerabilities arise from systematic limitations of AI agents throughout the vibe-coding lifecycle, such as memory loss, locally optimized objectives and insufficient security knowledge; and (3) while advances in LLM capabilities and improved prompting strategies can reduce the incidence of vulnerabilities, they do not eliminate the underlying security risks. Overall, our study provides an empirical understanding of the security landscape of vibe-coded applications and lays the groundwork for addressing the security challenges introduced by the growing delegation of software development to AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The abstract flags real-sounding security patterns in vibe-coded apps but gives zero numbers or method details, so the claims can't be checked yet.

The main point is that vibe-coded apps show distinct problems like placeholder logic, unfiltered input, and exposed secrets that trace back to AI agent limits such as memory loss and narrow objectives. The paper sets this apart from regular AI-assisted coding by studying the full delegation model.

It does a reasonable job naming those patterns and linking them to the lifecycle stages where the AI handles most of the work. The idea of using agent-assisted auditing plus human review is a straightforward way to look at deployed apps, and the note that better models or prompts won't remove the risks entirely follows from the described causes.

The soft spot is obvious from the abstract alone: no corpus size, no vulnerability counts, no severity stats, and no validation numbers. Without those, it's impossible to tell how common the issues are or whether the framework produces reliable root-cause findings. The representativeness of the collected apps and the accuracy of the human validation step remain open questions.

This is for researchers tracking security in AI-driven development practices. A reader already working on LLM code generation or app security could pick up the patterns as starting points for their own checks.

It deserves a serious referee because the topic is new and the empirical framing makes sense, but the authors would need to supply the actual data, corpus details, and quantitative results before any stronger conclusions could be drawn.

Referee Report

3 major / 2 minor

Summary. The manuscript reports on a systematic empirical study of the security of vibe-coded applications, defined as software developed primarily through natural-language interactions with AI agents rather than conventional coding workflows. The authors collect a corpus of real-world applications built with popular AI agents, introduce a vulnerability analysis framework that integrates agent-assisted code auditing with human validation, and examine the prevalence, severity, and root causes of vulnerabilities in deployed applications. The central claims are that vibe-coded apps exhibit recurring vulnerability patterns distinct from conventional development (e.g., placeholder logic, unfiltered input, secret exposure); that these stem from systematic AI-agent limitations across the development lifecycle (memory loss, locally optimized objectives, insufficient security knowledge); and that while advances in LLM capabilities and prompting strategies can reduce incidence, they do not eliminate the underlying risks.

Significance. If the empirical results and analysis framework prove robust, the work would be significant for the security community because it addresses an emerging, low-barrier development paradigm whose adoption is increasing rapidly. It supplies the first large-scale observational data on how delegation of implementation to AI agents introduces new classes of vulnerabilities and identifies actionable root causes tied to current LLM limitations. The mixed agent-plus-human auditing approach is a methodological strength that could be adopted in future studies of AI-generated code.

major comments (3)

[Abstract] Abstract: the study design and high-level findings are described, but the abstract supplies no quantitative results (corpus size, vulnerability counts, severity distributions, statistical measures, or inter-rater agreement for human validation). Without these data it is impossible to determine whether the observed patterns support the three numbered claims.
[Methodology / Vulnerability analysis framework] Vulnerability analysis framework section: the claim that the combined agent-assisted auditing plus human validation accurately identifies true prevalence and root causes rests on the untested assumption that false-positive and false-negative rates are low; the manuscript must report concrete validation metrics (e.g., precision/recall on a labeled subset or disagreement rates between agent and human auditors) because this is load-bearing for all prevalence and causality statements.
[Corpus collection] Corpus description: the representativeness of the collected applications for typical vibe-coded practice is asserted but not demonstrated; the paper should report inclusion criteria, total number of apps examined, distribution across domains and AI agents used, and any filtering steps, as these directly affect the generalizability of the recurring-pattern findings.
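
For reference, the validation metrics requested in the second major comment can be computed from a labeled subset as follows. This is a minimal sketch under the assumption that human labels serve as ground truth; the function name `validation_metrics` and the eight example labels are made up and do not come from the paper.

```python
def validation_metrics(
    agent_flags: list[bool], human_labels: list[bool]
) -> dict[str, float]:
    """Compare agent findings with human labels, treating the human labels as truth."""
    if not agent_flags or len(agent_flags) != len(human_labels):
        raise ValueError("need two non-empty label lists of equal length")

    pairs = list(zip(agent_flags, human_labels))
    tp = sum(1 for a, h in pairs if a and h)        # agent flagged, human confirmed
    fp = sum(1 for a, h in pairs if a and not h)    # agent flagged, human rejected
    fn = sum(1 for a, h in pairs if not a and h)    # agent missed a real issue

    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "disagreement_rate": (fp + fn) / len(pairs),
    }


# Hypothetical labeled subset of eight candidate findings.
agent = [True, True, True, False, False, True, False, False]
human = [True, True, False, True, False, True, False, False]

metrics = validation_metrics(agent, human)
```

For these made-up labels there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the disagreement rate is 0.25.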

minor comments (2)

[Abstract] Abstract: the phrase 'large corpus' is used without a number; adding even a single sentence with approximate scale would improve readability.
[Introduction] Terminology: 'vibe coding' and 'vibe-coded' are introduced without a crisp definition or citation to prior usage; a short definitional paragraph early in the introduction would help readers unfamiliar with the term.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the referee's constructive feedback and recommendation for major revision. We address each major comment point-by-point below, agreeing where the manuscript can be strengthened through revision.

Referee: [Abstract] Abstract: the study design and high-level findings are described, but the abstract supplies no quantitative results (corpus size, vulnerability counts, severity distributions, statistical measures, or inter-rater agreement for human validation). Without these data it is impossible to determine whether the observed patterns support the three numbered claims.

Authors: We agree that the abstract would benefit from quantitative results to allow readers to assess the claims directly. In the revised version we will incorporate key statistics including corpus size, vulnerability counts and severity distributions, and inter-rater agreement for the human validation step.

revision: yes

Referee: [Methodology / Vulnerability analysis framework] Vulnerability analysis framework section: the claim that the combined agent-assisted auditing plus human validation accurately identifies true prevalence and root causes rests on the untested assumption that false-positive and false-negative rates are low; the manuscript must report concrete validation metrics (e.g., precision/recall on a labeled subset or disagreement rates between agent and human auditors) because this is load-bearing for all prevalence and causality statements.

Authors: This point is well taken; reporting concrete validation metrics is necessary to support the framework's reliability. We will add a dedicated validation subsection that reports precision/recall on a labeled subset and disagreement rates between agent and human auditors.

revision: yes

Referee: [Corpus collection] Corpus description: the representativeness of the collected applications for typical vibe-coded practice is asserted but not demonstrated; the paper should report inclusion criteria, total number of apps examined, distribution across domains and AI agents used, and any filtering steps, as these directly affect the generalizability of the recurring-pattern findings.

Authors: We agree that explicit details on corpus construction are required for evaluating generalizability. The revised manuscript will expand the corpus collection section to include inclusion criteria, the total number of applications examined, domain and agent distributions, and all filtering steps.

revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

This is a purely empirical observational study that collects a corpus of real-world vibe-coded applications and applies an agent-assisted auditing framework with human validation to identify vulnerability patterns. No equations, derivations, fitted parameters, or self-citation chains appear in the abstract or described methodology; all claims rest on direct analysis of external applications rather than reducing to prior results by construction. The study is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two domain assumptions: that the collected corpus represents typical vibe-coded applications and that the custom analysis framework reliably surfaces the relevant vulnerabilities and their causes.

axioms (2)

domain assumption: The corpus of real-world applications developed using popular AI agents is representative of vibe-coded applications in practice.
Generalization from the studied apps to the broader phenomenon depends on this assumption.

domain assumption: The vulnerability analysis framework combining agent-assisted code auditing with human validation correctly identifies prevalence, severity, and root causes.
All three key findings depend on the framework producing accurate and complete results.

pith-pipeline@v0.9.1-grok · 5809 in / 1521 out tokens · 28575 ms · 2026-06-26T08:04:19.737824+00:00 · methodology

Reference graph

Works this paper leans on

49 extracted references · 7 canonical work pages

[1]

Towards ai-native software engineering (se 3.0): A vision and a challenge roadmap,

A. E. Hassan, G. A. Oliva, D. Lin, B. Chen, and Z. M. J. Jiang, “Towards ai-native software engineering (se 3.0): A vision and a challenge roadmap,” ACM Trans. Softw. Eng. Methodol., Apr. 2026, Just Accepted. [Online]. Available: https://doi.org/10.1145/3807901

work page doi:10.1145/3807901 2026
[2]

GitHub Copilot—Your AI pair programmer,

Microsoft, “GitHub Copilot—Your AI pair programmer,” accessed: 2026-05-20. [Online]. Available: https://github.com/features/copilot

2026
[3]

Codex CLI,

OpenAI, “Codex CLI,” accessed: 2026-05-20. [Online]. Available: https://github.com/openai/codex

2026
[4]

Lovable,

Lovable, “Lovable,” accessed: 2026-05-20. [Online]. Available: https://lovable.dev/

2026
[5]

Claude Code,

Anthropic, “Claude Code,” accessed: 2026-05-20. [Online]. Available: https://claude.com/product/claude-code

2026
[6]

There’s a new kind of coding I call “vibe coding”

A. Karpathy, “There’s a new kind of coding I call “vibe coding”,” Post on X (formerly Twitter), Feb. 2025, accessed: 2026-05-20. [Online]. Available: https://x.com/karpathy/status/1886192184808149383

2025
[7]

Is vibe coding safe? benchmarking vulnerability of agent-generated code in real-world tasks,

S. Zhao, D. Wang, K. Zhang, J. Luo, Z. Li, and L. Li, “Is vibe coding safe? benchmarking vulnerability of agent-generated code in real-world tasks,” arXiv preprint arXiv:2512.03262, 2025. [Online]. Available: https://arxiv.org/abs/2512.03262

arXiv 2025
[8]

Lovable AI Statistics 2026—Users, Revenue, Adoption & Market Metrics,

Panto, “Lovable AI Statistics 2026—Users, Revenue, Adoption & Market Metrics,” accessed: 2026-05-20. [Online]. Available: https://www.getpanto.ai/blog/lovable-statistics

2026
[9]

Lovable Security,

Lovable, “Lovable Security,” accessed: 2026-05-20. [Online]. Available: https://docs.lovable.dev/features/security

2026
[10]

Claude Code Security,

Anthropic, “Claude Code Security,” accessed: 2026-05-20. [Online]. Available: https://code.claude.com/docs/en/security

2026
[11]

Passing the security vibe check: The dangers of vibe coding,

Databricks, “Passing the security vibe check: The dangers of vibe coding,” 2025, accessed: 2026-04-29. [Online]. Available: https://www.databricks.com/blog/passing-security-vibe-check-dangers-vibe-coding

2025
[12]

OWASP top 10 for LLM applications 2025,

OWASP Foundation, “OWASP top 10 for LLM applications 2025,” 2025, accessed: 2026-04-29. [Online]. Available: https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/

2025
[13]

Vibe graveyard: Real-world security incidents in vibe-coded applications,

Vibe Graveyard, “Vibe graveyard: Real-world security incidents in vibe-coded applications,” 2025, accessed: 2026-04-29. [Online]. Available: https://www.vibegraveyard.ai/

2025
[14]

Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions,

H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions,” in 2022 IEEE Symposium on Security and Privacy (SP), 2022, pp. 754–768. [Online]. Available: https://doi.org/10.1109/SP46214.2022.9833571

work page doi:10.1109/sp46214.2022.9833571 2022
[15]

A.S.E: A repository-level benchmark for evaluating security in AI-generated code,

K. Lian, B. Wang, L. Zhang, L. Chen, J. Wang, Z. Zhao, Y. Yang et al., “A.S.E: A repository-level benchmark for evaluating security in AI-generated code,” 2025. [Online]. Available: https://arxiv.org/abs/2508.18106

arXiv 2025
[16]

Securevibebench: Evaluating secure coding capabilities of code agents with realistic vulnerability scenarios,

J. Chen, H. Huang, Y. Lyu, J. An, J. Shi, C. Yang, T. Zhang, H. Tian, Y. Li, Z. Li, X. Zhou, X. Hu, and D. Lo, “Securevibebench: Evaluating secure coding capabilities of code agents with realistic vulnerability scenarios,” 2026. [Online]. Available: https://arxiv.org/abs/2509.22097

Pith/arXiv arXiv 2026
[17]

Autocoderover: Autonomous program improvement,

Y. Zhang, H. Ruan, Z. Fan, and A. Roychoudhury, “Autocoderover: Autonomous program improvement,” in Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024, pp. 1592–1604. [Online]. Available: https://dl.acm.org/doi/10.1145/3650212.3680384

work page doi:10.1145/3650212.3680384 2024
[18]

Swe-agent: Agent-computer interfaces enable automated software engineering,

J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. Narasimhan, and O. Press, “Swe-agent: Agent-computer interfaces enable automated software engineering,” Advances in Neural Information Processing Systems, vol. 37, pp. 50528–50652, 2024. [Online]. Available: https://arxiv.org/abs/2405.15793

Pith/arXiv arXiv 2024
[19]

Survey reveals AI’s impact on the developer experience,

I. Shani and GitHub Staff, “Survey reveals AI’s impact on the developer experience,” GitHub Blog (Research), 2023, accessed: 2026-05-20. [Online]. Available: https://github.blog/news-insights/research/survey-reveals-ais-impact-on-the-developer-experience/

2023
[20]

AI in software engineering at Google: Progress and the path ahead,

S. Chandra and M. Tabachnyk, “AI in software engineering at Google: Progress and the path ahead,” Google Research Blog, 2024, accessed: 2026-05-20. [Online]. Available: https://research.google/blog/ai-in-software-engineering-at-google-progress-and-the-path-ahead/

2024
[21]

Building software by rolling the dice: A qualitative study of vibe coding,

Y.-H. Chou, B. Jiang, Y. W. Chen, M. Weng, V. Jackson, T. Zimmermann, and J. A. Jones, “Building software by rolling the dice: A qualitative study of vibe coding,” in Proceedings of the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2026, accepted; to appear. [Online]. Available: h...

arXiv 2026
[22]

Secodeplt: A unified benchmark for evaluating the security risks and capabilities of code genai,

Y. Nie, Z. Wang, Y. Yang, R. Jiang, Y. Tang, X. Davies, Y. Gal, B. Li, W. Guo, and D. Song, “Secodeplt: A unified benchmark for evaluating the security risks and capabilities of code genai,” Advances in Neural Information Processing Systems, vol. 38, 2026. [Online]. Available: https://arxiv.org/abs/2410.11096

arXiv 2026
[23]

How secure is code generated by chatgpt?

R. Khoury, A. R. Avila, J. Brunelle, and B. M. Camara, “How secure is code generated by chatgpt?” in 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2023, pp. 2445–2451. [Online]. Available: https://arxiv.org/abs/2304.09655

Pith/arXiv arXiv 2023
[24]

How secure is AI-generated code: A large-scale comparison of large language models,

N. Tihanyi, T. Bisztray, M. A. Ferrag, R. Jain, and L. C. Cordeiro, “How secure is AI-generated code: A large-scale comparison of large language models,” Empirical Software Engineering, vol. 30, no. 2, p. 47, 2025. [Online]. Available: https://doi.org/10.1007/s10664-024-10590-1

2025
[25]

Security weaknesses of copilot-generated code in github projects: An empirical study,

Y. Fu, P. Liang, A. Tahir, Z. Li, M. Shahin, J. Yu, and J. Chen, “Security weaknesses of copilot-generated code in github projects: An empirical study,” ACM Trans. Softw. Eng. Methodol., vol. 34, no. 8, Oct. 2025. [Online]. Available: https://doi.org/10.1145/3716848

work page doi:10.1145/3716848 2025
[26]

Security vulnerabilities in AI-generated code: A large-scale analysis of public GitHub repositories,

M. Schreiber and P. Tippe, “Security vulnerabilities in AI-generated code: A large-scale analysis of public GitHub repositories,” in Information and Communications Security, ser. Lecture Notes in Computer Science, vol. 16219. Singapore: Springer Nature Singapore, 2026, pp. 153–172. [Online]. Available: https://doi.org/10.1007/978-981-95-3537-8_9

work page doi:10.1007/978-981-95-3537-8_9 2026
[27]

LLM-CSEC: Empirical evaluation of security in C/C++ code generated by large language models,

M. U. Shahid, C. M. Ahmed, and R. Ranjan, “LLM-CSEC: Empirical evaluation of security in C/C++ code generated by large language models,” 2025. [Online]. Available: https://arxiv.org/abs/2511.18966

arXiv 2025
[28]

The hidden risks of llm-generated web application code: A security-centric evaluation of code generation capabilities in large language models,

S. Dora, D. Lunkad, N. Aslam, S. Venkatesan, and S. K. Shukla, “The hidden risks of llm-generated web application code: A security-centric evaluation of code generation capabilities in large language models,” in International Conference on Information Systems Security, 2025, pp. 27–37. [Online]. Available: https://arxiv.org/abs/2504.20612

arXiv 2025
[29]

You still have to study on the security of LLM generated code,

A. Schaad, S. Götz, and D. Binder, “You still have to study on the security of llm generated code,” in ICT Systems Security and Privacy Protection, L. Nemec Zlatolas, K. Rannenberg, T. Welzer, and J. Garcia-Alfaro, Eds., 2025, pp. 111–124. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-031-92886-4_8

work page doi:10.1007/978-3-031-92886-4_8 2025
[30]

Do users write more insecure code with ai assistants?

N. Perry, M. Srivastava, D. Kumar, and D. Boneh, “Do users write more insecure code with AI assistants?” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’23. ACM, Nov. 2023, pp. 2785–2799. [Online]. Available: https://doi.org/10.1145/3576915.3623157

work page doi:10.1145/3576915.3623157 2023
[31]

BaxBench: Can LLMs generate correct and secure backends?

M. Vero, N. Mündler, V. Chibotaru, V. Raychev, M. Baader, N. Jovanović, J. He, and M. Vechev, “BaxBench: Can LLMs generate correct and secure backends?” in Proceedings of the 42nd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 267. PMLR, 13–19 Jul 2025, pp. 61344–61390. [Online]. Available: https://p...

2025
[32]

Security Best Practices Skill,

OpenAI, “Security Best Practices Skill,” accessed: 2026-05-20. [Online]. Available: https://github.com/openai/skills/tree/main/skills/.curated/security-best-practices

2026
[33]

Antigravity Awesome Skills: Security Audit,

sickn33, “Antigravity Awesome Skills: Security Audit,” accessed: 2026-05-20. [Online]. Available: https://github.com/sickn33/antigravity-awesome-skills/blob/main/skills/security-audit/SKILL.md

2026
[34]

OWASP risk rating methodology,

OWASP Foundation, “OWASP risk rating methodology,” accessed: 2026-06-12. [Online]. Available: https://owasp.org/www-community/OWASP_Risk_Rating_Methodology

2026
[35]

OWASP top 10:2025—the ten most critical web application security risks,

OWASP Top 10 Security Risk, “OWASP top 10:2025—the ten most critical web application security risks,” 2025, accessed: 2026-05-21. [Online]. Available: https://owasp.org/Top10/2025/

2025
[36]

OWASP top 10:2025—what are application security risks? (data factors),

OWASP Data Factors, “OWASP top 10:2025—what are application security risks? (data factors),” 2025, accessed: 2026-06-12. [Online]. Available: https://owasp.org/Top10/2025/0x02_2025-What_are_Application_Security_Risks/#Data%20Factors

2025
[37]

FrameOps,

Mattias52, “FrameOps,” GitHub repository, accessed: 2026-05-20. [Online]. Available: https://github.com/Mattias52/FrameOps

2026
[38]

BoxCostPro,

AiBunty, “BoxCostPro,” GitHub repository, accessed: 2026-05-20. [Online]. Available: https://github.com/AiBunty/BoxCostPro

2026
[39]

fuyou,

JunP1ayer, “fuyou,” GitHub repository, accessed: 2026-05-20. [Online]. Available: https://github.com/JunP1ayer/fuyou

2026
[40]

IPC,

SytheosAI, “IPC,” GitHub repository, accessed: 2026-05-20. [Online]. Available: https://github.com/SytheosAI/IPC

2026
[41]

Vecto-Pilot,

melodydashora, “Vecto-Pilot,” GitHub repository, accessed: 2026-05-20. [Online]. Available: https://github.com/melodydashora/Vecto-Pilot

2026
[42]

Lonic-Flex-Claude-system,

levilonic, “Lonic-Flex-Claude-system,” GitHub repository, accessed: 2026-05-20. [Online]. Available: https://github.com/levilonic/Lonic-Flex-Claude-system

2026
[43]

Dune,

yanchen184, “Dune,” GitHub repository, accessed: 2026-05-20. [Online]. Available: https://github.com/yanchen184/Dune

2026
[44]

my-board-app,

kirikab-27, “my-board-app,” GitHub repository, accessed: 2026-05- [Online]. Available: https://github.com/kirikab-27/my-board-app

2026
[46]

qart-nfc-production,

mizernaa, “qart-nfc-production,” GitHub repository, accessed: 2026-05-20. [Online]. Available: https://github.com/mizernaa/qart-nfc-production

2026
[47]

PhantomOS,

ptengelmann, “PhantomOS,” GitHub repository, accessed: 2026-05- [Online]. Available: https://github.com/ptengelmann/PhantomOS

2026
[49]

Agent Skills: Security and Hardening,

Addy Osmani, “Agent Skills: Security and Hardening,” accessed: 2026-05-20. [Online]. Available: https://github.com/addyosmani/ agent-skills/blob/main/skills/security-and-hardening/SKILL.md Appendix A. Stage 1 Triage Rules Table 5 and Table 6 list the complete set of rules used by the Stage 1 triage step described in Section 3.2. A finding is discarded as ...
