Understanding the (In)Security of Vibe-Coded Applications
Pith reviewed 2026-06-26 08:04 UTC · model grok-4.3
The pith
Applications created mainly through natural-language chats with AI agents contain recurring vulnerabilities such as placeholder logic, unfiltered inputs, and exposed secrets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Vibe-coded applications exhibit recurring vulnerability patterns that differ from those in conventional software development, including placeholder logic, unfiltered input, and secret exposure; these arise from systematic limitations of AI agents throughout the lifecycle such as memory loss, locally optimized objectives, and insufficient security knowledge; advances in LLM capabilities and improved prompting reduce incidence but do not eliminate the risks.
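To make the three named patterns concrete, here is a minimal Python sketch of what placeholder logic, unfiltered input, and secret exposure might look like, alongside safer counterparts. Everything in it is a hypothetical illustration rather than code from the paper's corpus: the function names, the `users` table, the `APP_API_KEY` environment variable, and the fake key are all assumptions.

```python
import hmac
import os
import sqlite3

# --- Hypothetical instances of the three patterns named in the claim ---

API_KEY = "sk-live-123456"  # secret exposure: a (fake) credential hardcoded in source


def is_admin_insecure(token: str) -> bool:
    # Placeholder logic: a stub that was never replaced by a real check.
    return True  # TODO: verify token


def find_user_insecure(conn, name):
    # Unfiltered input: user text is spliced directly into the SQL string.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()


# --- Safer counterparts (one possible approach among several) ---

def load_api_key() -> str:
    # Read the secret from the environment; raises KeyError if it is missing.
    return os.environ["APP_API_KEY"]  # variable name is an assumption


def is_admin(token: str, expected: str) -> bool:
    # Constant-time comparison against an expected value.
    return hmac.compare_digest(token.encode(), expected.encode())


def find_user(conn, name):
    # Parameterized query: the driver treats `name` strictly as data.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    payload = "x' OR '1'='1"  # classic injection string
    return find_user_insecure(conn, payload), find_user(conn, payload)
```

Calling `demo()` returns the rows matched by the insecure and the parameterized lookup for the same injection string: the first matches every row in the table, the second matches none.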
What carries the argument
A vulnerability analysis framework that combines agent-assisted code auditing with human validation, applied to a corpus of real-world applications built with popular AI agents.
If this is right
- Vibe-coded apps display vulnerability patterns unlike those in traditional workflows.
- AI agent limitations in memory, objective focus, and security knowledge directly produce these patterns.
- Stronger LLMs and refined prompts lower but do not remove the security exposure.
Where Pith is reading between the lines
- Teams adopting vibe coding for production systems may need separate manual security gates that the AI process itself does not supply.
- The same AI limitations could affect other delegated tasks such as testing or documentation generation.
- Agent designs that treat security knowledge as a persistent global objective rather than a local one might shrink the gap.
Load-bearing premise
The chosen auditing framework accurately captures the true prevalence and root causes of vulnerabilities without missing real issues or adding false ones, and the collected apps represent typical practice.
What would settle it
Finding a sizable collection of vibe-coded applications that lack placeholder logic, unfiltered inputs, and secret exposures would challenge the claim of recurring distinct patterns.
Original abstract
Recent advances in large language models (LLMs) have enabled vibe coding, an emerging software development paradigm in which users create applications primarily through natural-language interactions with AI agents. Due to its low barrier to entry, vibe coding is rapidly gaining adoption in practice. Unlike conventional AI-assisted programming, where developers remain responsible for implementation and code review, vibe coding delegates a substantial portion of development to AI systems. This shift raises a fundamental question: how (in)secure are applications developed through vibe coding? In this paper, we conduct a systematic study of the security of vibe-coded applications. We collect a large corpus of real-world applications developed using popular AI agents and design a vulnerability analysis framework that combines agent-assisted code auditing with human validation. Using this framework, we examine the prevalence, severity, and root causes of vulnerabilities in the deployed vibe-coded applications. Our study reveals several key findings: (1) vibe-coded applications exhibit recurring vulnerability patterns that differ from those commonly observed in conventional software development workflows, including placeholder logic, unfiltered input, and secret exposure; (2) these vulnerabilities arise from systematic limitations of AI agents throughout the vibe-coding lifecycle, such as memory loss, locally optimized objectives and insufficient security knowledge; and (3) while advances in LLM capabilities and improved prompting strategies can reduce the incidence of vulnerabilities, they do not eliminate the underlying security risks. Overall, our study provides an empirical understanding of the security landscape of vibe-coded applications and lays the groundwork for addressing the security challenges introduced by the growing delegation of software development to AI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports on a systematic empirical study of the security of vibe-coded applications, defined as software developed primarily through natural-language interactions with AI agents rather than conventional coding workflows. The authors collect a corpus of real-world applications built with popular AI agents, introduce a vulnerability analysis framework that integrates agent-assisted code auditing with human validation, and examine the prevalence, severity, and root causes of vulnerabilities in deployed applications. The central claims are that vibe-coded apps exhibit recurring vulnerability patterns distinct from conventional development (e.g., placeholder logic, unfiltered input, secret exposure); that these stem from systematic AI-agent limitations across the development lifecycle (memory loss, locally optimized objectives, insufficient security knowledge); and that while advances in LLM capabilities and prompting strategies can reduce incidence, they do not eliminate the underlying risks.
Significance. If the empirical results and analysis framework prove robust, the work would be significant for the security community because it addresses an emerging, low-barrier development paradigm whose adoption is increasing rapidly. It supplies the first large-scale observational data on how delegation of implementation to AI agents introduces new classes of vulnerabilities and identifies actionable root causes tied to current LLM limitations. The mixed agent-plus-human auditing approach is a methodological strength that could be adopted in future studies of AI-generated code.
major comments (3)
- [Abstract] Abstract: the study design and high-level findings are described, but the abstract supplies no quantitative results (corpus size, vulnerability counts, severity distributions, statistical measures, or inter-rater agreement for human validation). Without these data it is impossible to determine whether the observed patterns support the three numbered claims.
- [Methodology / Vulnerability analysis framework] Vulnerability analysis framework section: the claim that the combined agent-assisted auditing plus human validation accurately identifies true prevalence and root causes rests on the untested assumption that false-positive and false-negative rates are low; the manuscript must report concrete validation metrics (e.g., precision/recall on a labeled subset or disagreement rates between agent and human auditors) because this is load-bearing for all prevalence and causality statements.
- [Corpus collection] Corpus description: the representativeness of the collected applications for typical vibe-coded practice is asserted but not demonstrated; the paper should report inclusion criteria, total number of apps examined, distribution across domains and AI agents used, and any filtering steps, as these directly affect the generalizability of the recurring-pattern findings.
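The validation metrics requested in the second major comment could be computed along the following lines. This is a minimal sketch, assuming a hand-labeled subset in which each finding has an identifier; the finding IDs and the set-based representation are hypothetical and do not come from the manuscript.

```python
def precision_recall(agent_flagged: set, confirmed: set) -> tuple:
    """Precision and recall of agent-reported findings against human-confirmed ones.

    agent_flagged: IDs of findings the auditing agent reported (assumed format).
    confirmed:     IDs of findings human reviewers confirmed as real vulnerabilities.
    """
    true_pos = len(agent_flagged & confirmed)    # reported and real
    false_pos = len(agent_flagged - confirmed)   # reported but not real
    false_neg = len(confirmed - agent_flagged)   # real but missed by the agent
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical labeled subset: the agent reports four findings, humans confirm five.
agent_flagged = {"f1", "f2", "f3", "f4"}
confirmed = {"f2", "f3", "f4", "f5", "f6"}
precision, recall = precision_recall(agent_flagged, confirmed)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

Low precision would mean the reported prevalence is inflated by false positives; low recall would mean real vulnerabilities are being missed, which is why the referee treats both as load-bearing.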
minor comments (2)
- [Abstract] Abstract: the phrase 'large corpus' is used without a number; adding even a single sentence with approximate scale would improve readability.
- [Introduction] Terminology: 'vibe coding' and 'vibe-coded' are introduced without a crisp definition or citation to prior usage; a short definitional paragraph early in the introduction would help readers unfamiliar with the term.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below and indicate where we agree the manuscript can be strengthened through revision.
Referee: [Abstract] Abstract: the study design and high-level findings are described, but the abstract supplies no quantitative results (corpus size, vulnerability counts, severity distributions, statistical measures, or inter-rater agreement for human validation). Without these data it is impossible to determine whether the observed patterns support the three numbered claims.
Authors: We agree that the abstract would benefit from quantitative results to allow readers to assess the claims directly. In the revised version we will incorporate key statistics including corpus size, vulnerability counts and severity distributions, and inter-rater agreement for the human validation step. revision: yes
Referee: [Methodology / Vulnerability analysis framework] Vulnerability analysis framework section: the claim that the combined agent-assisted auditing plus human validation accurately identifies true prevalence and root causes rests on the untested assumption that false-positive and false-negative rates are low; the manuscript must report concrete validation metrics (e.g., precision/recall on a labeled subset or disagreement rates between agent and human auditors) because this is load-bearing for all prevalence and causality statements.
Authors: This point is well taken; reporting concrete validation metrics is necessary to support the framework's reliability. We will add a dedicated validation subsection that reports precision/recall on a labeled subset and disagreement rates between agent and human auditors. revision: yes
Referee: [Corpus collection] Corpus description: the representativeness of the collected applications for typical vibe-coded practice is asserted but not demonstrated; the paper should report inclusion criteria, total number of apps examined, distribution across domains and AI agents used, and any filtering steps, as these directly affect the generalizability of the recurring-pattern findings.
Authors: We agree that explicit details on corpus construction are required for evaluating generalizability. The revised manuscript will expand the corpus collection section to include inclusion criteria, the total number of applications examined, domain and agent distributions, and all filtering steps. revision: yes
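The rebuttal promises inter-rater agreement figures without naming a statistic. One common choice is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes two reviewers who each assign one categorical label per finding; the label values and sample data are hypothetical, and the authors may well report a different measure.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two human validators on eight findings.
reviewer_1 = ["vuln", "vuln", "safe", "safe", "vuln", "safe", "vuln", "safe"]
reviewer_2 = ["vuln", "safe", "safe", "safe", "vuln", "safe", "vuln", "vuln"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa={kappa:.2f}")  # kappa=0.50
```

In this sample the reviewers agree on six of eight findings (0.75 raw agreement), but with balanced labels half of that is expected by chance, so kappa comes out at 0.50.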
Circularity Check
No significant circularity identified
This is a purely empirical observational study that collects a corpus of real-world vibe-coded applications and applies an agent-assisted auditing framework with human validation to identify vulnerability patterns. No equations, derivations, fitted parameters, or self-citation chains appear in the abstract or described methodology; all claims rest on direct analysis of external applications rather than reducing to prior results by construction. The study is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The corpus of real-world applications developed using popular AI agents is representative of vibe-coded applications in practice.
- domain assumption The vulnerability analysis framework combining agent-assisted code auditing with human validation correctly identifies prevalence, severity, and root causes.
Appendix A. Stage 1 Triage Rules
Table 5 and Table 6 list the complete set of rules used by the Stage 1 triage step described in Section 3.2. A finding is discarded as ...