Recognition: no theorem link
Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis
Pith reviewed 2026-05-13 19:56 UTC · model grok-4.3
The pith
Agent Skills frameworks carry structural security threats from missing data-instruction boundaries, single-approval trust, and unchecked marketplaces that incremental patches cannot resolve.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Agent Skills architecture creates attack surfaces in every phase of its lifecycle, and the resulting threat taxonomy of seven categories and seventeen scenarios shows that the gravest risks come from three structural properties: the absence of a data-instruction boundary, reliance on a single-approval persistent trust model, and the lack of mandatory marketplace security review. These properties are confirmed by analysis of five real incidents and cannot be removed by incremental mitigations alone.
What carries the argument
The four-phase lifecycle analysis that feeds the seven-category, seventeen-scenario threat taxonomy organized into three attack layers.
If this is right
- Any defense must target the three structural properties directly rather than adding layers around them.
- Marketplaces will need mandatory security review processes before distribution can be considered safe.
- Agent platforms must enforce data-instruction separation at execution time to close the largest class of threats.
- The taxonomy supplies a checklist that stakeholders can use to audit existing and future skills.
- Research on redesigning the trust and boundary mechanisms is required before the framework can support high-stakes uses.
Where Pith is reading between the lines
- The same boundary and trust issues likely appear in other modular skill or tool-packaging systems for LLM agents, suggesting a pattern worth checking across frameworks.
- Expanding the incident sample beyond the five reported cases could test whether the taxonomy needs additional categories.
- Requiring marketplace review would trade some speed of skill adoption for lower risk, an explicit cost that future designs must weigh.
- Without data-instruction separation, skills could serve as persistent vectors for prompt injection across sessions, an implication that extends to any agent that loads external modules.
Load-bearing premise
The architectural analysis captures the complete attack surface and the five incidents are representative enough to confirm the taxonomy covers the main risks.
What would settle it
An implemented mitigation that removes the identified threats while preserving the current data-instruction handling, single-approval trust model, and voluntary marketplace review would show the structural claim is incorrect.
Figures
read the original abstract
Agent Skills is an emerging open standard that defines a modular, filesystem-based packaging format enabling LLM-based agents to acquire domain-specific expertise on demand. Despite rapid adoption across multiple agentic platforms and the emergence of large community marketplaces, the security properties of Agent Skills have not been systematically studied. This paper presents the first comprehensive security analysis of the Agent Skills framework. We define the full lifecycle of an Agent Skill across four phases -- Creation, Distribution, Deployment, and Execution -- and identify the structural attack surface each phase introduces. Building on this lifecycle analysis, we construct a threat taxonomy comprising seven categories and seventeen scenarios organized across three attack layers, grounded in both architectural analysis and real-world evidence. We validate the taxonomy through analysis of five confirmed security incidents in the Agent Skills ecosystem. Based on these findings, we discuss defense directions for each threat category, identify open research challenges, and provide actionable recommendations for stakeholders. Our analysis reveals that the most severe threats arise from structural properties of the framework itself, including the absence of a data-instruction boundary, a single-approval persistent trust model, and the lack of mandatory marketplace security review, and cannot be addressed through incremental mitigations alone.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the first comprehensive security analysis of the Agent Skills open standard for LLM-based agents. It defines a four-phase lifecycle (Creation, Distribution, Deployment, Execution), derives a threat taxonomy with seven categories and seventeen scenarios across three attack layers, validates it against five real-world incidents, and concludes that severe threats stem from inherent structural properties such as the absence of a data-instruction boundary and a single-approval trust model, which cannot be mitigated incrementally.
Significance. If the taxonomy holds, this establishes a foundational reference for security in agent skill ecosystems by linking architectural properties directly to threat categories and real incidents. It highlights the limits of incremental defenses and provides concrete recommendations for platforms and marketplaces, filling a gap in the literature on emerging agent standards.
major comments (1)
- [Validation section] Validation section: The taxonomy is validated by mapping five confirmed incidents to the seven categories, but the manuscript provides no explicit argument or enumeration showing why these incidents are representative of the full attack surface (e.g., unexamined marketplace or deployment scenarios). This weakens the load-bearing claim that the identified structural properties produce the most severe threats and cannot be addressed incrementally, as additional vectors could alter that assessment.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comment on the validation section. The feedback correctly identifies an opportunity to strengthen the link between the selected incidents and the broader claim about structural properties. We will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Validation section] Validation section: The taxonomy is validated by mapping five confirmed incidents to the seven categories, but the manuscript provides no explicit argument or enumeration showing why these incidents are representative of the full attack surface (e.g., unexamined marketplace or deployment scenarios). This weakens the load-bearing claim that the identified structural properties produce the most severe threats and cannot be addressed incrementally, as additional vectors could alter that assessment.
Authors: We agree that an explicit argument for representativeness was not sufficiently articulated. The five incidents were chosen because they collectively instantiate all seven threat categories and touch every phase of the four-phase lifecycle (Creation, Distribution, Deployment, Execution). In the revised version we will insert a new subsection (Validation: Coverage and Representativeness) that (1) provides a table mapping each incident to the specific categories and layers it exercises, (2) enumerates the marketplace and deployment vectors covered (including community marketplaces and production agent platforms), and (3) explains why the core structural weaknesses—absence of a data-instruction boundary, single-approval persistent trust, and lack of mandatory review—manifest across both examined and unexamined scenarios. We will also note remaining gaps (e.g., certain proprietary deployment environments) and argue that the structural nature of the threats makes it unlikely that additional vectors would invalidate the conclusion that incremental defenses are insufficient. This revision directly addresses the concern without altering the paper’s central claims. revision: yes
Circularity Check
No circularity in derivation chain
full rationale
The paper constructs its threat taxonomy directly from an independent architectural analysis of the four lifecycle phases (Creation, Distribution, Deployment, Execution) and grounds the seven categories and seventeen scenarios in external real-world incidents. No derivations, equations, or claims reduce by construction to fitted parameters, self-citations, or imported ansatzes; the conclusion that structural properties resist incremental mitigations follows from the defined attack surfaces and observed evidence without self-referential loops or load-bearing prior author work.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Agent Skills defines a modular filesystem-based packaging format with a four-phase lifecycle of Creation, Distribution, Deployment, and Execution.
- domain assumption The framework exhibits absence of a data-instruction boundary, single-approval persistent trust, and lack of mandatory marketplace security review.
Forward citations
Cited by 4 Pith papers
-
Sealing the Audit-Runtime Gap for LLM Skills
SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
-
AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills
AgentTrap shows that current LLM agents typically complete user tasks while silently accepting unsafe side effects from malicious third-party skills rather than refusing them.
-
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.
-
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills
SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.
Reference graph
Works this paper leans on
-
[1]
The rise and potential of large language model based agents: A survey,
Z. Xi, W. Chen, X. Guo, W. He, Y. Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhouet al., “The rise and potential of large language model based agents: A survey, ”Science China Information Sciences, vol. 68, no. 2, p. 121101, 2025
work page 2025
-
[2]
A survey on large language model based autonomous agents,
L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Linet al., “A survey on large language model based autonomous agents, ”Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024
work page 2024
-
[3]
OpenAI, “ChatGPT plugins, ” https://openai.com/blog/chatgpt-plugins, 2023
work page 2023
-
[4]
Model context protocol (mcp): Landscape, security threats, and future research directions,
X. Hou, Y. Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions, ”ACM Transactions on Software Engineering and Methodology, 2025
work page 2025
-
[5]
Agent Skills: Claude code documentation,
Anthropic, “Agent Skills: Claude code documentation, ” https://docs.anthropic.com/en/docs/claude-code/skills, 2025
work page 2025
-
[6]
Cursor, “Agent Skills | Cursor docs, ” https://cursor.com/docs/context/skills, 2025
work page 2025
-
[7]
About agent Skills — GitHub Copilot documentation,
GitHub, “About agent Skills — GitHub Copilot documentation, ” https://docs.github.com/en/copilot/concepts/agents/ about-agent-skills, 2025
work page 2025
-
[8]
Agent Skills | Gemini CLI documentation,
Google, “Agent Skills | Gemini CLI documentation, ” https://geminicli.com/docs/cli/skills/, 2026
work page 2026
-
[9]
I. Cherny, “Cato CTRL threat research: From productivity boost to ransomware nightmare — weaponizing Claude Skills with MedusaLocker, ” https://www.catonetworks.com/blog/cato-ctrl-weaponizing-claude-skills-with-medusalocker/, 2025
work page 2025
-
[10]
Y. Liu, Z. Chen, Y. Zhang, G. Deng, Y. Li, J. Ning, and L. Y. Zhang, “Malicious agent skills in the wild: A large-scale security empirical study, ”arXiv preprint arXiv:2602.06547, 2026
-
[11]
Snyk Security Research, “Snyk finds prompt injection in 36
-
[12]
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
Y. Liu, W. Wang, R. Feng, Y. Zhang, G. Xu, G. Deng, Y. Li, and L. Zhang, “Agent skills in the wild: An empirical study of security vulnerabilities at scale, ”arXiv preprint arXiv:2601.10338, 2026
work page internal anchor Pith review arXiv 2026
-
[13]
Agent skills enable a new class of realistic and trivially simple prompt injections,
D. Schmotz, S. Abdelnabi, and M. Andriushchenko, “Agent skills enable a new class of realistic and trivially simple prompt injections, ”arXiv preprint arXiv:2510.26328, 2025
-
[14]
D. Schmotz, L. Beurer-Kellner, S. Abdelnabi, and M. Andriushchenko, “Skill-inject: Measuring agent vulnerability to skill file attacks, ”arXiv preprint arXiv:2602.20156, 2026
-
[15]
Anthropic, “Use skills in claude, ” https://support.claude.com/en/articles/12512180-use-skills-in-claude, 2025
-
[16]
ChatGPT plugin review: Lessons learned,
CustomGPT, “ChatGPT plugin review: Lessons learned, ” https://customgpt.ai/chatgpt-plugin-review/, 2023
work page 2023
-
[17]
DataCamp, “ChatGPT plugins are no more, ” https://www.datacamp.com/blog/best-chat-gpt-plugins, 2024
work page 2024
-
[18]
Introducing the model context protocol,
Anthropic, “Introducing the model context protocol, ” https://www.anthropic.com/news/model-context-protocol, 2024
work page 2024
-
[19]
AI agent supply chain risk: Silent codebase exfiltration via skills,
Mitiga Labs, “AI agent supply chain risk: Silent codebase exfiltration via skills, ” https://www.mitiga.io/blog/ai-agent- supply-chain-risk-silent-codebase-exfiltration-via-skills, 2026
work page 2026
-
[20]
Authmind, “OpenClaw’s 230 malicious skills: What agentic AI supply chains teach us about the need to evolve identity security, ” https://www.authmind.com/blogs/openclaw-malicious-skills-agentic-ai-supply-chain, 2026
work page 2026
-
[21]
SafeDep Team, “Agent skills threat model, ” https://safedep.io/agent-skills-threat-model, 2026
work page 2026
-
[22]
Sok: Taxonomy of attacks on open-source software supply chains,
P. Ladisa, H. Plate, M. Martinez, and O. Barais, “Sok: Taxonomy of attacks on open-source software supply chains, ” in 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023, pp. 1509–1526
work page 2023
-
[23]
Agent skills spreading hallucinated npx commands,
Aikido Security, “Agent skills spreading hallucinated npx commands, ” https://www.aikido.dev/blog/agent-skills- spreading-hallucinated-npx-commands, 2026
work page 2026
-
[24]
OWASP top 10 for LLM apps & gen AI agentic security initiative,
OWASP GenAI Security Project, “OWASP top 10 for LLM apps & gen AI agentic security initiative, ” https://genai. owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/, 2025
work page 2026
-
[25]
K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection, ” inProceedings of the 16th ACM workshop on J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026. 111:26 Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui, an...
work page 2026
-
[26]
Agent skills: Explore security threats and controls,
Red Hat Developer, “Agent skills: Explore security threats and controls, ” https://developers.redhat.com/articles/2026/ 03/10/agent-skills-explore-security-threats-and-controls, 2026
work page 2026
-
[27]
Skill scanner: Security analysis of AI agent skills,
Cisco AI Defense, “Skill scanner: Security analysis of AI agent skills, ” https://github.com/cisco-ai-defense/skill-scanner, 2026
work page 2026
-
[28]
Caught in the hook: RCE and API token exfiltration through Claude Code project files,
A. Donenfeld and O. Vanunu, “Caught in the hook: RCE and API token exfiltration through Claude Code project files, ” https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve- 2025-59536/, 2026
work page 2026
-
[29]
Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,
Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases, ”Advances in Neural Information Processing Systems, vol. 37, pp. 130 185–130 213, 2024
work page 2024
- [30]
-
[31]
Trends and lessons from three years fighting malicious extensions,
N. Jagpal, E. Dingle, J.-P. Gravel, P. Mavrommatis, N. Provos, M. A. Rajab, and K. Thomas, “Trends and lessons from three years fighting malicious extensions, ” in24th USENIX Security Symposium (USENIX Security 15), 2015, pp. 579–593
work page 2015
-
[32]
{StruQ}: Defending against prompt injection with structured queries,
S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “{StruQ}: Defending against prompt injection with structured queries, ” in34th USENIX Security Symposium (USENIX Security 25), 2025, pp. 2383–2400
work page 2025
-
[33]
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel, “The instruction hierarchy: Training llms to prioritize privileged instructions, ”arXiv preprint arXiv:2404.13208, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[34]
A survey on trustworthy llm agents: Threats and countermeasures,
M. Yu, F. Meng, X. Zhou, S. Wang, J. Mao, L. Pan, T. Chen, K. Wang, X. Li, Y. Zhanget al., “A survey on trustworthy llm agents: Threats and countermeasures, ” inProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, 2025, pp. 6216–6226
work page 2025
-
[35]
in-toto: Providing farm-to-table guarantees for bits and bytes,
S. Torres-Arias, H. Afzali, T. K. Kuppusamy, R. Curtmola, and J. Cappos, “in-toto: Providing farm-to-table guarantees for bits and bytes, ” in28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 1393–1410
work page 2019
-
[36]
Snyk Agent Scan: Security scanner for AI agents, MCP servers and agent skills,
Snyk, “Snyk Agent Scan: Security scanner for AI agents, MCP servers and agent skills, ” https://github.com/snyk/agent- scan, 2026
work page 2026
-
[37]
Skill scanner: Security scanner for agent skills,
Cisco AI Defense, “Skill scanner: Security scanner for agent skills, ” https://github.com/cisco-ai-defense/skill-scanner, 2026
work page 2026
-
[38]
Jailbroken: How does llm safety training fail?
A. Wei, N. Haghtalab, and J. Steinhardt, “Jailbroken: How does llm safety training fail?”Advances in neural information processing systems, vol. 36, pp. 80 079–80 110, 2023
work page 2023
-
[39]
Universal and Transferable Adversarial Attacks on Aligned Language Models
A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models, ”arXiv preprint arXiv:2307.15043, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[40]
Jailbreaking black box large language models in twenty queries,
P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, “Jailbreaking black box large language models in twenty queries, ” in2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 2025, pp. 23–42
work page 2025
-
[41]
Tree of attacks: Jailbreaking black-box llms automatically,
A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box llms automatically, ”Advances in Neural Information Processing Systems, vol. 37, pp. 61 065–61 105, 2024
work page 2024
-
[42]
Ignore previous prompt: Attack techniques for language models,
F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models, ” inNeurIPS ML Safety Workshop, 2022
work page 2022
-
[43]
Promptlocate: Localizing prompt injection attacks,
Y. Jia, Y. Liu, Z. Shao, J. Jia, and N. Gong, “Promptlocate: Localizing prompt injection attacks, ”arXiv preprint arXiv:2510.12252, 2025
-
[44]
Extracting training data from large language models,
N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson et al., “Extracting training data from large language models, ” in30th USENIX security symposium (USENIX Security 21), 2021, pp. 2633–2650
work page 2021
-
[45]
Hulk: Eliciting malicious behavior in browser extensions,
A. Kapravelos, C. Grier, N. Chachra, C. Kruegel, G. Vigna, and V. Paxson, “Hulk: Eliciting malicious behavior in browser extensions, ” in23rd USENIX Security Symposium (USENIX Security 14), 2014, pp. 641–654
work page 2014
-
[46]
Developers are victims too: A comprehensive analysis of the vs code extension ecosystem,
S. Edirimannage, C. Elvitigala, A. K. K. Don, W. Daluwatta, P. Wijesekara, and I. Khalil, “Developers are victims too: A comprehensive analysis of the vs code extension ecosystem, ”arXiv preprint arXiv:2411.07479, 2024
-
[47]
Backstabber’s knife collection: A review of open source software supply chain attacks,
M. Ohm, H. Plate, A. Sykosch, and M. Meier, “Backstabber’s knife collection: A review of open source software supply chain attacks, ” inInternational Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2020, pp. 23–43
work page 2020
-
[48]
Mandiant, “Highly evasive attacker leverages solarwinds supply chain to compromise multiple global victims with SUN- BURST backdoor, ” https://cloud.google.com/blog/topics/threat-intelligence/evasive-attacker-leverages-solarwinds- supply-chain-compromises-with-sunburst-backdoor, dec 2020
work page 2020
-
[49]
backdoor in upstream xz/liblzma leading to ssh server compromise,
A. Freund, “backdoor in upstream xz/liblzma leading to ssh server compromise, ” https://www.openwall.com/lists/oss- security/2024/03/29/4, mar 2024
work page 2024
-
[50]
Dependency confusion: How I hacked into Apple, Microsoft and dozens of other companies,
A. Birsan, “Dependency confusion: How I hacked into Apple, Microsoft and dozens of other companies, ”Medium, 2021
work page 2021
-
[51]
Y. Gan, Y. Yang, Z. Ma, P. He, R. Zeng, Y. Wang, Q. Li, C. Zhou, S. Li, T. Wanget al., “Navigating the risks: A survey of security, privacy, and ethics threats in llm-based agents, ”arXiv preprint arXiv:2411.09523, 2024. J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026. Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Secu...
-
[52]
The emerged security and privacy of llm agent: A survey with case studies,
F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of llm agent: A survey with case studies, ”ACM Computing Surveys, vol. 58, no. 6, pp. 1–36, 2025
work page 2025
-
[53]
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
S. Datta, S. K. Nahin, A. Chhabra, and P. Mohapatra, “Agentic ai security: Threats, defenses, evaluation, and open challenges, ”arXiv preprint arXiv:2510.23883, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[54]
Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents,
Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents, ” inFindings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10 471–10 506
work page 2024
-
[55]
Formalizing and benchmarking prompt injection attacks and defenses,
Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses, ” in 33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 1831–1847
work page 2024
-
[56]
Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,
E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents, ”Advances in Neural Information Processing Systems, vol. 37, pp. 82 895–82 920, 2024
work page 2024
-
[57]
Watch out for your agents! investigating backdoor threats to llm-based agents,
W. Yang, X. Bi, Y. Lin, S. Chen, J. Zhou, and X. Sun, “Watch out for your agents! investigating backdoor threats to llm-based agents, ”Advances in Neural Information Processing Systems, vol. 37, pp. 100 938–100 964, 2024. J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.