arxiv: 2604.02837 · v1 · submitted 2026-04-03 · 💻 cs.CR · cs.AI

Recognition: no theorem link

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

Zhiyuan Li , Jingzheng Wu , Xiang Ling , Xing Cui , Tianyue Luo

Authors on Pith no claims yet

Pith reviewed 2026-05-13 19:56 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords Agent Skillssecurity analysisthreat taxonomyLLM agentsagent securityframework vulnerabilitiesmarketplace securitysupply chain attacks

0 comments

The pith

Agent Skills frameworks carry structural security threats from missing data-instruction boundaries, single-approval trust, and unchecked marketplaces that incremental patches cannot resolve.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper maps the full lifecycle of an Agent Skill through creation, distribution, deployment, and execution to expose where each phase opens attack surfaces. It organizes those surfaces into a taxonomy of seven categories and seventeen scenarios across three layers, then checks the taxonomy against five confirmed incidents. The central finding is that the framework's built-in choices—no separation between data and instructions, a trust model that approves once and keeps that approval, and marketplaces without required security checks—create risks that stay even after ordinary fixes. A reader should care because Agent Skills already spreads across multiple agent platforms and community stores, so these properties could scale exposure to prompt-style and supply-chain attacks. The work ends by listing defense directions and open challenges that follow from treating the flaws as architectural rather than local.

Core claim

The Agent Skills architecture creates attack surfaces in every phase of its lifecycle, and the resulting threat taxonomy of seven categories and seventeen scenarios shows that the gravest risks come from three structural properties: the absence of a data-instruction boundary, reliance on a single-approval persistent trust model, and the lack of mandatory marketplace security review. These properties are confirmed by analysis of five real incidents and cannot be removed by incremental mitigations alone.

What carries the argument

The four-phase lifecycle analysis that feeds the seven-category, seventeen-scenario threat taxonomy organized into three attack layers.

If this is right

Any defense must target the three structural properties directly rather than adding layers around them.
Marketplaces will need mandatory security review processes before distribution can be considered safe.
Agent platforms must enforce data-instruction separation at execution time to close the largest class of threats.
The taxonomy supplies a checklist that stakeholders can use to audit existing and future skills.
Research on redesigning the trust and boundary mechanisms is required before the framework can support high-stakes uses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same boundary and trust issues likely appear in other modular skill or tool-packaging systems for LLM agents, suggesting a pattern worth checking across frameworks.
Expanding the incident sample beyond the five reported cases could test whether the taxonomy needs additional categories.
Requiring marketplace review would trade some speed of skill adoption for lower risk, an explicit cost that future designs must weigh.
Without data-instruction separation, skills could serve as persistent vectors for prompt injection across sessions, an implication that extends to any agent that loads external modules.

Load-bearing premise

The architectural analysis captures the complete attack surface and the five incidents are representative enough to confirm the taxonomy covers the main risks.

What would settle it

An implemented mitigation that removes the identified threats while preserving the current data-instruction handling, single-approval trust model, and voluntary marketplace review would show the structural claim is incorrect.

Figures

Figures reproduced from arXiv: 2604.02837 by Jingzheng Wu, Tianyue Luo, Xiang Ling, Xing Cui, Zhiyuan Li.

**Figure 2.** Figure 2: The Agent Skills lifecycle and threat taxonomy. The horizontal axis represents the four lifecycle phases; [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: The MedusaLocker Ransomware attack. The user-visible layer shows normal GIF creation behavior, [PITH_FULL_IMAGE:figures/full_fig_p017_3.png] view at source ↗

read the original abstract

Agent Skills is an emerging open standard that defines a modular, filesystem-based packaging format enabling LLM-based agents to acquire domain-specific expertise on demand. Despite rapid adoption across multiple agentic platforms and the emergence of large community marketplaces, the security properties of Agent Skills have not been systematically studied. This paper presents the first comprehensive security analysis of the Agent Skills framework. We define the full lifecycle of an Agent Skill across four phases -- Creation, Distribution, Deployment, and Execution -- and identify the structural attack surface each phase introduces. Building on this lifecycle analysis, we construct a threat taxonomy comprising seven categories and seventeen scenarios organized across three attack layers, grounded in both architectural analysis and real-world evidence. We validate the taxonomy through analysis of five confirmed security incidents in the Agent Skills ecosystem. Based on these findings, we discuss defense directions for each threat category, identify open research challenges, and provide actionable recommendations for stakeholders. Our analysis reveals that the most severe threats arise from structural properties of the framework itself, including the absence of a data-instruction boundary, a single-approval persistent trust model, and the lack of mandatory marketplace security review, and cannot be addressed through incremental mitigations alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives the first lifecycle-based threat taxonomy for Agent Skills, tied to its actual architecture and checked against five incidents, but the structural diagnosis rests on that taxonomy being complete.

read the letter

The main thing to know is that this is the first systematic security analysis of the Agent Skills standard. It maps threats across the four phases of Creation, Distribution, Deployment, and Execution, producing seven categories and seventeen scenarios. The authors ground the taxonomy in the framework's filesystem packaging and marketplace model, then cross-check it with five real incidents. That gives the work concrete footing instead of generic checklists, and the discussion of defenses and open challenges is straightforward and usable for platform builders.

Referee Report

1 major / 0 minor

Summary. The manuscript presents the first comprehensive security analysis of the Agent Skills open standard for LLM-based agents. It defines a four-phase lifecycle (Creation, Distribution, Deployment, Execution), derives a threat taxonomy with seven categories and seventeen scenarios across three attack layers, validates it against five real-world incidents, and concludes that severe threats stem from inherent structural properties such as the absence of a data-instruction boundary and a single-approval trust model, which cannot be mitigated incrementally.

Significance. If the taxonomy holds, this establishes a foundational reference for security in agent skill ecosystems by linking architectural properties directly to threat categories and real incidents. It highlights the limits of incremental defenses and provides concrete recommendations for platforms and marketplaces, filling a gap in the literature on emerging agent standards.

major comments (1)

[Validation section] Validation section: The taxonomy is validated by mapping five confirmed incidents to the seven categories, but the manuscript provides no explicit argument or enumeration showing why these incidents are representative of the full attack surface (e.g., unexamined marketplace or deployment scenarios). This weakens the load-bearing claim that the identified structural properties produce the most severe threats and cannot be addressed incrementally, as additional vectors could alter that assessment.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed and constructive comment on the validation section. The feedback correctly identifies an opportunity to strengthen the link between the selected incidents and the broader claim about structural properties. We will revise the manuscript accordingly.

read point-by-point responses

Referee: [Validation section] Validation section: The taxonomy is validated by mapping five confirmed incidents to the seven categories, but the manuscript provides no explicit argument or enumeration showing why these incidents are representative of the full attack surface (e.g., unexamined marketplace or deployment scenarios). This weakens the load-bearing claim that the identified structural properties produce the most severe threats and cannot be addressed incrementally, as additional vectors could alter that assessment.

Authors: We agree that an explicit argument for representativeness was not sufficiently articulated. The five incidents were chosen because they collectively instantiate all seven threat categories and touch every phase of the four-phase lifecycle (Creation, Distribution, Deployment, Execution). In the revised version we will insert a new subsection (Validation: Coverage and Representativeness) that (1) provides a table mapping each incident to the specific categories and layers it exercises, (2) enumerates the marketplace and deployment vectors covered (including community marketplaces and production agent platforms), and (3) explains why the core structural weaknesses—absence of a data-instruction boundary, single-approval persistent trust, and lack of mandatory review—manifest across both examined and unexamined scenarios. We will also note remaining gaps (e.g., certain proprietary deployment environments) and argue that the structural nature of the threats makes it unlikely that additional vectors would invalidate the conclusion that incremental defenses are insufficient. This revision directly addresses the concern without altering the paper’s central claims. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper constructs its threat taxonomy directly from an independent architectural analysis of the four lifecycle phases (Creation, Distribution, Deployment, Execution) and grounds the seven categories and seventeen scenarios in external real-world incidents. No derivations, equations, or claims reduce by construction to fitted parameters, self-citations, or imported ansatzes; the conclusion that structural properties resist incremental mitigations follows from the defined attack surfaces and observed evidence without self-referential loops or load-bearing prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on domain assumptions about the Agent Skills architecture and the representativeness of observed incidents rather than on fitted parameters or new postulated entities.

axioms (2)

domain assumption Agent Skills defines a modular filesystem-based packaging format with a four-phase lifecycle of Creation, Distribution, Deployment, and Execution.
This lifecycle is used to identify the structural attack surface in each phase.
domain assumption The framework exhibits absence of a data-instruction boundary, single-approval persistent trust, and lack of mandatory marketplace security review.
These properties are identified as the root causes of the most severe threats.

pith-pipeline@v0.9.0 · 5511 in / 1401 out tokens · 48363 ms · 2026-05-13T19:56:57.198911+00:00 · methodology

discussion (0)

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sealing the Audit-Runtime Gap for LLM Skills
cs.CR 2026-05 unverdicted novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills
cs.CR 2026-05 conditional novelty 6.0

AgentTrap shows that current LLM agents typically complete user tasks while silently accepting unsafe side effects from malicious third-party skills rather than refusing them.
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
cs.CR 2026-05 unverdicted novelty 6.0

SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills
cs.CL 2026-04 unverdicted novelty 6.0

SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 4 Pith papers · 4 internal anchors

[1]

The rise and potential of large language model based agents: A survey,

Z. Xi, W. Chen, X. Guo, W. He, Y. Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhouet al., “The rise and potential of large language model based agents: A survey, ”Science China Information Sciences, vol. 68, no. 2, p. 121101, 2025

work page 2025
[2]

A survey on large language model based autonomous agents,

L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Linet al., “A survey on large language model based autonomous agents, ”Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024

work page 2024
[3]

ChatGPT plugins,

OpenAI, “ChatGPT plugins, ” https://openai.com/blog/chatgpt-plugins, 2023

work page 2023
[4]

Model context protocol (mcp): Landscape, security threats, and future research directions,

X. Hou, Y. Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions, ”ACM Transactions on Software Engineering and Methodology, 2025

work page 2025
[5]

Agent Skills: Claude code documentation,

Anthropic, “Agent Skills: Claude code documentation, ” https://docs.anthropic.com/en/docs/claude-code/skills, 2025

work page 2025
[6]

Agent Skills | Cursor docs,

Cursor, “Agent Skills | Cursor docs, ” https://cursor.com/docs/context/skills, 2025

work page 2025
[7]

About agent Skills — GitHub Copilot documentation,

GitHub, “About agent Skills — GitHub Copilot documentation, ” https://docs.github.com/en/copilot/concepts/agents/ about-agent-skills, 2025

work page 2025
[8]

Agent Skills | Gemini CLI documentation,

Google, “Agent Skills | Gemini CLI documentation, ” https://geminicli.com/docs/cli/skills/, 2026

work page 2026
[9]

Cato CTRL threat research: From productivity boost to ransomware nightmare — weaponizing Claude Skills with MedusaLocker,

I. Cherny, “Cato CTRL threat research: From productivity boost to ransomware nightmare — weaponizing Claude Skills with MedusaLocker, ” https://www.catonetworks.com/blog/cato-ctrl-weaponizing-claude-skills-with-medusalocker/, 2025

work page 2025
[10]

Malicious agent skills in the wild: A large-scale security empirical study.arXiv preprint arXiv:2602.06547, 2026

Y. Liu, Z. Chen, Y. Zhang, G. Deng, Y. Li, J. Ning, and L. Y. Zhang, “Malicious agent skills in the wild: A large-scale security empirical study, ”arXiv preprint arXiv:2602.06547, 2026

work page arXiv 2026
[11]

Snyk Security Research, “Snyk finds prompt injection in 36

work page
[12]

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Y. Liu, W. Wang, R. Feng, Y. Zhang, G. Xu, G. Deng, Y. Li, and L. Zhang, “Agent skills in the wild: An empirical study of security vulnerabilities at scale, ”arXiv preprint arXiv:2601.10338, 2026

work page internal anchor Pith review arXiv 2026
[13]

Agent skills enable a new class of realistic and trivially simple prompt injections,

D. Schmotz, S. Abdelnabi, and M. Andriushchenko, “Agent skills enable a new class of realistic and trivially simple prompt injections, ”arXiv preprint arXiv:2510.26328, 2025

work page arXiv 2025
[14]

Skill-inject: Measuring agent vulnerability to skill file attacks.arXiv preprint arXiv:2602.20156, 2026

D. Schmotz, L. Beurer-Kellner, S. Abdelnabi, and M. Andriushchenko, “Skill-inject: Measuring agent vulnerability to skill file attacks, ”arXiv preprint arXiv:2602.20156, 2026

work page arXiv 2026
[15]

Use skills in claude,

Anthropic, “Use skills in claude, ” https://support.claude.com/en/articles/12512180-use-skills-in-claude, 2025

work page arXiv 2025
[16]

ChatGPT plugin review: Lessons learned,

CustomGPT, “ChatGPT plugin review: Lessons learned, ” https://customgpt.ai/chatgpt-plugin-review/, 2023

work page 2023
[17]

ChatGPT plugins are no more,

DataCamp, “ChatGPT plugins are no more, ” https://www.datacamp.com/blog/best-chat-gpt-plugins, 2024

work page 2024
[18]

Introducing the model context protocol,

Anthropic, “Introducing the model context protocol, ” https://www.anthropic.com/news/model-context-protocol, 2024

work page 2024
[19]

AI agent supply chain risk: Silent codebase exfiltration via skills,

Mitiga Labs, “AI agent supply chain risk: Silent codebase exfiltration via skills, ” https://www.mitiga.io/blog/ai-agent- supply-chain-risk-silent-codebase-exfiltration-via-skills, 2026

work page 2026
[20]

OpenClaw’s 230 malicious skills: What agentic AI supply chains teach us about the need to evolve identity security,

Authmind, “OpenClaw’s 230 malicious skills: What agentic AI supply chains teach us about the need to evolve identity security, ” https://www.authmind.com/blogs/openclaw-malicious-skills-agentic-ai-supply-chain, 2026

work page 2026
[21]

Agent skills threat model,

SafeDep Team, “Agent skills threat model, ” https://safedep.io/agent-skills-threat-model, 2026

work page 2026
[22]

Sok: Taxonomy of attacks on open-source software supply chains,

P. Ladisa, H. Plate, M. Martinez, and O. Barais, “Sok: Taxonomy of attacks on open-source software supply chains, ” in 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023, pp. 1509–1526

work page 2023
[23]

Agent skills spreading hallucinated npx commands,

Aikido Security, “Agent skills spreading hallucinated npx commands, ” https://www.aikido.dev/blog/agent-skills- spreading-hallucinated-npx-commands, 2026

work page 2026
[24]

OWASP top 10 for LLM apps & gen AI agentic security initiative,

OWASP GenAI Security Project, “OWASP top 10 for LLM apps & gen AI agentic security initiative, ” https://genai. owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/, 2025

work page 2026
[25]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection,

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection, ” inProceedings of the 16th ACM workshop on J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026. 111:26 Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui, an...

work page 2026
[26]

Agent skills: Explore security threats and controls,

Red Hat Developer, “Agent skills: Explore security threats and controls, ” https://developers.redhat.com/articles/2026/ 03/10/agent-skills-explore-security-threats-and-controls, 2026

work page 2026
[27]

Skill scanner: Security analysis of AI agent skills,

Cisco AI Defense, “Skill scanner: Security analysis of AI agent skills, ” https://github.com/cisco-ai-defense/skill-scanner, 2026

work page 2026
[28]

Caught in the hook: RCE and API token exfiltration through Claude Code project files,

A. Donenfeld and O. Vanunu, “Caught in the hook: RCE and API token exfiltration through Claude Code project files, ” https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve- 2025-59536/, 2026

work page 2026
[29]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,

Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases, ”Advances in Neural Information Processing Systems, vol. 37, pp. 130 185–130 213, 2024

work page 2024
[30]

Lee and A

D. Lee and M. Tiwari, “Prompt infection: Llm-to-llm prompt injection within multi-agent systems, ”arXiv preprint arXiv:2410.07283, 2024

work page arXiv 2024
[31]

Trends and lessons from three years fighting malicious extensions,

N. Jagpal, E. Dingle, J.-P. Gravel, P. Mavrommatis, N. Provos, M. A. Rajab, and K. Thomas, “Trends and lessons from three years fighting malicious extensions, ” in24th USENIX Security Symposium (USENIX Security 15), 2015, pp. 579–593

work page 2015
[32]

{StruQ}: Defending against prompt injection with structured queries,

S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “{StruQ}: Defending against prompt injection with structured queries, ” in34th USENIX Security Symposium (USENIX Security 25), 2025, pp. 2383–2400

work page 2025
[33]

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel, “The instruction hierarchy: Training llms to prioritize privileged instructions, ”arXiv preprint arXiv:2404.13208, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[34]

A survey on trustworthy llm agents: Threats and countermeasures,

M. Yu, F. Meng, X. Zhou, S. Wang, J. Mao, L. Pan, T. Chen, K. Wang, X. Li, Y. Zhanget al., “A survey on trustworthy llm agents: Threats and countermeasures, ” inProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, 2025, pp. 6216–6226

work page 2025
[35]

in-toto: Providing farm-to-table guarantees for bits and bytes,

S. Torres-Arias, H. Afzali, T. K. Kuppusamy, R. Curtmola, and J. Cappos, “in-toto: Providing farm-to-table guarantees for bits and bytes, ” in28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 1393–1410

work page 2019
[36]

Snyk Agent Scan: Security scanner for AI agents, MCP servers and agent skills,

Snyk, “Snyk Agent Scan: Security scanner for AI agents, MCP servers and agent skills, ” https://github.com/snyk/agent- scan, 2026

work page 2026
[37]

Skill scanner: Security scanner for agent skills,

Cisco AI Defense, “Skill scanner: Security scanner for agent skills, ” https://github.com/cisco-ai-defense/skill-scanner, 2026

work page 2026
[38]

Jailbroken: How does llm safety training fail?

A. Wei, N. Haghtalab, and J. Steinhardt, “Jailbroken: How does llm safety training fail?”Advances in neural information processing systems, vol. 36, pp. 80 079–80 110, 2023

work page 2023
[39]

Universal and Transferable Adversarial Attacks on Aligned Language Models

A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models, ”arXiv preprint arXiv:2307.15043, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[40]

Jailbreaking black box large language models in twenty queries,

P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, “Jailbreaking black box large language models in twenty queries, ” in2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 2025, pp. 23–42

work page 2025
[41]

Tree of attacks: Jailbreaking black-box llms automatically,

A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box llms automatically, ”Advances in Neural Information Processing Systems, vol. 37, pp. 61 065–61 105, 2024

work page 2024
[42]

Ignore previous prompt: Attack techniques for language models,

F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models, ” inNeurIPS ML Safety Workshop, 2022

work page 2022
[43]

Promptlocate: Localizing prompt injection attacks,

Y. Jia, Y. Liu, Z. Shao, J. Jia, and N. Gong, “Promptlocate: Localizing prompt injection attacks, ”arXiv preprint arXiv:2510.12252, 2025

work page arXiv 2025
[44]

Extracting training data from large language models,

N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson et al., “Extracting training data from large language models, ” in30th USENIX security symposium (USENIX Security 21), 2021, pp. 2633–2650

work page 2021
[45]

Hulk: Eliciting malicious behavior in browser extensions,

A. Kapravelos, C. Grier, N. Chachra, C. Kruegel, G. Vigna, and V. Paxson, “Hulk: Eliciting malicious behavior in browser extensions, ” in23rd USENIX Security Symposium (USENIX Security 14), 2014, pp. 641–654

work page 2014
[46]

Developers are victims too: A comprehensive analysis of the vs code extension ecosystem,

S. Edirimannage, C. Elvitigala, A. K. K. Don, W. Daluwatta, P. Wijesekara, and I. Khalil, “Developers are victims too: A comprehensive analysis of the vs code extension ecosystem, ”arXiv preprint arXiv:2411.07479, 2024

work page arXiv 2024
[47]

Backstabber’s knife collection: A review of open source software supply chain attacks,

M. Ohm, H. Plate, A. Sykosch, and M. Meier, “Backstabber’s knife collection: A review of open source software supply chain attacks, ” inInternational Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2020, pp. 23–43

work page 2020
[48]

Highly evasive attacker leverages solarwinds supply chain to compromise multiple global victims with SUN- BURST backdoor,

Mandiant, “Highly evasive attacker leverages solarwinds supply chain to compromise multiple global victims with SUN- BURST backdoor, ” https://cloud.google.com/blog/topics/threat-intelligence/evasive-attacker-leverages-solarwinds- supply-chain-compromises-with-sunburst-backdoor, dec 2020

work page 2020
[49]

backdoor in upstream xz/liblzma leading to ssh server compromise,

A. Freund, “backdoor in upstream xz/liblzma leading to ssh server compromise, ” https://www.openwall.com/lists/oss- security/2024/03/29/4, mar 2024

work page 2024
[50]

Dependency confusion: How I hacked into Apple, Microsoft and dozens of other companies,

A. Birsan, “Dependency confusion: How I hacked into Apple, Microsoft and dozens of other companies, ”Medium, 2021

work page 2021
[51]

Navigating the risks: A survey of security, privacy, and ethics threats in llm-based agents.arXiv preprintarXiv:2411.09523, 2024

Y. Gan, Y. Yang, Z. Ma, P. He, R. Zeng, Y. Wang, Q. Li, C. Zhou, S. Li, T. Wanget al., “Navigating the risks: A survey of security, privacy, and ethics threats in llm-based agents, ”arXiv preprint arXiv:2411.09523, 2024. J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026. Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Secu...

work page arXiv 2024
[52]

The emerged security and privacy of llm agent: A survey with case studies,

F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of llm agent: A survey with case studies, ”ACM Computing Surveys, vol. 58, no. 6, pp. 1–36, 2025

work page 2025
[53]

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

S. Datta, S. K. Nahin, A. Chhabra, and P. Mohapatra, “Agentic ai security: Threats, defenses, evaluation, and open challenges, ”arXiv preprint arXiv:2510.23883, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[54]

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents,

Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents, ” inFindings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10 471–10 506

work page 2024
[55]

Formalizing and benchmarking prompt injection attacks and defenses,

Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses, ” in 33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 1831–1847

work page 2024
[56]

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,

E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents, ”Advances in Neural Information Processing Systems, vol. 37, pp. 82 895–82 920, 2024

work page 2024
[57]

Watch out for your agents! investigating backdoor threats to llm-based agents,

W. Yang, X. Bi, Y. Lin, S. Chen, J. Zhou, and X. Sun, “Watch out for your agents! investigating backdoor threats to llm-based agents, ”Advances in Neural Information Processing Systems, vol. 37, pp. 100 938–100 964, 2024. J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026

work page 2024