hub

Agent skills in the wild: An empirical study of security vulnerabilities at scale

Yi Liu, Weizhe Wang, Ruitao Feng, Yao Zhang, Guangquan Xu, Gelei Deng, Yuekang Li, Leo Zhang · 2026 · arXiv 2601.10338

20 Pith papers cite this work. Polarity classification is still indexing.

20 Pith papers citing it

read on arXiv browse 20 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry

cs.AI · 2026-05-12 · unverdicted · novelty 8.0

Semantic manipulations of SKILL.md descriptions enable effective supply-chain attacks that bias AI agent skill registries toward adversarial skills in discovery, selection, and governance.

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

cs.CR · 2026-04-09 · unverdicted · novelty 8.0

Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

cs.CR · 2026-04-03 · unverdicted · novelty 8.0

DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

cs.CR · 2026-04-03 · accept · novelty 8.0

Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.

Do Skill Descriptions Tell the Truth? Detecting Undisclosed Security Behaviors in Code-Backed LLM Skills

cs.CR · 2026-05-13 · conditional · novelty 7.0

SKILLSCOPE detects undisclosed security behaviors in LLM skill implementations via security property graphs and taxonomy-based consistency checking, identifying confirmed inconsistencies in 9.4% of 4,556 evaluated skills with 84.8% precision and 96.5% recall against human review.

Sealing the Audit-Runtime Gap for LLM Skills

cs.CR · 2026-05-06 · unverdicted · novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

cs.CR · 2026-04-03 · accept · novelty 7.0

Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.

SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

SearchSkill introduces an evolving SkillBank and two-stage SFT to make LLM search query planning explicit via skill selection, improving exact match on QA benchmarks and retrieval behavior.

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.

Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

SBD is a bilevel optimization framework that learns context-dependent safety weights for runtime task delegation in hierarchical multi-agent systems, with continuous authority transfer alpha and theoretical guarantees on safety monotonicity, policy convergence, and accountability propagation.

AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents

cs.CR · 2026-04-27 · conditional · novelty 6.0

AgentWard organizes stage-specific security controls with cross-layer coordination to intercept threats across the full lifecycle of autonomous AI agents.

RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents

cs.CR · 2026-04-24 · unverdicted · novelty 6.0

RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

cs.CR · 2026-04-08 · unverdicted · novelty 6.0

SkillSieve is a hierarchical triage framework combining regex/AST/XGBoost filtering, parallel LLM subtasks, and multi-LLM jury voting to detect malicious AI agent skills, reaching 0.800 F1 on a 400-skill benchmark at 0.006 cost per skill.

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

cs.CR · 2026-04-06 · conditional · novelty 6.0

Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

cs.CR · 2026-03-28 · unverdicted · novelty 6.0

The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.

Bilevel Optimization of Agent Skills via Monte Carlo Tree Search

cs.AI · 2026-04-17 · unverdicted · novelty 5.0

Bilevel optimization with outer-loop MCTS for skill structure and inner-loop LLM refinement improves agent accuracy on an operations-research question-answering dataset.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

Know When to Trust the Skill: Delayed Appraisal and Epistemic Vigilance for Single-Agent LLMs

cs.AI · 2026-04-17 · unverdicted · novelty 4.0

MESA-S framework translates human metacognitive control into LLMs via delayed procedural probes and Metacognitive Skill Cards to separate parametric certainty from source trust and reduce overthinking.

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

cs.MA · 2026-02-12 · unverdicted · novelty 4.0

The paper surveys agent skills for LLMs across architecture, acquisition, deployment, and security, proposing a four-tier Skill Trust and Lifecycle Governance Framework to address vulnerabilities in community skills.

citing papers explorer

Showing 20 of 20 citing papers after filters.

Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry cs.AI · 2026-05-12 · unverdicted · none · ref 10
Semantic manipulations of SKILL.md descriptions enable effective supply-chain attacks that bias AI agent skill registries toward adversarial skills in discovery, selection, and governance.
Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain cs.CR · 2026-04-09 · unverdicted · none · ref 26
Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems cs.CR · 2026-04-03 · unverdicted · none · ref 23
DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis cs.CR · 2026-04-03 · accept · none · ref 12
Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.
Do Skill Descriptions Tell the Truth? Detecting Undisclosed Security Behaviors in Code-Backed LLM Skills cs.CR · 2026-05-13 · conditional · none · ref 14
SKILLSCOPE detects undisclosed security behaviors in LLM skill implementations via security property graphs and taxonomy-based consistency checking, identifying confirmed inconsistencies in 9.4% of 4,556 evaluated skills with 84.8% precision and 96.5% recall against human review.
Sealing the Audit-Runtime Gap for LLM Skills cs.CR · 2026-05-06 · unverdicted · none · ref 29
SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study cs.CR · 2026-04-03 · accept · none · ref 33
Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces cs.CR · 2026-05-12 · unverdicted · none · ref 70
SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.
SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks cs.AI · 2026-05-09 · unverdicted · none · ref 20
SearchSkill introduces an evolving SkillBank and two-stage SFT to make LLM search query planning explicit via skill selection, improving exact match on QA benchmarks and retrieval behavior.
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills cs.CR · 2026-05-07 · unverdicted · none · ref 34
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems cs.AI · 2026-04-30 · unverdicted · none · ref 8
SBD is a bilevel optimization framework that learns context-dependent safety weights for runtime task delegation in hierarchical multi-agent systems, with continuous authority transfer alpha and theoretical guarantees on safety monotonicity, policy convergence, and accountability propagation.
AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents cs.CR · 2026-04-27 · conditional · none · ref 10
AgentWard organizes stage-specific security controls with cross-layer coordination to intercept threats across the full lifecycle of autonomous AI agents.
RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents cs.CR · 2026-04-24 · unverdicted · none · ref 9
RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.
SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills cs.CR · 2026-04-08 · unverdicted · none · ref 6
SkillSieve is a hierarchical triage framework combining regex/AST/XGBoost filtering, parallel LLM subtasks, and multi-LLM jury voting to detect malicious AI agent skills, reaching 0.800 F1 on a 400-skill benchmark at 0.006 cost per skill.
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw cs.CR · 2026-04-06 · conditional · none · ref 9
Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses cs.CR · 2026-03-28 · unverdicted · none · ref 237
The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.
Bilevel Optimization of Agent Skills via Monte Carlo Tree Search cs.AI · 2026-04-17 · unverdicted · none · ref 18
Bilevel optimization with outer-loop MCTS for skill structure and inner-loop LLM refinement improves agent accuracy on an operations-research question-answering dataset.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 92
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
Know When to Trust the Skill: Delayed Appraisal and Epistemic Vigilance for Single-Agent LLMs cs.AI · 2026-04-17 · unverdicted · none · ref 6
MESA-S framework translates human metacognitive control into LLMs via delayed procedural probes and Metacognitive Skill Cards to separate parametric certainty from source trust and reduce overthinking.
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward cs.MA · 2026-02-12 · unverdicted · none · ref 33
The paper surveys agent skills for LLMs across architecture, acquisition, deployment, and security, proposing a four-tier Skill Trust and Lifecycle Governance Framework to address vulnerabilities in community skills.

Agent skills in the wild: An empirical study of security vulnerabilities at scale

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer