"Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

Gelei Deng; Jianting Ning; Leo Yu Zhang; Yanjun Zhang; Yi Liu; Yuekang Li; Zhihao Chen

arxiv: 2602.06547 · v4 · pith:DSGGYYAFnew · submitted 2026-02-06 · 💻 cs.CR · cs.AI· cs.CL· cs.ET

"Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

Yi Liu , Zhihao Chen , Yanjun Zhang , Gelei Deng , Yuekang Li , Jianting Ning , Leo Yu Zhang This is my paper

classification 💻 cs.CR cs.AIcs.CLcs.ET

keywords skillsattackagentmaliciousanalysisconfirmedemployingidentify

0 comments

read the original abstract

LLM-based coding agents increasingly rely on third-party extensions called skills, which bundle natural language instructions and helper scripts that execute with full user privileges. Community registries have emerged to distribute these skills, but the security implications remain unstudied due to the absence of labeled threat data. This paper presents a systematic security analysis of 98,380 skills collected from two major registries. Through a combination of static pattern matching and dynamic behavioral verification, we identify 157 skills exhibiting confirmed malicious behavior, encompassing 632 distinct vulnerabilities across 13 attack techniques. Our analysis reveals that these threats are deliberate rather than accidental: each malicious skill contains an average of 4.03 vulnerabilities spanning multiple attack phases. We identify two dominant attack strategies with statistically significant negative correlation -- credential theft via remote code execution, and agent manipulation through adversarial instructions embedded in documentation. Over half of all confirmed cases originate from a single threat actor employing templated brand impersonation at scale. We further observe that attack sophistication correlates with concealment investment, with advanced skills universally employing undocumented capabilities while also exploiting platform-native trust mechanisms. Following responsible disclosure, registry maintainers removed all 157 (100%) of the reported skills. Our dataset and detection pipeline are publicly available to facilitate future research on securing LLM agent ecosystems.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 32 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry
cs.AI 2026-05 unverdicted novelty 8.0

Semantic manipulations of SKILL.md descriptions enable effective supply-chain attacks that bias AI agent skill registries toward adversarial skills in discovery, selection, and governance.
HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
cs.CR 2026-04 unverdicted novelty 8.0

Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.
Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis
cs.CR 2026-04 accept novelty 8.0

Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.
Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware
cs.CR 2026-07 unverdicted novelty 7.0

SkillCloak evades existing static scanners for agent skill malware at high rates, while SkillDetonate detects 97% of attacks at 2% false-positive rate using sandboxed runtime behavior analysis.
AgentFlow: Building Agent Dependency Graphs for Static Analysis of Agent Programs
cs.SE 2026-07 unverdicted novelty 7.0

AgentFlow builds a framework-agnostic Agent Dependency Graph from agent program source code to support static analyses such as BOM generation and prompt-to-tool risk detection, evaluated on 5,399 real programs across ...
Skills Are Not Islands: Measuring Dependency and Risk in Agent Skill Supply Chains
cs.SE 2026-07 unverdicted novelty 7.0

The paper defines Agent Skill Supply Chains (ASSCs) and SkillDepAnalyzer to extract and analyze dependency graphs from over 1.43 million LLM agent skills, revealing structural patterns and security signals.
Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent Collaboration Network
cs.AI 2026-05 unverdicted novelty 7.0

Empirical study of EvoMap shows 98% of assets never reused, scores driven by self-reported metadata, and 84% of assets using vacuous validation tests.
No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills
cs.CR 2026-05 unverdicted novelty 7.0

Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.
Do Skill Descriptions Tell the Truth? Detecting Undisclosed Security Behaviors in Code-Backed LLM Skills
cs.CR 2026-05 conditional novelty 7.0

SKILLSCOPE detects undisclosed security behaviors in LLM skill implementations via security property graphs and taxonomy-based consistency checking, identifying confirmed inconsistencies in 9.4% of 4,556 evaluated ski...
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
cs.CR 2026-05 unverdicted novelty 7.0

SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.
Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems
cs.CR 2026-05 unverdicted novelty 7.0

Proteus demonstrates that adaptive red-teaming achieves 40-90% attack success after five rounds and bypasses even strong auditors at up to 41% joint success, revealing that static skill vetting underestimates residual risk.
Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills
cs.CR 2026-05 unverdicted novelty 7.0

Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.
Sealing the Audit-Runtime Gap for LLM Skills
cs.CR 2026-05 unverdicted novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study
cs.CR 2026-04 accept novelty 7.0

Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.
The Decomposition Is the Fingerprint: Per-Component Identity for Agent Skills
cs.CR 2026-06 unverdicted novelty 6.0

A per-component SimHash fingerprint supplies structural identity for AI agent skills, recovering family membership under paraphrase and refactoring with AUC 0.974 while localizing changes.
Detecting Malicious Agent Skills in the Wild using Attention
cs.CR 2026-06 unverdicted novelty 6.0

Locate-and-Judge uses attention-based span scoring followed by targeted LLM judgment to detect malicious third-party skills for LLM agents, achieving order-of-magnitude cost savings and surfacing live threats in marketplaces.
Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security
cs.CR 2026-06 unverdicted novelty 6.0

Runtime Skill Audit introduces targeted runtime probing to detect malicious LLM agent skills, reporting 90% accuracy and resilience to self-evolving attacks on 100 skills versus static baselines.
POISE: Position-Aware Undetectable Skill Injection on LLM Agents
cs.CR 2026-06 unverdicted novelty 6.0

POISE is a stealthy skill-poisoning attack achieving 89.3% ASR on Skill-Inject by blending a compressed trigger into contextually appropriate positions in skill bodies, outperforming YAML and random-placement baseline...
SkillGuard: A Permission Framework for Agent Skills
cs.CR 2026-06 unverdicted novelty 6.0

SkillGuard presents a dual-plane permission framework for agent skills that achieves 99.76% taxonomy coverage and reduces attack success rates in evaluations on 315 skills.
When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems
cs.SE 2026-05 unverdicted novelty 6.0

About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.
Harnessing Agent Skills: Architectural Patterns and a Reference Architecture for Skill-Mediated LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

Catalogs ten patterns and synthesizes a four-layer reference architecture for skill harnessing in LLM agents, evaluated via cross-instantiation on eight systems.
Exploiting LLM Agent Supply Chains via Payload-less Skills
cs.CR 2026-05 conditional novelty 6.0

Semantic Compliance Hijacking lets attackers hijack LLM agents by disguising malicious instructions as compliance rules in skills, reaching up to 77.67% success on confidentiality breaches and 67.33% on RCE while evad...
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
cs.CR 2026-05 unverdicted novelty 6.0

SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.
Behavioral Integrity Verification for AI Agent Skills
cs.CR 2026-05 unverdicted novelty 6.0

BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills
cs.CR 2026-05 unverdicted novelty 6.0

SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task c...
RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents
cs.CR 2026-04 unverdicted novelty 6.0

RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.
SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills
cs.CR 2026-04 unverdicted novelty 6.0

SkillSieve is a hierarchical triage framework combining regex/AST/XGBoost filtering, parallel LLM subtasks, and multi-LLM jury voting to detect malicious AI agent skills, reaching 0.800 F1 on a 400-skill benchmark at ...
VIGIL: Runtime Enforcement of Behavioral Specifications in AI Agent Skills
cs.CR 2026-06 unverdicted novelty 5.0

VIGIL introduces a policy language and symbolic evaluation rules to enforce context-aware behavioral specifications on LLM agent traces, achieving over 95% recall and under 10% false positives on real tasks.
Pomona: Continuous Code Quality Improvement via Small, Automated Changes at Bloomberg
cs.SE 2026-06 conditional novelty 5.0

Pomona automates discovery and repair of small code quality issues via agent skills, achieving 15 of 17 PRs merged with median close time under 2 hours in a one-month Bloomberg team deployment.
ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree
cs.CR 2026-05 accept novelty 5.0

Analysis of 67,453 OpenClaw skills shows three scanners overlap on at most 10.4% of combined positives, with 81.9% flagged by only one scanner and distinct profiles for malicious versus suspicious skills.
Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems
cs.CR 2026-05 unverdicted novelty 5.0

SkillVetBench is a two-stage benchmark combining natural-language semantic vetting and instrumented sandbox execution to detect and provide runtime evidence for malicious skills in open agent platforms, with experimen...
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
cs.MA 2026-02 unverdicted novelty 4.0

The paper surveys agent skills for LLMs across architecture, acquisition, deployment, and security, proposing a four-tier Skill Trust and Lifecycle Governance Framework to address vulnerabilities in community skills.