HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

2026 · cs.CR · arXiv 2604.15415

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

Large language models (LLMs) have evolved into autonomous agents that rely on open skill ecosystems (e.g., ClawHub and Skills.Rest), hosting numerous publicly reusable skills. Existing security research on these ecosystems mainly focuses on vulnerabilities within skills, such as prompt injection. However, there is a critical gap regarding skills that may be misused for harmful actions (e.g., cyber attacks, fraud and scams, privacy violations, and sexual content generation), namely harmful skills. In this paper, we present the first large-scale measurement study of harmful skills in agent ecosystems, covering 98,440 skills across two major registries. Using an LLM-driven scoring system grounded in our harmful skill taxonomy, we find that 4.93% of skills (4,858) are harmful, with ClawHub exhibiting an 8.84% harmful rate compared to 3.49% on Skills.Rest. We then construct HarmfulSkillBench, the first benchmark for evaluating agent safety against harmful skills in realistic agent contexts, comprising 200 harmful skills across 20 categories and four evaluation conditions. By evaluating six LLMs on HarmfulSkillBench, we find that presenting a harmful task through a pre-installed skill substantially lowers refusal rates across all models, with the average harm score rising from 0.27 without the skill to 0.47 with it, and further to 0.76 when the harmful intent is implicit rather than stated as an explicit user request. We responsibly disclose our findings to the affected registries and release our benchmark to support future research (see https://github.com/TrustAIRLab/HarmfulSkillBench).
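The headline percentages in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above. The helper names are illustrative, and the registry split it derives is an inference, not a number reported by the authors. It assumes the overall harmful rate is a weighted mean of the two registry rates, and it is sensitive to rounding in the published percentages.

```python
# Figures quoted in the abstract.
TOTAL_SKILLS = 98_440
HARMFUL_SKILLS = 4_858
OVERALL_RATE = 4.93      # percent, as reported
CLAWHUB_RATE = 8.84      # percent, as reported
SKILLS_REST_RATE = 3.49  # percent, as reported


def harmful_rate_percent(harmful: int, total: int) -> float:
    """Share of harmful skills, in percent."""
    return 100.0 * harmful / total


def implied_share(overall: float, rate_a: float, rate_b: float) -> float:
    """Fraction of all skills in registry A, assuming the overall rate
    is a weighted mean of the two registry rates (hypothetical helper)."""
    return (overall - rate_b) / (rate_a - rate_b)


# Recompute the overall rate from the raw counts.
rate = harmful_rate_percent(HARMFUL_SKILLS, TOTAL_SKILLS)

# Estimate what fraction of the 98,440 skills would sit on ClawHub
# for the three reported rates to be mutually consistent.
clawhub_share = implied_share(OVERALL_RATE, CLAWHUB_RATE, SKILLS_REST_RATE)

print(f"overall harmful rate: {rate:.2f}%")
print(f"implied ClawHub share of skills: {clawhub_share:.1%}")
```

Under these assumptions the counts reproduce the reported 4.93%, and the three rates are consistent with ClawHub hosting roughly 27% of the measured skills. That split is an estimate only; the paper itself is the source for the actual registry sizes.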

representative citing papers

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

SkillHarm benchmark shows current AI agents are vulnerable to lifecycle-aware skill poisoning with success rates up to 86.3% for fixed-payload attacks and 69.3% for self-mutating attacks.

Sealing the Audit-Runtime Gap for LLM Skills

cs.CR · 2026-05-06 · unverdicted · novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

SKIM is an adaptive multi-resolution soft-token framework that compresses procedural skills while aiming to preserve logical dependencies and task performance better than prior compression methods.

VIGIL: Runtime Enforcement of Behavioral Specifications in AI Agent Skills

cs.CR · 2026-06-25 · unverdicted · novelty 5.0

VIGIL introduces a policy language and symbolic evaluation rules to enforce context-aware behavioral specifications on LLM agent traces, achieving over 95% recall and under 10% false positives on real tasks.
