SkillProbe: Security auditing for emerging agent skill marketplaces via multi-agent collaboration
10 Pith papers cite this work.
10 representative citing papers (2026)
Citing papers explorer
- HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
  Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.
- No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills
  Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.
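The bandit-guided mutation loop that Sefz describes can be sketched minimally. This is an illustrative epsilon-greedy bandit, not Sefz's actual algorithm; the mutation-operator names and the binary reward signal (1.0 when a mutated input reaches a guardrail-violation goal) are assumptions for the sketch.

```python
import random

class EpsilonGreedyBandit:
    """Each arm is a candidate mutation operator; reward tracks how often
    that operator's mutations reached a reachability (violation) goal."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean reward.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical operator names, for illustration only:
bandit = EpsilonGreedyBandit(["paraphrase", "inject_context", "swap_tool_args"])
arm = bandit.select()
bandit.update(arm, reward=1.0)  # 1.0 if the mutated skill input hit a goal
```

Over many fuzzing rounds the exploit step concentrates mutation budget on whichever operators have historically triggered violations, which is the core idea of bandit-guided fuzzing.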
- Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems
  Proteus demonstrates that adaptive red-teaming achieves 40-90% attack success after five rounds and bypasses even strong auditors at up to 41% joint success, revealing that static skill vetting underestimates residual risk.
- Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills
  Maliciously crafted agent skills steer coding agents into hallucinating dependencies and importing attacker-controlled packages at high rates while evading detection.
- Sealing the Audit-Runtime Gap for LLM Skills
  SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
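The integrity half of such a runtime verification loader can be sketched as a digest check against a registry entry sealed at audit time. The registry layout, skill name, and permission model below are illustrative assumptions, not SIGIL's actual on-chain schema.

```python
import hashlib

# Hypothetical registry entry: digest and approved permissions recorded at audit time.
REGISTRY = {
    "weather-skill": {
        "sha256": hashlib.sha256(b"def run(): ...").hexdigest(),
        "permissions": {"network"},
    }
}

def load_skill(name: str, blob: bytes, requested: set) -> bytes:
    """Refuse to load a skill whose bytes no longer match the audited
    digest, or whose requested permissions exceed the approved set."""
    entry = REGISTRY[name]
    if hashlib.sha256(blob).hexdigest() != entry["sha256"]:
        raise PermissionError(f"{name}: bytes differ from audited digest")
    if not requested <= entry["permissions"]:
        raise PermissionError(f"{name}: requests unapproved permissions")
    return blob  # only now handed to the agent runtime
```

The point of the check happening inside the loader, rather than at publication time, is that it closes the window in which a skill could be swapped or modified between audit and execution.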
- Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw
  DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.
- Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents
  Malicious actors could use AI agents to flood venues with fake papers, inflating the submission denominator and thereby raising the acceptance odds of a small set of chosen legitimate papers when conference acceptance rates stay stable.
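The denominator-gaming mechanism can be shown with back-of-the-envelope arithmetic. The model and all numbers below are illustrative, not taken from the paper: a venue holds its acceptance rate fixed, so fake submissions expand the accepted-paper budget, and any slots the fakes fail to win flow to the remaining real pool.

```python
def accept_probability(real_papers, fake_papers, accept_rate, fake_win_rate=0.0):
    """Chance a given real paper is accepted when the venue keeps its
    acceptance rate fixed. Toy model: accepted slots not taken by fakes
    are spread uniformly over real submissions."""
    total = real_papers + fake_papers
    slots = accept_rate * total
    slots_for_real = slots - fake_win_rate * fake_papers
    return min(1.0, slots_for_real / real_papers)

baseline = accept_probability(10_000, 0, 0.25)   # 25% chance with no gaming
gamed = accept_probability(10_000, 4_000, 0.25)  # rises to 35% if all fakes are rejected
```

Under these toy numbers, 4,000 fake submissions designed to lose raise a real paper's acceptance odds from 0.25 to 0.35 without the acceptance rate moving at all, which is the distortion the position paper warns about.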
- SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills
  SkillScope detects over-privileged LLM agent skills with a 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
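The replay-validation idea behind least-privilege checking can be sketched as a set comparison between declared and observed capabilities. The permission names and replay-log format below are assumptions for illustration, not SkillScope's actual analysis.

```python
def audit_skill(declared: set, replay_logs: list) -> dict:
    """Compare a skill's declared permissions against capabilities actually
    exercised across replayed runs; excess grants are over-privileges."""
    observed = set().union(*replay_logs) if replay_logs else set()
    return {
        "least_privilege_manifest": declared & observed,  # trimmed grant set
        "over_privileges": declared - observed,           # granted, never used
        "violations": observed - declared,                # used, never granted
    }

report = audit_skill(
    declared={"fs.read", "fs.write", "network.http"},
    replay_logs=[{"fs.read"}, {"fs.read", "network.http"}],
)
```

In this toy run, `fs.write` is flagged as an over-privilege and the trimmed manifest keeps only the capabilities the replays actually exercised, which is the least-privilege enforcement step.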
- SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills
  SkillSieve is a hierarchical triage framework combining regex/AST/XGBoost filtering, parallel LLM subtasks, and multi-LLM jury voting to detect malicious AI agent skills, reaching 0.800 F1 on a 400-skill benchmark at a cost of 0.006 per skill.
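A triage cascade of this shape can be sketched with a cheap pattern stage that escalates only suspicious skills to a jury decided by majority vote. The regex patterns, thresholds, and stub judges standing in for LLM calls are all illustrative assumptions, not SkillSieve's actual stages.

```python
import re

# Hypothetical patterns for the cheap first stage:
SUSPICIOUS_PATTERNS = [
    r"curl\s+[^|]*\|\s*sh",   # pipe-to-shell install
    r"eval\(",                # dynamic code execution
    r"base64\.b64decode",     # obfuscated payloads
]

def cheap_filter(skill_source: str) -> bool:
    """Stage 1: regex screen. True means 'escalate for deeper review'."""
    return any(re.search(p, skill_source) for p in SUSPICIOUS_PATTERNS)

def jury_vote(verdicts: list) -> bool:
    """Final stage: majority vote over per-judge 'malicious' verdicts."""
    return sum(verdicts) > len(verdicts) / 2

def triage(skill_source: str, judges) -> str:
    if not cheap_filter(skill_source):
        return "benign"  # never pays for the expensive jury stage
    verdicts = [judge(skill_source) for judge in judges]
    return "malicious" if jury_vote(verdicts) else "benign"

# Stub judges standing in for LLM calls:
paranoid = lambda src: True
lenient = lambda src: "eval(" in src
verdict = triage("import base64; base64.b64decode(x)", [paranoid, lenient, lenient])
# escalated by the regex stage, but the jury votes 1-of-3, so it stays benign
```

The cost profile in the summary follows from this structure: most skills exit at the cheap stage, so the expensive multi-LLM jury only runs on the escalated minority.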
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
  LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.