Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.
hub Canonical reference
Agent skills: A data-driven analysis of claude skills for extending large language model functionality
Canonical reference. 80% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
years
2026 15roles
background 5representative citing papers
FermiLink is a unified AI agent framework that automates multidomain scientific simulations via separated package knowledge bases and a four-layer progressive disclosure mechanism, reproducing 56% of target figures in benchmarks and generating research-grade results on unpublished problems.
SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.
SearchSkill improves exact match scores and retrieval efficiency on open-domain QA by conditioning LLM actions on skills from an evolving SkillBank updated from failure patterns via two-stage SFT.
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
Public healthcare agent skills emphasize workflow automation over clinical diagnostics and treatments, with uneven lifecycle coverage and weak alignment between technical and clinical risk.
Runtime Skill Audit introduces targeted runtime probing to detect malicious LLM agent skills, reporting 90% accuracy and resilience to self-evolving attacks on 100 skills versus static baselines.
SciVisAgentSkills provides reusable agent skills that raise mean task scores on a 108-task SciVis benchmark when paired with Codex and Claude Code agents.
Skill-RM unifies heterogeneous reward criteria by modeling reward computation as dynamic execution of a reusable Reward-Evaluation Skill within an agent framework.
SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
Analysis of ClawHub shows language-based functional divides in agent skills, with over 30% flagged suspicious and submission-time documentation enabling 73% accurate risk prediction.
citing papers explorer
-
HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.
-
FermiLink: A Unified Agent Framework for Multidomain Autonomous Scientific Simulations
FermiLink is a unified AI agent framework that automates multidomain scientific simulations via separated package knowledge bases and a four-layer progressive disclosure mechanism, reproducing 56% of target figures in benchmarks and generating research-grade results on unpublished problems.
-
Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning
SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.
-
SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks
SearchSkill improves exact match scores and retrieval efficiency on open-domain QA by conditioning LLM actions on skills from an evolving SkillBank updated from failure patterns via two-stage SFT.
-
Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
-
An Empirical Study of Agent Skills for Healthcare: Practice, Gaps, and Governance
Public healthcare agent skills emphasize workflow automation over clinical diagnostics and treatments, with uneven lifecycle coverage and weak alignment between technical and clinical risk.
-
Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security
Runtime Skill Audit introduces targeted runtime probing to detect malicious LLM agent skills, reporting 90% accuracy and resilience to self-evolving attacks on 100 skills versus static baselines.
-
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization
SciVisAgentSkills provides reusable agent skills that raise mean task scores on a 108-task SciVis benchmark when paired with Codex and Claude Code agents.
-
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
Skill-RM unifies heterogeneous reward criteria by modeling reward computation as dynamic execution of a reusable Reward-Evaluation Skill within an agent framework.
-
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution
SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.
-
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
-
Red Skills or Blue Skills? A Dive Into Skills Published on ClawHub
Analysis of ClawHub shows language-based functional divides in agent skills, with over 30% flagged suspicious and submission-time documentation enabling 73% accurate risk prediction.
- Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents
- SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
- Skill Retrieval Augmentation for Agentic AI