hub Canonical reference

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang · 2026 · cs.CR · arXiv 2602.20867

Canonical reference. 100% of citing Pith papers cite this work as background.

33 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 33 citing papers arXiv PDF

abstract

Agentic systems increasingly rely on reusable procedural capabilities, \textit{a.k.a., agentic skills}, to execute long-horizon workflows reliably. These capabilities are callable modules that package procedural knowledge with explicit applicability conditions, execution policies, termination criteria, and reusable interfaces. Unlike one-off plans or atomic tool calls, skills operate (and often do well) across tasks. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update) and introduces two complementary taxonomies. The first is a system-level set of \textbf{seven design patterns} capturing how skills are packaged and executed in practice, from metadata-driven progressive disclosure and executable code skills to self-evolving libraries and marketplace distribution. The second is an orthogonal \textbf{representation $\times$ scope} taxonomy describing what skills \emph{are} (natural language, code, policy, hybrid) and what environments they operate over (web, OS, software engineering, robotics). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution, grounded by a case study of the ClawHavoc campaign in which nearly 1{,}200 malicious skills infiltrated a major agent marketplace, exfiltrating API keys, cryptocurrency wallets, and browser credentials at scale. We further survey deterministic evaluation approaches, anchored by recent benchmark evidence that curated skills can substantially improve agent success rates while self-generated skills may degrade them. We conclude with open challenges toward robust, verifiable, and certifiable skills for real-world autonomous agents.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 10

citation-polarity summary

background 10

representative citing papers

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning

cs.CL · 2026-05-30 · unverdicted · novelty 7.0

SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.

Five Attacks on x402 Agentic Payment Protocol

cs.CR · 2026-05-12 · conditional · novelty 7.0

Five practical attacks on the x402 agentic payment protocol are demonstrated across authorization, binding, replay protection, and web handling, validated on local chains, Base Sepolia, live endpoints, and three open-source SDKs.

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.

Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries

cs.SE · 2026-05-09 · conditional · novelty 7.0

SkillGuard extracts executable environment contracts from LLM skill documents to detect only relevant drifts, reporting zero false positives on 599 cases, 100% precision in known-drift tests, and raising one-round repair success from 10% to 78%.

Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.

Sealing the Audit-Runtime Gap for LLM Skills

cs.CR · 2026-05-06 · unverdicted · novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

Uncertainty Propagation in LLM-Based Systems

cs.SE · 2026-04-26 · unverdicted · novelty 7.0

This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.

Knows: Agent-Native Structured Research Representations

cs.AI · 2026-04-19 · conditional · novelty 7.0

Knows uses a YAML sidecar specification to provide structured, agent-consumable representations of research papers, yielding large accuracy gains for small LLMs on comprehension tasks and rapid community adoption via a public hub.

SoK: Blockchain Agent-to-Agent Payments

q-fin.GN · 2026-04-04 · unverdicted · novelty 7.0

The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.

SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

SciVisAgentSkills provides reusable agent skills that raise mean task scores on a 108-task SciVis benchmark when paired with Codex and Claude Code agents.

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Skill-RM unifies heterogeneous reward criteria by modeling reward computation as dynamic execution of a reusable Reward-Evaluation Skill within an agent framework.

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

cs.CR · 2026-05-21 · unverdicted · novelty 6.0

A3S-Bench evaluates LLM agents against temporal, spatial, and semantic evasions, raising average risk trigger rates from 28.3% to 52.6% across 2,254 trajectories and 20 scenarios.

The Scaling Laws of Skills in LLM Agent Systems

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

Empirical analysis across 15 LLMs and 1,141 skills identifies a logarithmic routing decay law and a multiplicative execution law coupled by a single fitted slope parameter b that enables targeted library optimizations improving routing accuracy and downstream task pass rates.

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

SCOPE maintains semantic commitments via structured specifications and conditional skill orchestration, achieving 0.60 EGIP on the new Gen-Arena benchmark while outperforming baselines on WISE-V and MindBench.

Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.

SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology

cs.AI · 2026-04-19 · unverdicted · novelty 6.0

SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.

Unsupervised Skill Discovery for Agentic Data Analysis

cs.AI · 2026-06-04 · unverdicted · novelty 5.0

DataCOPE uses verifier-guided contrastive distillation from agent trajectories to discover skills, yielding average gains of 9.71% on report-style and 32.30% on reasoning-style data analysis tasks across four model settings.

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

cs.CR · 2026-05-30 · unverdicted · novelty 5.0

SkillVetBench is a two-stage benchmark combining natural-language semantic vetting and instrumented sandbox execution to detect and provide runtime evidence for malicious skills in open agent platforms, with experiments showing static methods miss up to 89% of threats.

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

cs.CL · 2026-05-18 · unverdicted · novelty 5.0

SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.

Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution

cs.AI · 2026-05-09 · unverdicted · novelty 5.0

Ace-Skill boosts multimodal agent self-evolution via prioritized rollouts with lazy-decay tracking and semantic knowledge clustering, yielding up to 35% relative gains on tool-use benchmarks and zero-shot transfer to smaller models.

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

cs.AI · 2026-05-07 · unverdicted · novelty 5.0 · 3 refs

Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

Responsible Agentic AI Requires Explicit Provenance

cs.AI · 2026-05-16 · unverdicted · novelty 4.0

Explicit provenance across the full agentic AI lifecycle is the necessary condition for making responsibility computable and actionable.

citing papers explorer

Showing 33 of 33 citing papers.

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning cs.CL · 2026-05-30 · unverdicted · none · ref 30 · internal anchor
SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.
Five Attacks on x402 Agentic Payment Protocol cs.CR · 2026-05-12 · conditional · none · ref 35 · internal anchor
Five practical attacks on the x402 agentic payment protocol are demonstrated across authorization, binding, replay protection, and web handling, validated on local chains, Base Sepolia, live endpoints, and three open-source SDKs.
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents cs.AI · 2026-05-11 · unverdicted · none · ref 34 · internal anchor
OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.
Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries cs.SE · 2026-05-09 · conditional · none · ref 13 · internal anchor
SkillGuard extracts executable environment contracts from LLM skill documents to detect only relevant drifts, reporting zero false positives on 599 cases, 100% precision in known-drift tests, and raising one-round repair success from 10% to 78%.
Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck cs.LG · 2026-05-08 · unverdicted · none · ref 11 · internal anchor
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents cs.AI · 2026-05-07 · unverdicted · none · ref 12 · internal anchor
SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.
Sealing the Audit-Runtime Gap for LLM Skills cs.CR · 2026-05-06 · unverdicted · none · ref 19 · internal anchor
SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
Uncertainty Propagation in LLM-Based Systems cs.SE · 2026-04-26 · unverdicted · none · ref 8 · internal anchor
This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.
Knows: Agent-Native Structured Research Representations cs.AI · 2026-04-19 · conditional · none · ref 15 · internal anchor
Knows uses a YAML sidecar specification to provide structured, agent-consumable representations of research papers, yielding large accuracy gains for small LLMs on comprehension tasks and rapid community adoption via a public hub.
SoK: Blockchain Agent-to-Agent Payments q-fin.GN · 2026-04-04 · unverdicted · none · ref 6 · internal anchor
The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization cs.AI · 2026-06-04 · unverdicted · none · ref 23 · internal anchor
SciVisAgentSkills provides reusable agent skills that raise mean task scores on a 108-task SciVis benchmark when paired with Codex and Claude Code agents.
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill cs.LG · 2026-06-02 · unverdicted · none · ref 41 · internal anchor
Skill-RM unifies heterogeneous reward criteria by modeling reward computation as dynamic execution of a reusable Reward-Evaluation Skill within an agent framework.
Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions cs.CR · 2026-05-21 · unverdicted · none · ref 7 · internal anchor
A3S-Bench evaluates LLM agents against temporal, spatial, and semantic evasions, raising average risk trigger rates from 28.3% to 52.6% across 2,254 trajectories and 20 scenarios.
The Scaling Laws of Skills in LLM Agent Systems cs.CL · 2026-05-15 · unverdicted · none · ref 18 · internal anchor
Empirical analysis across 15 LLMs and 1,141 skills identifies a logarithmic routing decay law and a multiplicative execution law coupled by a single fitted slope parameter b that enables targeted library optimizations improving routing accuracy and downstream task pass rates.
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation cs.CV · 2026-05-08 · unverdicted · none · ref 36 · internal anchor
SCOPE maintains semantic commitments via structured specifications and conditional skill orchestration, achieving 0.60 EGIP on the new Gen-Arena benchmark while outperforming baselines on WISE-V and MindBench.
Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries cs.CL · 2026-05-07 · unverdicted · none · ref 24 · internal anchor
GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.
SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology cs.AI · 2026-04-19 · unverdicted · none · ref 18 · internal anchor
SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.
Unsupervised Skill Discovery for Agentic Data Analysis cs.AI · 2026-06-04 · unverdicted · none · ref 17 · internal anchor
DataCOPE uses verifier-guided contrastive distillation from agent trajectories to discover skills, yielding average gains of 9.71% on report-style and 32.30% on reasoning-style data analysis tasks across four model settings.
Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems cs.CR · 2026-05-30 · unverdicted · none · ref 26 · internal anchor
SkillVetBench is a two-stage benchmark combining natural-language semantic vetting and instrumented sandbox execution to detect and provide runtime evidence for malicious skills in open agent platforms, with experiments showing static methods miss up to 89% of threats.
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution cs.CL · 2026-05-18 · unverdicted · none · ref 19 · internal anchor
SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.
Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution cs.AI · 2026-05-09 · unverdicted · none · ref 24 · internal anchor
Ace-Skill boosts multimodal agent self-evolution via prioritized rollouts with lazy-decay tracking and semantic knowledge clustering, yielding up to 35% relative gains on tool-use benchmarks and zero-shot transfer to smaller models.
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning cs.AI · 2026-05-07 · unverdicted · none · ref 32 · 3 links · internal anchor
Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 63 · internal anchor
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
Responsible Agentic AI Requires Explicit Provenance cs.AI · 2026-05-16 · unverdicted · none · ref 31 · internal anchor
Explicit provenance across the full agentic AI lifecycle is the necessary condition for making responsibility computable and actionable.
ChromaFlow: A Negative Ablation Study of Orchestration Overhead in Tool-Augmented Agent Evaluation cs.AI · 2026-05-13 · unverdicted · none · ref 10 · 2 links · internal anchor
ChromaFlow reports a negative ablation in which expanded orchestration on GAIA Level-1 tasks reduced accuracy and increased tracebacks, timeouts, and token costs.
SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications cs.AI · 2026-04-14 · unverdicted · none · ref 21 · internal anchor
SciFi is a safe, lightweight agentic AI framework that automates structured scientific tasks with minimal human intervention via isolated environments and layered self-assessing agents.
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation cs.CR · 2026-06-09 · unverdicted · none · ref 79 · internal anchor
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.
SkillOpt: Executive Strategy for Self-Evolving Agent Skills cs.AI · 2026-05-22 · unreviewed · ref 8 · internal anchor
When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity cs.AI · 2026-05-19 · unreviewed · ref 7 · internal anchor
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces cs.CR · 2026-05-12 · unreviewed · ref 61 · internal anchor
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications cs.IR · 2026-05-08 · unreviewed · ref 70 · 2 links · internal anchor
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents cs.AI · 2026-04-17 · unreviewed · ref 10 · internal anchor
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses cs.CR · 2026-03-28 · unreviewed · ref 162 · internal anchor

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer