pith. sign in

hub Canonical reference

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Canonical reference. 100% of citing Pith papers cite this work as background.

44 Pith papers citing it
Background 100% of classified citations
abstract

Agentic systems increasingly rely on reusable procedural capabilities, \textit{a.k.a., agentic skills}, to execute long-horizon workflows reliably. These capabilities are callable modules that package procedural knowledge with explicit applicability conditions, execution policies, termination criteria, and reusable interfaces. Unlike one-off plans or atomic tool calls, skills operate (and often do well) across tasks. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update) and introduces two complementary taxonomies. The first is a system-level set of \textbf{seven design patterns} capturing how skills are packaged and executed in practice, from metadata-driven progressive disclosure and executable code skills to self-evolving libraries and marketplace distribution. The second is an orthogonal \textbf{representation $\times$ scope} taxonomy describing what skills \emph{are} (natural language, code, policy, hybrid) and what environments they operate over (web, OS, software engineering, robotics). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution, grounded by a case study of the ClawHavoc campaign in which nearly 1{,}200 malicious skills infiltrated a major agent marketplace, exfiltrating API keys, cryptocurrency wallets, and browser credentials at scale. We further survey deterministic evaluation approaches, anchored by recent benchmark evidence that curated skills can substantially improve agent success rates while self-generated skills may degrade them. We conclude with open challenges toward robust, verifiable, and certifiable skills for real-world autonomous agents.

hub tools

citation-role summary

background 10

citation-polarity summary

years

2026 44

roles

background 9

polarities

background 9

clear filters

representative citing papers

Knowledge Over Parameters: Evolving Smart Contract Vulnerability Detection

cs.CR · 2026-07-02 · unverdicted · novelty 7.0

EvoVuln evolves executable detection policies for five smart-contract vulnerability types using cold-start synthetic testing followed by few-shot refinement on five vulnerable and five safe contracts, reaching 71% macro F1 and enabling a small model to beat a large zero-shot model by 19 points at un

Generative Skill Composition for LLM Agents

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

SkillComposer performs task-conditioned skill sequence prediction with a constrained autoregressive decoder to jointly output skill subset, count, and order, raising pass rates by 23.1 and 18.2 percentage points on two production coding agents over no-skill baselines.

Co-Evolving Skill Generation and Policy Optimization

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Framework estimates context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts to filter skills and co-train policy as generator.

Five Attacks on x402 Agentic Payment Protocol

cs.CR · 2026-05-12 · conditional · novelty 7.0

Five practical attacks on the x402 agentic payment protocol are demonstrated across authorization, binding, replay protection, and web handling, validated on local chains, Base Sepolia, live endpoints, and three open-source SDKs.

Sealing the Audit-Runtime Gap for LLM Skills

cs.CR · 2026-05-06 · unverdicted · novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

Uncertainty Propagation in LLM-Based Systems

cs.SE · 2026-04-26 · unverdicted · novelty 7.0

This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.

Knows: Agent-Native Structured Research Representations

cs.AI · 2026-04-19 · conditional · novelty 7.0

Knows uses a YAML sidecar specification to provide structured, agent-consumable representations of research papers, yielding large accuracy gains for small LLMs on comprehension tasks and rapid community adoption via a public hub.

SoK: Blockchain Agent-to-Agent Payments

q-fin.GN · 2026-04-04 · unverdicted · novelty 7.0

The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.

SkillGuard: A Permission Framework for Agent Skills

cs.CR · 2026-06-02 · unverdicted · novelty 6.0

SkillGuard presents a dual-plane permission framework for agent skills that achieves 99.76% taxonomy coverage and reduces attack success rates in evaluations on 315 skills.

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

cs.AI · 2026-05-22 · unverdicted · novelty 6.0 · 2 refs

SkillOpt introduces a controllable text-space optimizer that evolves agent skills via add/delete/replace edits accepted only on strict held-out validation improvement, reporting consistent gains across 52 model-benchmark-harness combinations.

citing papers explorer

Showing 10 of 10 citing papers after filters.

  • MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills cs.CR · 2026-06-05 · unverdicted · none · ref 25 · internal anchor

    MalSkillBench supplies the first sandbox-verified dataset of malicious agent skills and shows that existing detectors achieve high recall on code injection but collapse on prompt injection and agent-control attacks.

  • Knowledge Over Parameters: Evolving Smart Contract Vulnerability Detection cs.CR · 2026-07-02 · unverdicted · none · ref 9 · internal anchor

    EvoVuln evolves executable detection policies for five smart-contract vulnerability types using cold-start synthetic testing followed by few-shot refinement on five vulnerable and five safe contracts, reaching 71% macro F1 and enabling a small model to beat a large zero-shot model by 19 points at un

  • SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces cs.CR · 2026-05-12 · unverdicted · none · ref 15 · 2 links · internal anchor

    SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.

  • Five Attacks on x402 Agentic Payment Protocol cs.CR · 2026-05-12 · conditional · none · ref 35 · internal anchor

    Five practical attacks on the x402 agentic payment protocol are demonstrated across authorization, binding, replay protection, and web handling, validated on local chains, Base Sepolia, live endpoints, and three open-source SDKs.

  • Sealing the Audit-Runtime Gap for LLM Skills cs.CR · 2026-05-06 · unverdicted · none · ref 19 · internal anchor

    SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

  • SkillGuard: A Permission Framework for Agent Skills cs.CR · 2026-06-02 · unverdicted · none · ref 15 · internal anchor

    SkillGuard presents a dual-plane permission framework for agent skills that achieves 99.76% taxonomy coverage and reduces attack success rates in evaluations on 315 skills.

  • Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions cs.CR · 2026-05-21 · unverdicted · none · ref 7 · internal anchor

    A3S-Bench evaluates LLM agents against temporal, spatial, and semantic evasions, raising average risk trigger rates from 28.3% to 52.6% across 2,254 trajectories and 20 scenarios.

  • Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems cs.CR · 2026-05-30 · unverdicted · none · ref 26 · internal anchor

    SkillVetBench is a two-stage benchmark combining natural-language semantic vetting and instrumented sandbox execution to detect and provide runtime evidence for malicious skills in open agent platforms, with experiments showing static methods miss up to 89% of threats.

  • Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation cs.CR · 2026-06-09 · unverdicted · none · ref 79 · internal anchor

    A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

  • Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses cs.CR · 2026-03-28 · unreviewed · ref 162 · internal anchor