citation dossier

Schmotz, L

D · 2026 · arXiv 2602.20156

16Pith papers citing it

17reference links

cs.CRtop field · 13 papers

UNVERDICTEDtop verdict bucket · 15 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work in 16 reviewed papers. Its strongest current cluster is cs.CR (13 papers). The largest review-status bucket among citing papers is UNVERDICTED (15 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

representative citing papers

Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry

cs.AI · 2026-05-12 · unverdicted · novelty 8.0

Semantic manipulations of SKILL.md descriptions enable effective supply-chain attacks that bias AI agent skill registries toward adversarial skills in discovery, selection, and governance.

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

cs.CR · 2026-04-03 · unverdicted · novelty 8.0

DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

cs.CR · 2026-04-03 · accept · novelty 8.0

Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.

No More, No Less: Task Alignment in Terminal Agents

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

The TAB benchmark reveals that frontier terminal agents achieve high task completion but low selective alignment with relevant environmental cues over distractors, and prompt-injection defenses block both.

Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills

cs.CR · 2026-05-10 · unverdicted · novelty 7.0

Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.

Sealing the Audit-Runtime Gap for LLM Skills

cs.CR · 2026-05-06 · unverdicted · novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

Many-Tier Instruction Hierarchy in LLM Agents

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.

Behavioral Integrity Verification for AI Agent Skills

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.

Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

cs.CR · 2026-05-11 · unverdicted · novelty 6.0

DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

cs.CR · 2026-05-05 · unverdicted · novelty 6.0

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents

cs.CR · 2026-04-24 · unverdicted · novelty 6.0

RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

cs.CR · 2026-04-13 · unverdicted · novelty 6.0 · 2 refs

ClawGuard enforces deterministic, user-derived access constraints at tool boundaries to block indirect prompt injection without changing the underlying LLM.

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

cs.CR · 2026-04-28 · unverdicted · novelty 5.0

SkillGuard-Robust formulates pre-load auditing of untrusted Agent Skills as a three-way classification task and achieves 97.30% exact match and 98.33% malicious-risk recall on held-out benchmarks.

citing papers explorer

Showing 16 of 16 citing papers.

Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry cs.AI · 2026-05-12 · unverdicted · none · ref 11
Semantic manipulations of SKILL.md descriptions enable effective supply-chain attacks that bias AI agent skill registries toward adversarial skills in discovery, selection, and governance.
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems cs.CR · 2026-04-03 · unverdicted · none · ref 37
DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis cs.CR · 2026-04-03 · accept · none · ref 14
Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.
No More, No Less: Task Alignment in Terminal Agents cs.LG · 2026-05-12 · unverdicted · none · ref 14
The TAB benchmark reveals that frontier terminal agents achieve high task completion but low selective alignment with relevant environmental cues over distractors, and prompt-injection defenses block both.
Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills cs.CR · 2026-05-10 · unverdicted · none · ref 18
Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.
Sealing the Audit-Runtime Gap for LLM Skills cs.CR · 2026-05-06 · unverdicted · none · ref 33
SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
Many-Tier Instruction Hierarchy in LLM Agents cs.CL · 2026-04-10 · unverdicted · none · ref 23
ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces cs.CR · 2026-05-12 · unverdicted · none · ref 77
SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.
Behavioral Integrity Verification for AI Agent Skills cs.CR · 2026-05-12 · unverdicted · none · ref 27
BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw cs.CR · 2026-05-11 · unverdicted · none · ref 11
DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.
When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks cs.CR · 2026-05-08 · unverdicted · none · ref 15
Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills cs.CR · 2026-05-07 · unverdicted · none · ref 49
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection cs.CR · 2026-05-05 · unverdicted · none · ref 140
ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents cs.CR · 2026-04-24 · unverdicted · none · ref 12
RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection cs.CR · 2026-04-13 · unverdicted · none · ref 35 · 2 links
ClawGuard enforces deterministic, user-derived access constraints at tool boundaries to block indirect prompt injection without changing the underlying LLM.
Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills cs.CR · 2026-04-28 · unverdicted · none · ref 19
SkillGuard-Robust formulates pre-load auditing of untrusted Agent Skills as a three-way classification task and achieves 97.30% exact match and 98.33% malicious-risk recall on held-out benchmarks.

Schmotz, L

why this work matters in Pith

fields

years

verdicts

representative citing papers

citing papers explorer