AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

Haoyu Wang, Christopher M. Poskitt, Jun Sun · 2025 · cs.AI · arXiv 2503.18666

Canonical reference. 73% of citing Pith papers cite this work as background.

52 Pith papers citing it

abstract

Agents built on LLMs are increasingly deployed across diverse domains, automating complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended harmful actions. Existing mitigation methods, such as model-based safeguards and early enforcement strategies, fall short in robustness, interpretability, and adaptability. To address these challenges, we propose AgentSpec, a lightweight domain-specific language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms, ensuring agents operate within predefined safety boundaries. We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving, demonstrating its adaptability and effectiveness. Our evaluation shows that AgentSpec successfully prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied agent tasks, and enforces 100% compliance by autonomous vehicles (AVs). Despite its strong safety guarantees, AgentSpec remains computationally lightweight, with overheads in milliseconds. By combining interpretability, modularity, and efficiency, AgentSpec provides a practical and scalable solution for enforcing LLM agent safety across diverse applications. We also automate the generation of rules using LLMs and assess their effectiveness. Our evaluation shows that the rules generated by OpenAI o1 achieve a precision of 95.56% and recall of 70.96% for embodied agents, successfully identify 87.26% of the risky code, and prevent AVs from breaking laws in 5 out of 8 scenarios.
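The abstract describes rules built from a trigger, a set of predicates, and an enforcement mechanism. The Python sketch below illustrates that three-part structure only; it is not AgentSpec's actual syntax or implementation. The `Rule` class, the `check` function, the action dictionary layout, and the enforcement labels `"block"` and `"allow"` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical action record: {"tool": <tool name>, "input": <tool input string>}
Action = Dict[str, Any]


@dataclass
class Rule:
    name: str
    trigger: str                                 # tool name that activates the rule
    predicates: List[Callable[[Action], bool]]   # all must hold for the rule to fire
    enforce: str                                 # hypothetical enforcement label


def check(action: Action, rules: List[Rule]) -> str:
    """Return the enforcement label of the first rule that fires, else "allow"."""
    for rule in rules:
        if rule.trigger != action["tool"]:
            continue  # trigger does not match this action
        if all(pred(action) for pred in rule.predicates):
            return rule.enforce
    return "allow"


def is_destructive(action: Action) -> bool:
    # Toy predicate (assumption): flags a recursive forced delete in a shell command.
    return "rm -rf" in action["input"]


RULES = [Rule("no_destructive_shell", "shell", [is_destructive], "block")]

decision_bad = check({"tool": "shell", "input": "rm -rf /tmp/data"}, RULES)
decision_ok = check({"tool": "shell", "input": "ls /tmp"}, RULES)
print(decision_bad, decision_ok)
```

In this sketch the check runs before an action is executed, so a rule that fires can stop the action rather than report it afterwards. A real enforcement layer would need richer predicates than a substring test.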

citation-role summary

background: 9 · dataset: 2

citation-polarity summary

background: 8 · use dataset: 2 · unclear: 1

representative citing papers

APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks

cs.CR · 2026-05-04 · unverdicted · novelty 8.0

APIOT is the first LLM framework to complete the full autonomous discovery-to-remediation cycle on bare-metal OT devices, reaching 90% success across 290 runs on Zephyr RTOS.

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

cs.CY · 2026-04-11 · accept · novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.

From Anatomy to Smells: An Empirical Study of SKILL.md in Agent Skills

cs.SE · 2026-07-01 · unverdicted · novelty 7.0

Empirical study of 238 SKILL.md files finds over 99% contain skill smells that rarely disappear, revealing a gap between recommended and actual authoring practices for agent skills.

Bistable by Construction: Wall-Clock-Calibrated State Monitors Have No Moment-Detection Regime at Agent Cadence

cs.SE · 2026-06-15 · conditional · novelty 7.0

Wall-clock leaky-integrator monitors on variable-cadence agent streams exhibit a sharp cliff between constant-alarm and silent regimes, while sample-time CUSUM monitors remain invariant; transition detection avoids the trap.

What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents

cs.CR · 2026-06-01 · unverdicted · novelty 7.0

The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.

Causal Past Logic for Runtime Verification of Distributed LLM Agent Workflows

cs.LO · 2026-05-20 · unverdicted · novelty 7.0

Introduces Causal Past Logic (CPL) for source-level guards in distributed LLM workflows and proves a vector-clock monitor matches the denotational semantics of the logic.

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

Coding agents struggle to infer least-privilege file permissions by omitting needed accesses while granting unused or sensitive ones, but Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

cs.CR · 2026-05-13 · unverdicted · novelty 7.0

Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.

Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench

cs.AI · 2026-04-17 · conditional · novelty 7.0

AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cuts hallucinations 23pp on GPT-4o-mini but not Gemini-2.0-Flash.

PowerDAG: Reliable Agentic AI System for Automating Distribution Grid Analysis

eess.SY · 2026-03-18 · unverdicted · novelty 7.0

PowerDAG achieves 94-100% success on unseen distribution grid analysis queries by combining adaptive retrieval with similarity-decay cutoff and just-in-time supervision, outperforming ReAct, LangChain, and CrewAI baselines.

Formal Policy Enforcement for Real-World Agentic Systems

cs.CR · 2026-02-18 · unverdicted · novelty 7.0

FORGE enforces security policies in agentic systems via Datalog over abstract predicates with an observability service and reference monitor that guarantees policy semantics when the environment contract holds.

Efficient Remote KV Cache Reuse with GPU-native Video Codec

cs.DC · 2026-02-10 · conditional · novelty 7.0

KVCodec uses GPU-native video codecs and pipelined fetching to compress and transmit KV caches, delivering up to 3.51x faster TTFT than prior methods while preserving accuracy.

Auditing AI Investment Recommendations as Executable Actions

cs.LO · 2026-06-25 · unverdicted · novelty 6.0

Introduces a protocol scoring AI investment advisors on validity under constraints, stability, and agreement with a deterministic baseline, showing agreement often masks invalid actions.

ActPlane: Programmable OS-Level Policy Enforcement for Agent Harnesses

cs.OS · 2026-06-23 · unverdicted · novelty 6.0 · 2 refs

ActPlane introduces an OS-kernel policy engine using an information-flow control DSL and eBPF to enforce agent harness policies, achieving better compliance on indirect paths with 1.9-8.4% overhead.

AutoSpec: Safety Rule Evolution for LLM Agents via Inductive Logic Programming

cs.SE · 2026-06-23 · unverdicted · novelty 6.0

AutoSpec evolves safety rules for LLM agents via ILP-guided counterexample synthesis, raising F1 to 0.98/0.93 on code and embodied domains with 94% false-positive reduction.

SOCpilot: Verifying Policy Compliance for LLM-Assisted Incident Response

cs.CR · 2026-05-06 · unverdicted · novelty 6.0

SOCpilot supplies a fixed verifier and public artifact that removes 466 non-compliant approval-gated actions from LLM plans on 200 real incidents while preserving task recall.

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

cs.CR · 2026-05-05 · unverdicted · novelty 6.0

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

Tool-mediated LLM agents with deterministic tools and a machine-checked Lyapunov certificate achieve stable control in cyber defense, reducing attacker game value by 59% on real attack graphs.

Safeguarding LLM Agents from Misalignment through Provenance Analysis

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

ProvenanceGuard applies a provenance-based framework to detect three types of misalignment in LLM agent tool calls, cutting error rates on misaligned traces from 42.9% to 1.8% on one benchmark while lowering unnecessary interventions.

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.

Alignment Contracts for Agentic Security Systems

cs.CR · 2026-04-30 · conditional · novelty 6.0

Alignment contracts define scope, allowed effects, budgets and disclosure rules as safety properties over finite effect traces, with decidable admissibility, refinement rules, and Lean-verified soundness under an observability assumption.

An AI Agent Execution Environment to Safeguard User Data

cs.CR · 2026-04-21 · unverdicted · novelty 6.0

GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack-free models.

Owner-Harm: A Missing Threat Model for AI Agent Safety

cs.CR · 2026-04-20 · unverdicted · novelty 6.0

Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic criminal harm.

PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

cs.CR · 2026-04-11 · unverdicted · novelty 6.0

PlanGuard cuts indirect prompt injection attack success rate to 0% on the InjecAgent benchmark by verifying agent actions against a user-instruction-only plan while keeping false positives at 1.49%.

citing papers explorer

Showing 24 of 24 citing papers after filters.

APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks

cs.CR · 2026-05-04 · unverdicted · none · ref 31 · internal anchor

APIOT is the first LLM framework to complete the full autonomous discovery-to-remediation cycle on bare-metal OT devices, reaching 90% success across 290 runs on Zephyr RTOS.

What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents

cs.CR · 2026-06-01 · unverdicted · none · ref 19 · internal anchor

The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · none · ref 39 · internal anchor

Coding agents struggle to infer least-privilege file permissions by omitting needed accesses while granting unused or sensitive ones, but Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

cs.CR · 2026-05-13 · unverdicted · none · ref 28 · internal anchor

Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.

Formal Policy Enforcement for Real-World Agentic Systems

cs.CR · 2026-02-18 · unverdicted · none · ref 64 · internal anchor

FORGE enforces security policies in agentic systems via Datalog over abstract predicates with an observability service and reference monitor that guarantees policy semantics when the environment contract holds.

SOCpilot: Verifying Policy Compliance for LLM-Assisted Incident Response

cs.CR · 2026-05-06 · unverdicted · none · ref 10 · internal anchor

SOCpilot supplies a fixed verifier and public artifact that removes 466 non-compliant approval-gated actions from LLM plans on 200 real incidents while preserving task recall.

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

cs.CR · 2026-05-05 · unverdicted · none · ref 150 · internal anchor

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

cs.CR · 2026-05-01 · unverdicted · none · ref 41 · internal anchor

Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.

Alignment Contracts for Agentic Security Systems

cs.CR · 2026-04-30 · conditional · full · ref 43 · internal anchor

Alignment contracts define scope, allowed effects, budgets and disclosure rules as safety properties over finite effect traces, with decidable admissibility, refinement rules, and Lean-verified soundness under an observability assumption.

An AI Agent Execution Environment to Safeguard User Data

cs.CR · 2026-04-21 · unverdicted · none · ref 72 · internal anchor

GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack-free models.

Owner-Harm: A Missing Threat Model for AI Agent Safety

cs.CR · 2026-04-20 · unverdicted · none · ref 13 · internal anchor

Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic criminal harm.

PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

cs.CR · 2026-04-11 · unverdicted · none · ref 16 · internal anchor

PlanGuard cuts indirect prompt injection attack success rate to 0% on the InjecAgent benchmark by verifying agent actions against a user-instruction-only plan while keeping false positives at 1.49%.

Agents That Know Too Much: A Data-Centric Survey of Privacy in LLM Agents

cs.CR · 2026-06-25 · unverdicted · none · ref 114 · internal anchor

A data-centric survey finds that only information-flow control covers compositional and cross-session leakage in LLM agents and that no single benchmark tests an agent across all its data surfaces under one policy.

VIGIL: Runtime Enforcement of Behavioral Specifications in AI Agent Skills

cs.CR · 2026-06-25 · unverdicted · none · ref 14 · internal anchor

VIGIL introduces a policy language and symbolic evaluation rules to enforce context-aware behavioral specifications on LLM agent traces, achieving over 95% recall and under 10% false positives on real tasks.

ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense

cs.CR · 2026-06-19 · unverdicted · none · ref 48 · internal anchor

ARENA creates anonymized SOC telemetry artifacts that reveal a measurable privacy-utility boundary when used both as training material for MITRE-mapped challenges and as a substrate to detect non-compliant LLM defender actions.

From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents

cs.CR · 2026-06-03 · unverdicted · none · ref 22 · 2 links · internal anchor

This survey defines execution provenance as a typed graph of agent execution and evidence tracing as its projection onto evidence-support relations, then reviews methods, taxonomy, benchmarks, and challenges for auditable LLM agents.

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

cs.CR · 2026-05-07 · unverdicted · none · ref 34 · internal anchor

A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

Sovereign Agentic Loops: Decoupling AI Reasoning from Execution in Real-World Systems

cs.CR · 2026-04-24 · unverdicted · none · ref 13 · internal anchor

Sovereign Agentic Loops decouple LLM reasoning from execution by emitting validated intents through a control plane with obfuscation and evidence chains, blocking 93% of unsafe actions in a cloud prototype while adding 12.4 ms latency.

From Production SIEM to Reusable Cybersecurity Artifacts

cs.CR · 2026-06-19 · unverdicted · none · ref 35 · internal anchor

Methodology turns private production SIEM logs into reusable, anonymized cybersecurity artifacts validated on 37 ATT&CK-mapped challenges and 200 SOCpilot incidents.

AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent

cs.CR · 2026-05-27 · unverdicted · none · ref 18 · internal anchor

AgentGuard is an ABAC framework for tool-use LLM agents with lightweight client integration and three server-side inspection mechanisms for single-tool and cross-tool risks.

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

cs.CR · 2026-05-01 · unverdicted · none · ref 27 · internal anchor

Proposes a trust schema including verification levels and a biconditional correctness criterion to verify skills in human-in-the-loop agent runtimes, reducing the need for constant oversight.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · none · ref 179 · internal anchor

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI

cs.CR · 2026-05-15 · unverdicted · none · ref 134 · internal anchor

The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.

How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study

cs.CR · 2026-04-03 · unreviewed · ref 57 · internal anchor
