APIOT is the first LLM framework to complete the full autonomous discovery-to-remediation cycle on bare-metal OT devices, reaching 90% success across 290 runs on Zephyr RTOS.
hub Canonical reference
AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents
Canonical reference. 73% of citing Pith papers cite this work as background.
abstract
Agents built on LLMs are increasingly deployed across diverse domains, automating complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended harmful actions. Existing mitigation methods, such as model-based safeguards and early enforcement strategies, fall short in robustness, interpretability, and adaptability. To address these challenges, we propose AgentSpec, a lightweight domain-specific language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms, ensuring agents operate within predefined safety boundaries. We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving, demonstrating its adaptability and effectiveness. Our evaluation shows that AgentSpec successfully prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied agent tasks, and enforces 100% compliance by autonomous vehicles (AVs). Despite its strong safety guarantees, AgentSpec remains computationally lightweight, with overheads in milliseconds. By combining interpretability, modularity, and efficiency, AgentSpec provides a practical and scalable solution for enforcing LLM agent safety across diverse applications. We also automate the generation of rules using LLMs and assess their effectiveness. Our evaluation shows that the rules generated by OpenAI o1 achieve a precision of 95.56% and recall of 70.96% for embodied agents, successfully identify 87.26% of the risky code, and prevent AVs from breaking laws in 5 out of 8 scenarios.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.
Empirical study of 238 SKILL.md files finds over 99% contain skill smells that rarely disappear, revealing a gap between recommended and actual authoring practices for agent skills.
Wall-clock leaky-integrator monitors on variable-cadence agent streams exhibit a sharp cliff between constant-alarm and silent regimes, while sample-time CUSUM monitors remain invariant; transition detection avoids the trap.
The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.
Introduces Causal Past Logic (CPL) for source-level guards in distributed LLM workflows and proves a vector-clock monitor matches the denotational semantics of the logic.
Coding agents struggle to infer least-privilege file permissions by omitting needed accesses while granting unused or sensitive ones, but Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.
Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.
AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cuts hallucinations 23pp on GPT-4o-mini but not Gemini-2.0-Flash.
PowerDAG achieves 94-100% success on unseen distribution grid analysis queries by combining adaptive retrieval with similarity-decay cutoff and just-in-time supervision, outperforming ReAct, LangChain, and CrewAI baselines.
FORGE enforces security policies in agentic systems via Datalog over abstract predicates with an observability service and reference monitor that guarantees policy semantics when the environment contract holds.
KVCodec uses GPU-native video codecs and pipelined fetching to compress and transmit KV caches, delivering up to 3.51x faster TTFT than prior methods while preserving accuracy.
Introduces a protocol scoring AI investment advisors on validity under constraints, stability, and agreement with a deterministic baseline, showing agreement often masks invalid actions.
ActPlane introduces an OS-kernel policy engine using an information-flow control DSL and eBPF to enforce agent harness policies, achieving better compliance on indirect paths with 1.9-8.4% overhead.
SOCpilot supplies a fixed verifier and public artifact that removes 466 non-compliant approval-gated actions from LLM plans on 200 real incidents while preserving task recall.
ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
Tool-mediated LLM agents with deterministic tools and a machine-checked Lyapunov certificate achieve stable control in cyber defense, reducing attacker game value by 59% on real attack graphs.
Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.
Alignment contracts define scope, allowed effects, budgets and disclosure rules as safety properties over finite effect traces, with decidable admissibility, refinement rules, and Lean-verified soundness under an observability assumption.
GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack-free models.
Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic criminal harm.
PlanGuard cuts indirect prompt injection attack success rate to 0% on the InjecAgent benchmark by verifying agent actions against a user-instruction-only plan while keeping false positives at 1.49%.
No agent system can be accountable without auditability, which requires five dimensions (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and mechanisms for detect/enforce/recover.
Independent evaluation of Claude Code auto mode finds 81% false negative rate on ambiguous authorization tasks due to unmonitored file edits.
citing papers explorer
-
From Anatomy to Smells: An Empirical Study of SKILL.md in Agent Skills
Empirical study of 238 SKILL.md files finds over 99% contain skill smells that rarely disappear, revealing a gap between recommended and actual authoring practices for agent skills.
-
Bistable by Construction: Wall-Clock-Calibrated State Monitors Have No Moment-Detection Regime at Agent Cadence
Wall-clock leaky-integrator monitors on variable-cadence agent streams exhibit a sharp cliff between constant-alarm and silent regimes, while sample-time CUSUM monitors remain invariant; transition detection avoids the trap.
-
Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode
Independent evaluation of Claude Code auto mode finds 81% false negative rate on ambiguous authorization tasks due to unmonitored file edits.
-
ATM: CID-Brokered Pre-Write Admission for Multi-Agent Code Co-Synthesis
ATM is a CID-brokered governance framework that maps write intents to semantic atoms for pre-admission control, validation, and neutral-steward application in single-domain multi-agent code synthesis.
-
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.
-
Spec Kit Agents: Context-Grounded Agentic Workflows
A multi-agent SDD framework with phase-level context-grounding hooks improves LLM-judged quality by 0.15 points and SWE-bench Lite Pass@1 by 1.7 percent while preserving near-perfect test compatibility.
-
Toward Self-Evolution-Ready Workflow Harnesses: A Reversible Migration Path and Convertibility Taxonomy for Expert LLM Pipelines
Introduces a migration framework and A/B/C taxonomy to convert static expert LLM workflows into self-evolution-ready harnesses.