pith. machine review for the scientific record. sign in

arxiv: 2602.16708 · v3 · submitted 2026-02-18 · 💻 cs.CR · cs.AI· cs.MA

Recognition: no theorem link

Formal Policy Enforcement for Real-World Agentic Systems

Authors on Pith no claims yet

Pith reviewed 2026-05-15 21:05 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.MA
keywords policy enforcementagentic systemsDatalogreference monitorassume-guarantee contractsmulti-agent systemssecurity policiesaspect-oriented programming
0
0 comments X

The pith

Policy enforcement in agentic systems can be made formally correct by separating Datalog rules from agent reasoning and checking them against an observability contract.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats security policies as a cross-cutting concern rather than embedding them in agent prompts. It specifies policies declaratively in Datalog over abstract predicates that describe execution context. An observability service maintains those predicates under a formal assume/guarantee contract, and a reference monitor consults the policy at each relevant decision point. When the contract holds, the monitor's decisions match the intended policy semantics exactly. This approach supports static analyses for policy properties and works without modifying the underlying agents.

Core claim

By grounding enforcement in aspect-oriented programming, policies can be written independently of any agent's reasoning in Datalog over abstract predicates. An observability service governed by an assume/guarantee contract maintains these predicates, and a reference monitor produces an enforcement decision at every policy-relevant action. When the environment contract holds, these decisions coincide with the policy's intended semantics.

What carries the argument

Datalog policies over abstract predicates, maintained by an observability service under an assume/guarantee contract and consulted by a reference monitor at each action.

If this is right

  • Policies written in Datalog support recursion and admit static analyses that detect contradictions, redundancy, and conditional reachability.
  • Enforcement decisions remain independent of the agent's internal reasoning and require no changes to the agent code.
  • The same policy mechanism applies across information-flow controls, multi-agent approval workflows, and organizational rules.
  • Natural-language policy ambiguities can be surfaced through the static analyses before deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Existing agent deployments can adopt the framework by adding the observability service and reference monitor as middleware.
  • Policy maintenance becomes separable from agent updates, allowing organizations to revise rules without retraining models.
  • In settings where the observability contract cannot be guaranteed, the formal match between decisions and semantics would no longer hold.

Load-bearing premise

The observability service correctly maintains the abstract predicates describing execution context.

What would settle it

A concrete execution trace in which the assume/guarantee contract on the observability service holds, yet the reference monitor produces a decision that differs from the Datalog policy evaluation on the maintained predicates.

Figures

Figures reproduced from arXiv: 2602.16708 by Guy Amir, Jihye Choi, Nils Palumbo, Prasad Chalasani, Sarthak Choudhary, Somesh Jha.

Figure 1
Figure 1. Figure 1: PCAS overview. (a) The instrumentation layer intercepts messages, tool calls, and HTTP requests. The reference monitor queries the policy engine, passing the action, identity, and roles, which evaluates Datalog rules against the dependency graph from the observability service. (b) Authorized actions are executed and recorded; denied actions return structured feedback to the agent. (TCB) [24]. Our security … view at source ↗
Figure 2
Figure 2. Figure 2: Information flow policies for prompt injection defense. (a) Bell-LaPadula MLS enforces clearance-based [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Policy enforcement example from the τ 2 -bench airline domain. Left: A customer requests flight cancellation due to an accidental booking. The agent attempts to call cancel_reservation, but PCAS blocks the action based on the policy. After receiving structured feedback, the agent recovers by offering compliant alternatives. Middle: The natural language policy specifying covered cancellation reasons is tran… view at source ↗
Figure 4
Figure 4. Figure 4: Task success rates (pass^k) on τ 2 -bench tasks. The pass^k metric measures the probability that k randomly sampled trials all succeed, capturing consistency rather than single-shot performance. PCAS-instrumented agents (solid lines) consistently outperform non-instrumented baselines (dashed lines) across all values of k, demonstrating that enforcement improves compliance without degrading task completion.… view at source ↗
read the original abstract

Security policy enforcement in contemporary agentic systems predominantly consists of embedding natural-language policies within an agent's system prompt and delegating compliance to the agent's reasoning. This approach admits no formal enforcement guarantee and cannot express policies whose satisfaction depends on the causal history of an execution, a gap that becomes acute in multi-agent systems, where enforcement must reason across agents. We argue that policy enforcement in agentic systems is most naturally understood as a cross-cutting concern, and propose a framework grounded in aspect-oriented programming that specifies policies independent of the agent's reasoning and enforces them at every policy-relevant decision. Policies are written in Datalog over a set of abstract predicates describing the execution context, an observability service governed by a formal assume/guarantee contract maintains these predicates, and a reference monitor consults the policy at each action to produce an enforcement decision. When the environment contract holds, enforcement decisions coincide with the policy's intended semantics. We adopt Datalog as the policy language, a natural fit because it supports declarative rule specification, admits recursion for policies over transitive relationships, and yields deterministic enforcement. Datalog further admits tractable static analyses for contradiction, redundancy, subsumption, and conditional reachability, enabling authors to verify policy intent and surface ambiguities inherent in natural-language specifications. We realize the framework in FORGE, which enforces policies over agentic deployments without modification to the underlying agents. We evaluate FORGE on three case studies: information flow policies for prompt injection defense, approval workflows in a multi-agent pharmacovigilance system, and organizational policies for customer service.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that policy enforcement in agentic systems should be treated as a cross-cutting concern via an aspect-oriented framework. Policies are expressed declaratively in Datalog over abstract predicates on execution context; an observability service maintains these predicates under a formal assume/guarantee contract; and a reference monitor consults the policy at each action to produce enforcement decisions. When the contract holds, decisions coincide with intended semantics. The approach is realized in the FORGE system, which requires no agent modification, and is evaluated on three case studies: information-flow policies for prompt-injection defense, approval workflows in a multi-agent pharmacovigilance system, and organizational policies for customer service.

Significance. If the observability contract can be shown to hold for unmodified LLM agents, the work would supply a concrete mechanism for history-dependent and multi-agent policy enforcement that current prompt-embedding methods cannot provide. The choice of Datalog is a strength: it enables static analyses for contradiction, redundancy, and reachability, allowing authors to verify policy intent independently of runtime behavior. The no-modification deployment model is practically valuable for real-world agentic systems.

major comments (2)
  1. [Abstract] Abstract: The load-bearing claim that 'When the environment contract holds, enforcement decisions coincide with the policy's intended semantics' is stated without any derivation, model-checking result, or empirical measurement showing that the assume/guarantee contract is satisfied by the actual FORGE observability service when agents are unmodified, non-deterministic LLMs whose internal state is opaque. This leaves the central guarantee as an unverified premise rather than a derived property.
  2. [Abstract] The manuscript introduces the reference monitor and observability service but supplies no concrete description of how causal history across agents is maintained in the Datalog predicates or how the contract is discharged for the three evaluated case studies. Without this, it is impossible to assess whether the framework's guarantees survive the non-determinism and prompt opacity of real LLM agents.
minor comments (1)
  1. [Abstract] The abstract would benefit from a single sentence clarifying the scope of the assume/guarantee contract (e.g., which predicates are assumed versus guaranteed and under what failure model).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on the formal guarantees and practical applicability of FORGE. The feedback highlights the need for clearer exposition of the assume/guarantee contract and its realization in the case studies. We have revised the abstract, added a derivation sketch, and expanded the description of the observability service and case studies to address these points directly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The load-bearing claim that 'When the environment contract holds, enforcement decisions coincide with the policy's intended semantics' is stated without any derivation, model-checking result, or empirical measurement showing that the assume/guarantee contract is satisfied by the actual FORGE observability service when agents are unmodified, non-deterministic LLMs whose internal state is opaque. This leaves the central guarantee as an unverified premise rather than a derived property.

    Authors: We agree that the central claim is conditional and have strengthened its presentation. The revised manuscript includes a derivation in Section 3.2 showing that, given the assume/guarantee contract (observability service correctly populates abstract predicates from observed events), the reference monitor's decisions are equivalent to the least Herbrand model of the Datalog policy. Because agents are unmodified, the contract is discharged by the service's implementation rather than by inspecting LLM internals; we now report empirical measurements from the three case studies confirming that predicates were populated consistently with observed actions. A complete model check over all possible LLM behaviors is not provided, as the non-determinism and opacity make exhaustive verification intractable; the framework instead relies on the contract as the interface between the service and the policy. revision: partial

  2. Referee: [Abstract] The manuscript introduces the reference monitor and observability service but supplies no concrete description of how causal history across agents is maintained in the Datalog predicates or how the contract is discharged for the three evaluated case studies. Without this, it is impossible to assess whether the framework's guarantees survive the non-determinism and prompt opacity of real LLM agents.

    Authors: We have added a concrete description in the revised Section 4.1: the observability service maintains a trace of actions, each tagged with agent identifier, timestamp, and content; these are asserted as Datalog facts, with recursive rules (e.g., flowsTo) encoding causal history across agents. Section 5 now includes, for each case study, a table listing the predicates, the observable events used to populate them, and the observed contract discharge (e.g., all prompt-injection flows were captured from message metadata without relying on internal LLM state). This shows that enforcement decisions remain sound under the non-determinism of the evaluated LLM agents, as only externally observable outputs are used. revision: yes

Circularity Check

0 steps flagged

No circularity; central claim is conditional on an external contract assumption

full rationale

The paper's derivation chain consists of a framework definition (Datalog policies over abstract predicates, observability service under assume/guarantee contract, reference monitor) whose key guarantee is explicitly conditional: 'When the environment contract holds, enforcement decisions coincide with the policy's intended semantics.' No equations, fitted parameters, or self-referential reductions appear. The contract is introduced as an assumption governing the observability service rather than derived from the paper's own results or prior self-citations. No load-bearing step reduces by construction to its inputs, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The framework rests on the existence of an observability service that maintains abstract predicates under a stated assume/guarantee contract; no free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption An observability service exists that maintains abstract predicates describing execution context under a formal assume/guarantee contract.
    Invoked in the description of how enforcement decisions coincide with policy semantics.
invented entities (1)
  • Reference monitor no independent evidence
    purpose: Consults the Datalog policy at each action to produce an enforcement decision.
    Core component of the enforcement architecture; no independent evidence supplied in abstract.

pith-pipeline@v0.9.0 · 5595 in / 1234 out tokens · 15299 ms · 2026-05-15T21:05:40.239748+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Anumati: Proof of Adherence as a Formal Consent Model for Autonomous Agent Protocols

    cs.CR 2026-04 unverdicted novelty 7.0

    Anumati defines proof of adherence via versioned PolicyDocument, ConsentRecord, and AdherenceEvent primitives as a non-breaking extension to A2A and MCP protocols.

  2. Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents

    cs.CR 2026-04 unverdicted novelty 7.0

    The paper defines causality laundering as an attack leaking information from denial outcomes in LLM tool calls and proposes the Agentic Reference Monitor to block it using denial-aware provenance graphs.

  3. SOCpilot: Verifying Policy Compliance for LLM-Assisted Incident Response

    cs.CR 2026-05 unverdicted novelty 6.0

    SOCpilot supplies a fixed verifier and public artifact that removes 466 non-compliant approval-gated actions from LLM plans on 200 real incidents while preserving task recall.

  4. An AI Agent Execution Environment to Safeguard User Data

    cs.CR 2026-04 unverdicted novelty 6.0

    GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack...

  5. Owner-Harm: A Missing Threat Model for AI Agent Safety

    cs.CR 2026-04 unverdicted novelty 6.0

    Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic ...

  6. Security Considerations for Multi-agent Systems

    cs.CR 2026-03 unverdicted novelty 6.0

    No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

  7. Engineering Robustness into Personal Agents with the AI Workflow Store

    cs.CR 2026-05 unverdicted novelty 5.0

    AI agents should shift from on-the-fly plan synthesis to invoking pre-engineered, tested, and reusable workflows stored in an AI Workflow Store to gain reliability and security.

  8. Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

    cs.CR 2026-05 unverdicted novelty 5.0

    A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

  9. Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure

    cs.AI 2026-05 unverdicted novelty 5.0

    Multi-agent AI creates an authorization propagation problem not solved by prompt injection defenses or classical access control, requiring identity governance as continuously enforced infrastructure.

  10. Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

    cs.SE 2026-04 unverdicted novelty 5.0

    Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.

  11. Engineering Robustness into Personal Agents with the AI Workflow Store

    cs.CR 2026-05 unverdicted novelty 4.0

    AI agents require pre-engineered reusable workflows stored in a central repository rather than generating plans on the fly to achieve production-grade reliability and security.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · cited by 10 Pith papers · 12 internal anchors

  1. [1]

    NetKAT: Semantic foundations for networks

    Carolyn Jane Anderson, Nate Foster, Arjun Guha, Jean-Baptiste Jeannin, Dexter Kozen, Cole Schlesinger, and David Walker. NetKAT: Semantic foundations for networks. InACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 113–126, 2014.doi:10.1145/2535838.2535862

  2. [2]

    Anderson

    James P. Anderson. Computer security technology planning study. Technical Report ESD-TR-73-51, Air Force Electronic Systems Division, October 1972

  3. [3]

    $\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment

    Victor Barres, Honghua Dong, Soham Ray, Xujie Si, and Karthik Narasimhan.τ2-bench: Evaluating conversational agents in a dual-control environment, 2025. URL:https://arxiv.org/abs/2506.07982,arXiv:2506.07982

  4. [4]

    Monitoring security policies with metric first-order temporal logic

    David Basin, Felix Klaedtke, and Samuel Müller. Monitoring security policies with metric first-order temporal logic. InProceedings of the 15th ACM symposium on Access control models and technologies, pages 23–34, 2010

  5. [5]

    Becker, Cédric Fournet, and Andrew D

    Moritz Y . Becker, Cédric Fournet, and Andrew D. Gordon. SecPAL: Design and semantics of a decentralized authorization language.Journal of Computer Security, 18(4):619–665, 2010.doi:10.3233/JCS-2009-0364

  6. [6]

    LaPadula

    David Elliott Bell and Leonard J. LaPadula. Secure computer systems: Mathematical foundations. Technical Report MTR-2547, The MITRE Corporation, 1973

  7. [7]

    ChemCrow: Augmenting large-language models with chemistry tools

    Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools.arXiv preprint arXiv:2304.05376, 2023

  8. [8]

    Why Do Multi-Agent LLM Systems Fail?

    Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multi-agent llm systems fail?arXiv preprint arXiv:2503.13657, 2025

  9. [9]

    What you always wanted to know about datalog(and never dared to ask).IEEE transactions on knowledge and data engineering, 1(1):146–166, 1989

    Stefano Ceri, Georg Gottlob, Letizia Tanca, et al. What you always wanted to know about datalog(and never dared to ask).IEEE transactions on knowledge and data engineering, 1(1):146–166, 1989

  10. [10]

    Distributed snapshots: Determining global states of distributed systems

    K Mani Chandy and Leslie Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS), 3(1):63–75, 1985

  11. [11]

    {StruQ}: Defending against prompt injection with structured queries

    Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. {StruQ}: Defending against prompt injection with structured queries. In34th USENIX Security Symposium (USENIX Security 25), pages 2383–2400, 2025

  12. [12]

    Secalign: Defending against prompt injection with preference optimization

    Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, and Chuan Guo. Secalign: Defending against prompt injection with preference optimization. InProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, pages 2833–2847, 2025

  13. [13]

    Shieldagent: Shielding agents via verifiable safety policy reasoning

    Zhaorun Chen, Mintong Kang, and Bo Li. Shieldagent: Shielding agents via verifiable safety policy reasoning. In Forty-second International Conference on Machine Learning, 2025. URL:https://openreview.net/forum?id= DkRYImuQA9

  14. [14]

    Kairos: Practical intrusion detection and investigation using whole-system provenance

    Zijun Cheng, Qiujian Lv, Jinyuan Liang, Yan Wang, Degang Sun, Thomas Pasquier, and Xueyuan Han. Kairos: Practical intrusion detection and investigation using whole-system provenance. In2024 IEEE Symposium on Security and Privacy (SP), pages 3533–3551. IEEE, 2024

  15. [15]

    Malade: Orchestration of llm-powered agents with retrieval augmented generation for pharmacovigilance

    Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M Engelhard, Somesh Jha, Anivarya Kumar, and David Page. Malade: Orchestration of llm-powered agents with retrieval augmented generation for pharmacovigilance. InMachine Learning for Healthcare Conference. PMLR, 2024. 23

  16. [16]

    How not to detect prompt injections with an llm

    Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, and Somesh Jha. How not to detect prompt injections with an llm. InProceedings of the 18th ACM Workshop on Artificial Intelligence and Security, pages 218–229, 2025

  17. [17]

    Systems security foundations for agentic computing.arXiv preprint arXiv:2512.01295, 2025

    Mihai Christodorescu, Earlence Fernandes, Ashish Hooda, Somesh Jha, Johann Rehberger, and Khawaja Shams. Systems security foundations for agentic computing.arXiv preprint arXiv:2512.01295, 2025

  18. [18]

    Securing AI Agents with Information-Flow Control

    Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Securing ai agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

  19. [19]

    Cedar: A new language for expressive, fast, safe, and analyzable authorization

    Joseph Cutler, Craig Disselkoen, Aaron Eline, Shaobo He, Kyle Headley, Mike Hicks, Kesha Hietala, Eleft- erios Ioannidis, John Kastner, Anwar Mamat, Darin McAdams, Matt McCutchen, Neha Rungta, Emina Tor- lak, and Andrew Wells. Cedar: A new language for expressive, fast, safe, and analyzable authorization

  20. [20]

    URL:https://www.amazon.science/publications/cedar-a-new-language-for-expressive-fast-safe-and- analyzable-authorization

  21. [21]

    Defeating Prompt Injections by Design

    Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025

  22. [22]

    Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895–82920, 2024

    Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895–82920, 2024

  23. [23]

    Ai agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57(7):1–36, 2025

    Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. Ai agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57(7):1–36, 2025

  24. [24]

    Denning and Peter J

    Dorothy E. Denning and Peter J. Denning. Certification of programs for secure information flow.Commun. ACM, 20(7):504–513, July 1977.doi:10.1145/359636.359712

  25. [25]

    Orange Book

    Department of Defense. Trusted computer system evaluation criteria. Technical Report DoD 5200.28-STD, National Computer Security Center, 1985. “Orange Book”

  26. [26]

    DeTreville

    J. DeTreville. Binder, a logic-based security language. InProceedings 2002 IEEE Symposium on Security and Privacy, pages 105–113, 2002.doi:10.1109/SECPRI.2002.1004365

  27. [27]

    Proposed nist standard for role-based access control.ACM Transactions on Information and System Security (TISSEC), 4(3):224–274, 2001

    David F Ferraiolo, Ravi Sandhu, Serban Gavrila, D Richard Kuhn, and Ramaswamy Chandramouli. Proposed nist standard for role-based access control.ACM Transactions on Information and System Security (TISSEC), 4(3):224–274, 2001

  28. [28]

    Timestamps in message-passing systems that preserve the partial ordering

    Colin J Fidge. Timestamps in message-passing systems that preserve the partial ordering. 1987

  29. [29]

    Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

  30. [30]

    Planning anything with rigor: General-purpose zero-shot planning with llm-based formalized programming.arXiv preprint arXiv:2410.12112, 2024

    Yilun Hao, Yang Zhang, and Chuchu Fan. Planning anything with rigor: General-purpose zero-shot planning with llm-based formalized programming.arXiv preprint arXiv:2410.12112, 2024

  31. [31]

    A logical specification and analysis for selinux mls policy

    Boniface Hicks, Sandra Rueda, Luke St.Clair, Trent Jaeger, and Patrick McDaniel. A logical specification and analysis for selinux mls policy. InProceedings of the 12th ACM Symposium on Access Control Models and Technologies, SACMAT ’07, page 91–100, New York, NY , USA, 2007. Association for Computing Machinery. doi:10.1145/1266840.1266854

  32. [32]

    A multi-agent llm defense pipeline against prompt injection attacks.arXiv preprint arXiv:2509.14285, 2025

    SM Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, MF Mridha, and Jungpil Shin. A multi-agent llm defense pipeline against prompt injection attacks.arXiv preprint arXiv:2509.14285, 2025. 24

  33. [33]

    Attribute-based access control.Computer, 48(2):85–88, 2015

    Vincent C Hu, D Richard Kuhn, David F Ferraiolo, and Jeffrey V oas. Attribute-based access control.Computer, 48(2):85–88, 2015

  34. [34]

    Trustagent: Towards safe and trustworthy llm-based agents through agent constitution

    Wenyue Hua, Xianjun Yang, Mingyu Jin, Zelong Li, Wei Cheng, Ruixiang Tang, and Yongfeng Zhang. Trustagent: Towards safe and trustworthy llm-based agents through agent constitution. InTrustworthy Multi-modal Foundation Models and AI Agents (TiFA), 2024

  35. [35]

    Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

    Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. Llama guard: Llm-based input-output safeguard for human-ai conversations.arXiv preprint arXiv:2312.06674, 2023

  36. [36]

    A critical evaluation of defenses against prompt injection attacks.arXiv preprint arXiv:2505.18333, 2025

    Yuqi Jia, Zedian Shao, Yupei Liu, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. A critical evaluation of defenses against prompt injection attacks.arXiv preprint arXiv:2505.18333, 2025

  37. [37]

    SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

    Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. Swe-bench: Can language models resolve real-world github issues?arXiv preprint arXiv:2310.06770, 2023

  38. [38]

    Policy-as-prompt: Turning ai governance rules into guardrails for ai agents,

    Gauri Kholkar and Ratinder Ahuja. Policy-as-prompt: Turning ai governance rules into guardrails for ai agents,

  39. [39]

    URL:https://arxiv.org/abs/2509.23994,arXiv:2509.23994

  40. [40]

    Invariant Labs.https://github.com/invariantlabs-ai/invariant, 2024

  41. [41]

    Time, clocks, and the ordering of events in a distributed system

    Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. InConcurrency: the Works of Leslie Lamport, pages 179–196. 2019

  42. [42]

    Datalog with constraints: A foundation for trust management languages

    Ninghui Li and John C Mitchell. Datalog with constraints: A foundation for trust management languages. In International Symposium on Practical Aspects of Declarative Languages, pages 58–73. Springer, 2003

  43. [43]

    Datasentinel: A game-theoretic detection of prompt injection attacks

    Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. Datasentinel: A game-theoretic detection of prompt injection attacks. In2025 IEEE Symposium on Security and Privacy (SP), pages 2190–2208. IEEE, 2025

  44. [44]

    Univ., Department of Computer Science, 1988

    Friedemann Mattern et al.Virtual time and global states of distributed systems. Univ., Department of Computer Science, 1988

  45. [45]

    Veriguard: Enhancing llm agent safety via verified code generation.arXiv preprint arXiv:2510.05156, 2025

    Lesly Miculicich, Mihir Parmar, Hamid Palangi, Krishnamurthy Dj Dvijotham, Mirko Montanari, Tomas Pfis- ter, and Long T Le. Veriguard: Enhancing llm agent safety via verified code generation.arXiv preprint arXiv:2510.05156, 2025

  46. [46]

    The attacker moves second: Stronger adaptive attacks bypass defenses against llm jailbreaks and prompt injections.arXiv preprint arXiv:2510.09023, 2025

    Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, et al. The attacker moves second: Stronger adaptive attacks bypass defenses against llm jailbreaks and prompt injections.arXiv preprint arXiv:2510.09023, 2025

  47. [47]

    NeMo Guardrails: Open-source toolkit for adding programmable guardrails to LLM-based conversa- tional systems.https://github.com/NVIDIA/NeMo-Guardrails, 2024

    NVIDIA. NeMo Guardrails: Open-source toolkit for adding programmable guardrails to LLM-based conversa- tional systems.https://github.com/NVIDIA/NeMo-Guardrails, 2024

  48. [48]

    OW ASP Top 10 for Large Language Model Applications.https://owasp.org/www-project- top-10-for-large-language-model-applications/, 2025

    OW ASP Foundation. OW ASP Top 10 for Large Language Model Applications.https://owasp.org/www-project- top-10-for-large-language-model-applications/, 2025

  49. [49]

    Zanzibar:{Google’s} consistent, global authorization system

    Ruoming Pang, Ramon Caceres, Mike Burrows, Zhifeng Chen, Pratik Dave, Nathan Germer, Alexander Golynski, Kevin Graney, Nina Kang, Lea Kissner, et al. Zanzibar:{Google’s} consistent, global authorization system. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 33–46, 2019

  50. [50]

    The temporal logic of programs

    Amir Pnueli. The temporal logic of programs. In18th annual symposium on foundations of computer science (sfcs 1977), pages 46–57. ieee, 1977

  51. [51]

    The transport layer security (TLS) protocol version 1.3

    Eric Rescorla. The transport layer security (TLS) protocol version 1.3. RFC 8446, Internet Engineering Task Force, August 2018.doi:10.17487/RFC8446. 25

  52. [52]

    Differential Datalog

    Leonid Ryzhyk and Mihai Budiu. Differential Datalog. InDatalog 2.0 2019 - 3rd International Workshop on the Resurgence of Datalog in Academia and Industry, pages 56–67, 2019. URL:https://ceur-ws.org/Vol-2368/ paper6.pdf

  53. [53]

    OpenID Connect Core 1.0

    Nat Sakimura, John Bradley, Mike Jones, Breno de Medeiros, and Chuck Mortimore. OpenID Connect Core 1.0. OpenID Foundation, 2014. URL:https://openid.net/specs/openid-connect-core-1_0.html

  54. [54]

    A survey of challenges for runtime verification from advanced application domains (beyond software).Formal Methods in System Design, 54(3):279– 335, 2019

    César Sánchez, Gerardo Schneider, Wolfgang Ahrendt, Ezio Bartocci, Domenico Bianculli, Christian Colombo, Yliès Falcone, Adrian Francalanza, Sr ¯dan Krsti´c, João M Lourenço, et al. A survey of challenges for runtime verification from advanced application domains (beyond software).Formal Methods in System Design, 54(3):279– 335, 2019

  55. [55]

    Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems, 36:68539–68551, 2023

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems, 36:68539–68551, 2023

  56. [56]

    Progent: Securing AI Agents with Privilege Control

    Tianneng Shi, Jingxuan He, Zhun Wang, Hongwei Li, Linyu Wu, Wenbo Guo, and Dawn Song. Progent: Pro- grammable privilege control for llm agents.arXiv preprint arXiv:2504.11703, 2025

  57. [57]

    Implementing SELinux as a Linux security module

    Stephen Smalley, Chris Vance, and Wayne Salamon. Implementing SELinux as a Linux security module. Technical Report 01-043, NAI Labs, 2001. URL:https://www.nsa.gov/portals/75/documents/resources/everyone/ digital-media-center/publications/research-papers/implementing-selinux-as-linux-security-module-report. pdf

  58. [58]

    The Flask security architecture: System support for diverse security policies

    Ray Spencer, Stephen Smalley, Peter Loscocco, Mike Hibler, David Andersen, and Jay Lepreau. The Flask security architecture: System support for diverse security policies. In8th USENIX Security Symposium, pages 123–139,

  59. [59]

    URL:https://www.usenix.org/conference/8th-usenix-security-symposium/flask-security-architecture- system-support-diverse-security

  60. [60]

    Open policy agent

    Styra. Open policy agent. 2024

  61. [61]

    Revolutionizing customer service: The impact of large language models on chatbot perfor- mance.INTERNATIONAL JOURNAL, 10(5):721–730, 2024

    MEDURI SUDEEP. Revolutionizing customer service: The impact of large language models on chatbot perfor- mance.INTERNATIONAL JOURNAL, 10(5):721–730, 2024

  62. [62]

    CIL: Common intermediate language for SELinux policy.https://github.com/ SELinuxProject/selinux/tree/main/libsepol/cil, 2011

    Tresys Technology. CIL: Common intermediate language for SELinux policy.https://github.com/ SELinuxProject/selinux/tree/main/libsepol/cil, 2011

  63. [63]

    The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

    Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, and Alex Beutel. The instruction hierarchy: Training llms to prioritize privileged instructions.arXiv preprint arXiv:2404.13208, 2024

  64. [64]

    AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

    Haoyu Wang, Christopher M Poskitt, and Jun Sun. Agentspec: Customizable runtime enforcement for safe and reliable llm agents.arXiv preprint arXiv:2503.18666, 2025

  65. [65]

    Guardagent: Safeguard llm agents by a guard agent via knowledge-enabled reasoning.arXiv preprint arXiv:2406.09187, 2024

    Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, et al. Guardagent: Safeguard llm agents by a guard agent via knowledge-enabled reasoning.arXiv preprint arXiv:2406.09187, 2024

  66. [66]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InThe eleventh international conference on learning representations, 2022

  67. [67]

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents.arXiv preprint arXiv:2403.02691, 2024

  68. [68]

    Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

    Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (asb): Formalizing and benchmarking attacks and defenses in llm-based agents. arXiv preprint arXiv:2410.02644, 2024. 26

  69. [69]

    On prompt-driven safeguarding for large language models

    Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, and Nanyun Peng. On prompt-driven safeguarding for large language models. InProceedings of the 41st International Conference on Machine Learning, pages 61593–61613, 2024

  70. [70]

    helpfully

    Michael Zipperle, Florian Gottwalt, Elizabeth Chang, and Tharam Dillon. Provenance-based intrusion detection systems: A survey.ACM Computing Surveys, 55(7):1–36, 2022. 27 A Policy Rules forτ 2-bench Case Study This appendix provides the complete Datalog policy rules used in our τ2-bench evaluation (§5.3). We present rules for both the airline and retail d...