arxiv: 2602.16708 · v3 · submitted 2026-02-18 · 💻 cs.CR · cs.AI· cs.MA

Recognition: no theorem link

Formal Policy Enforcement for Real-World Agentic Systems

Nils Palumbo , Sarthak Choudhary , Jihye Choi , Guy Amir , Prasad Chalasani , Somesh Jha

Authors on Pith no claims yet

Pith reviewed 2026-05-15 21:05 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.MA

keywords policy enforcementagentic systemsDatalogreference monitorassume-guarantee contractsmulti-agent systemssecurity policiesaspect-oriented programming

0 comments

The pith

Policy enforcement in agentic systems can be made formally correct by separating Datalog rules from agent reasoning and checking them against an observability contract.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats security policies as a cross-cutting concern rather than embedding them in agent prompts. It specifies policies declaratively in Datalog over abstract predicates that describe execution context. An observability service maintains those predicates under a formal assume/guarantee contract, and a reference monitor consults the policy at each relevant decision point. When the contract holds, the monitor's decisions match the intended policy semantics exactly. This approach supports static analyses for policy properties and works without modifying the underlying agents.

Core claim

By grounding enforcement in aspect-oriented programming, policies can be written independently of any agent's reasoning in Datalog over abstract predicates. An observability service governed by an assume/guarantee contract maintains these predicates, and a reference monitor produces an enforcement decision at every policy-relevant action. When the environment contract holds, these decisions coincide with the policy's intended semantics.

What carries the argument

Datalog policies over abstract predicates, maintained by an observability service under an assume/guarantee contract and consulted by a reference monitor at each action.

If this is right

Policies written in Datalog support recursion and admit static analyses that detect contradictions, redundancy, and conditional reachability.
Enforcement decisions remain independent of the agent's internal reasoning and require no changes to the agent code.
The same policy mechanism applies across information-flow controls, multi-agent approval workflows, and organizational rules.
Natural-language policy ambiguities can be surfaced through the static analyses before deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Existing agent deployments can adopt the framework by adding the observability service and reference monitor as middleware.
Policy maintenance becomes separable from agent updates, allowing organizations to revise rules without retraining models.
In settings where the observability contract cannot be guaranteed, the formal match between decisions and semantics would no longer hold.

Load-bearing premise

The observability service correctly maintains the abstract predicates describing execution context.

What would settle it

A concrete execution trace in which the assume/guarantee contract on the observability service holds, yet the reference monitor produces a decision that differs from the Datalog policy evaluation on the maintained predicates.

Figures

Figures reproduced from arXiv: 2602.16708 by Guy Amir, Jihye Choi, Nils Palumbo, Prasad Chalasani, Sarthak Choudhary, Somesh Jha.

**Figure 1.** Figure 1: PCAS overview. (a) The instrumentation layer intercepts messages, tool calls, and HTTP requests. The reference monitor queries the policy engine, passing the action, identity, and roles, which evaluates Datalog rules against the dependency graph from the observability service. (b) Authorized actions are executed and recorded; denied actions return structured feedback to the agent. (TCB) [24]. Our security … view at source ↗

**Figure 2.** Figure 2: Information flow policies for prompt injection defense. (a) Bell-LaPadula MLS enforces clearance-based [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

**Figure 3.** Figure 3: Policy enforcement example from the τ 2 -bench airline domain. Left: A customer requests flight cancellation due to an accidental booking. The agent attempts to call cancel_reservation, but PCAS blocks the action based on the policy. After receiving structured feedback, the agent recovers by offering compliant alternatives. Middle: The natural language policy specifying covered cancellation reasons is tran… view at source ↗

**Figure 4.** Figure 4: Task success rates (pass^k) on τ 2 -bench tasks. The pass^k metric measures the probability that k randomly sampled trials all succeed, capturing consistency rather than single-shot performance. PCAS-instrumented agents (solid lines) consistently outperform non-instrumented baselines (dashed lines) across all values of k, demonstrating that enforcement improves compliance without degrading task completion.… view at source ↗

read the original abstract

Security policy enforcement in contemporary agentic systems predominantly consists of embedding natural-language policies within an agent's system prompt and delegating compliance to the agent's reasoning. This approach admits no formal enforcement guarantee and cannot express policies whose satisfaction depends on the causal history of an execution, a gap that becomes acute in multi-agent systems, where enforcement must reason across agents. We argue that policy enforcement in agentic systems is most naturally understood as a cross-cutting concern, and propose a framework grounded in aspect-oriented programming that specifies policies independent of the agent's reasoning and enforces them at every policy-relevant decision. Policies are written in Datalog over a set of abstract predicates describing the execution context, an observability service governed by a formal assume/guarantee contract maintains these predicates, and a reference monitor consults the policy at each action to produce an enforcement decision. When the environment contract holds, enforcement decisions coincide with the policy's intended semantics. We adopt Datalog as the policy language, a natural fit because it supports declarative rule specification, admits recursion for policies over transitive relationships, and yields deterministic enforcement. Datalog further admits tractable static analyses for contradiction, redundancy, subsumption, and conditional reachability, enabling authors to verify policy intent and surface ambiguities inherent in natural-language specifications. We realize the framework in FORGE, which enforces policies over agentic deployments without modification to the underlying agents. We evaluate FORGE on three case studies: information flow policies for prompt injection defense, approval workflows in a multi-agent pharmacovigilance system, and organizational policies for customer service.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FORGE separates policy from agent reasoning with Datalog and a reference monitor over causal history, but the guarantees rest on an unverified observability contract.

read the letter

The paper's main advance is showing how to enforce policies that depend on execution history across agents by treating enforcement as a cross-cutting concern. Policies are written in Datalog over abstract predicates, an observability service keeps those predicates up to date under an assume/guarantee contract, and a reference monitor checks the policy before each action. They implement this in FORGE and run it on unmodified agents for three cases: information-flow rules to block prompt injection, approval chains in a pharmacovigilance multi-agent setup, and organizational rules in customer service. Datalog is a reasonable choice here because it supports recursion and lets them run static checks for contradictions or redundancy in the policy itself. That part is concrete and addresses a real gap in current prompt-only approaches. The soft spot is the contract itself. The claim that enforcement matches intended semantics when the contract holds is load-bearing, yet the description gives no machine-checked proof, model checking, or measurements showing that the observability service actually delivers correct predicates when the agents are opaque LLMs. If the full paper does not supply that evidence or a clear implementation path for it, the formal guarantee stays conditional rather than demonstrated. This is useful reading for people building regulated agentic systems who want something more structured than prompt engineering. It has enough working code and case studies to justify sending it to referees, mainly to check whether the contract is actually substantiated in the implementation and evaluation sections.

Referee Report

2 major / 1 minor

Summary. The paper claims that policy enforcement in agentic systems should be treated as a cross-cutting concern via an aspect-oriented framework. Policies are expressed declaratively in Datalog over abstract predicates on execution context; an observability service maintains these predicates under a formal assume/guarantee contract; and a reference monitor consults the policy at each action to produce enforcement decisions. When the contract holds, decisions coincide with intended semantics. The approach is realized in the FORGE system, which requires no agent modification, and is evaluated on three case studies: information-flow policies for prompt-injection defense, approval workflows in a multi-agent pharmacovigilance system, and organizational policies for customer service.

Significance. If the observability contract can be shown to hold for unmodified LLM agents, the work would supply a concrete mechanism for history-dependent and multi-agent policy enforcement that current prompt-embedding methods cannot provide. The choice of Datalog is a strength: it enables static analyses for contradiction, redundancy, and reachability, allowing authors to verify policy intent independently of runtime behavior. The no-modification deployment model is practically valuable for real-world agentic systems.

major comments (2)

[Abstract] Abstract: The load-bearing claim that 'When the environment contract holds, enforcement decisions coincide with the policy's intended semantics' is stated without any derivation, model-checking result, or empirical measurement showing that the assume/guarantee contract is satisfied by the actual FORGE observability service when agents are unmodified, non-deterministic LLMs whose internal state is opaque. This leaves the central guarantee as an unverified premise rather than a derived property.
[Abstract] The manuscript introduces the reference monitor and observability service but supplies no concrete description of how causal history across agents is maintained in the Datalog predicates or how the contract is discharged for the three evaluated case studies. Without this, it is impossible to assess whether the framework's guarantees survive the non-determinism and prompt opacity of real LLM agents.

minor comments (1)

[Abstract] The abstract would benefit from a single sentence clarifying the scope of the assume/guarantee contract (e.g., which predicates are assumed versus guaranteed and under what failure model).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on the formal guarantees and practical applicability of FORGE. The feedback highlights the need for clearer exposition of the assume/guarantee contract and its realization in the case studies. We have revised the abstract, added a derivation sketch, and expanded the description of the observability service and case studies to address these points directly.

read point-by-point responses

Referee: [Abstract] Abstract: The load-bearing claim that 'When the environment contract holds, enforcement decisions coincide with the policy's intended semantics' is stated without any derivation, model-checking result, or empirical measurement showing that the assume/guarantee contract is satisfied by the actual FORGE observability service when agents are unmodified, non-deterministic LLMs whose internal state is opaque. This leaves the central guarantee as an unverified premise rather than a derived property.

Authors: We agree that the central claim is conditional and have strengthened its presentation. The revised manuscript includes a derivation in Section 3.2 showing that, given the assume/guarantee contract (observability service correctly populates abstract predicates from observed events), the reference monitor's decisions are equivalent to the least Herbrand model of the Datalog policy. Because agents are unmodified, the contract is discharged by the service's implementation rather than by inspecting LLM internals; we now report empirical measurements from the three case studies confirming that predicates were populated consistently with observed actions. A complete model check over all possible LLM behaviors is not provided, as the non-determinism and opacity make exhaustive verification intractable; the framework instead relies on the contract as the interface between the service and the policy. revision: partial
Referee: [Abstract] The manuscript introduces the reference monitor and observability service but supplies no concrete description of how causal history across agents is maintained in the Datalog predicates or how the contract is discharged for the three evaluated case studies. Without this, it is impossible to assess whether the framework's guarantees survive the non-determinism and prompt opacity of real LLM agents.

Authors: We have added a concrete description in the revised Section 4.1: the observability service maintains a trace of actions, each tagged with agent identifier, timestamp, and content; these are asserted as Datalog facts, with recursive rules (e.g., flowsTo) encoding causal history across agents. Section 5 now includes, for each case study, a table listing the predicates, the observable events used to populate them, and the observed contract discharge (e.g., all prompt-injection flows were captured from message metadata without relying on internal LLM state). This shows that enforcement decisions remain sound under the non-determinism of the evaluated LLM agents, as only externally observable outputs are used. revision: yes

Circularity Check

0 steps flagged

No circularity; central claim is conditional on an external contract assumption

full rationale

The paper's derivation chain consists of a framework definition (Datalog policies over abstract predicates, observability service under assume/guarantee contract, reference monitor) whose key guarantee is explicitly conditional: 'When the environment contract holds, enforcement decisions coincide with the policy's intended semantics.' No equations, fitted parameters, or self-referential reductions appear. The contract is introduced as an assumption governing the observability service rather than derived from the paper's own results or prior self-citations. No load-bearing step reduces by construction to its inputs, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The framework rests on the existence of an observability service that maintains abstract predicates under a stated assume/guarantee contract; no free parameters or invented physical entities are introduced.

axioms (1)

domain assumption An observability service exists that maintains abstract predicates describing execution context under a formal assume/guarantee contract.
Invoked in the description of how enforcement decisions coincide with policy semantics.

invented entities (1)

Reference monitor no independent evidence
purpose: Consults the Datalog policy at each action to produce an enforcement decision.
Core component of the enforcement architecture; no independent evidence supplied in abstract.

pith-pipeline@v0.9.0 · 5595 in / 1234 out tokens · 15299 ms · 2026-05-15T21:05:40.239748+00:00 · methodology

discussion (0)

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Anumati: Proof of Adherence as a Formal Consent Model for Autonomous Agent Protocols
cs.CR 2026-04 unverdicted novelty 7.0

Anumati defines proof of adherence via versioned PolicyDocument, ConsentRecord, and AdherenceEvent primitives as a non-breaking extension to A2A and MCP protocols.
Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents
cs.CR 2026-04 unverdicted novelty 7.0

The paper defines causality laundering as an attack leaking information from denial outcomes in LLM tool calls and proposes the Agentic Reference Monitor to block it using denial-aware provenance graphs.
SOCpilot: Verifying Policy Compliance for LLM-Assisted Incident Response
cs.CR 2026-05 unverdicted novelty 6.0

SOCpilot supplies a fixed verifier and public artifact that removes 466 non-compliant approval-gated actions from LLM plans on 200 real incidents while preserving task recall.
An AI Agent Execution Environment to Safeguard User Data
cs.CR 2026-04 unverdicted novelty 6.0

GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack...
Owner-Harm: A Missing Threat Model for AI Agent Safety
cs.CR 2026-04 unverdicted novelty 6.0

Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic ...
Security Considerations for Multi-agent Systems
cs.CR 2026-03 unverdicted novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
Engineering Robustness into Personal Agents with the AI Workflow Store
cs.CR 2026-05 unverdicted novelty 5.0

AI agents should shift from on-the-fly plan synthesis to invoking pre-engineered, tested, and reusable workflows stored in an AI Workflow Store to gain reliability and security.
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation
cs.CR 2026-05 unverdicted novelty 5.0

A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure
cs.AI 2026-05 unverdicted novelty 5.0

Multi-agent AI creates an authorization propagation problem not solved by prompt injection defenses or classical access control, requiring identity governance as continuously enforced infrastructure.
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
cs.SE 2026-04 unverdicted novelty 5.0

Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.
Engineering Robustness into Personal Agents with the AI Workflow Store
cs.CR 2026-05 unverdicted novelty 4.0

AI agents require pre-engineered reusable workflows stored in a central repository rather than generating plans on the fly to achieve production-grade reliability and security.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · cited by 10 Pith papers · 12 internal anchors

[1]

NetKAT: Semantic foundations for networks

Carolyn Jane Anderson, Nate Foster, Arjun Guha, Jean-Baptiste Jeannin, Dexter Kozen, Cole Schlesinger, and David Walker. NetKAT: Semantic foundations for networks. InACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 113–126, 2014.doi:10.1145/2535838.2535862

work page doi:10.1145/2535838.2535862 2014
[2]

Anderson

James P. Anderson. Computer security technology planning study. Technical Report ESD-TR-73-51, Air Force Electronic Systems Division, October 1972

work page 1972
[3]

$\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment

Victor Barres, Honghua Dong, Soham Ray, Xujie Si, and Karthik Narasimhan.τ2-bench: Evaluating conversational agents in a dual-control environment, 2025. URL:https://arxiv.org/abs/2506.07982,arXiv:2506.07982

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

Monitoring security policies with metric first-order temporal logic

David Basin, Felix Klaedtke, and Samuel Müller. Monitoring security policies with metric first-order temporal logic. InProceedings of the 15th ACM symposium on Access control models and technologies, pages 23–34, 2010

work page 2010
[5]

Becker, Cédric Fournet, and Andrew D

Moritz Y . Becker, Cédric Fournet, and Andrew D. Gordon. SecPAL: Design and semantics of a decentralized authorization language.Journal of Computer Security, 18(4):619–665, 2010.doi:10.3233/JCS-2009-0364

work page doi:10.3233/jcs-2009-0364 2010
[6]

LaPadula

David Elliott Bell and Leonard J. LaPadula. Secure computer systems: Mathematical foundations. Technical Report MTR-2547, The MITRE Corporation, 1973

work page 1973
[7]

ChemCrow: Augmenting large-language models with chemistry tools

Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools.arXiv preprint arXiv:2304.05376, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[8]

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multi-agent llm systems fail?arXiv preprint arXiv:2503.13657, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[9]

What you always wanted to know about datalog(and never dared to ask).IEEE transactions on knowledge and data engineering, 1(1):146–166, 1989

Stefano Ceri, Georg Gottlob, Letizia Tanca, et al. What you always wanted to know about datalog(and never dared to ask).IEEE transactions on knowledge and data engineering, 1(1):146–166, 1989

work page 1989
[10]

Distributed snapshots: Determining global states of distributed systems

K Mani Chandy and Leslie Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS), 3(1):63–75, 1985

work page 1985
[11]

{StruQ}: Defending against prompt injection with structured queries

Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. {StruQ}: Defending against prompt injection with structured queries. In34th USENIX Security Symposium (USENIX Security 25), pages 2383–2400, 2025

work page 2025
[12]

Secalign: Defending against prompt injection with preference optimization

Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, and Chuan Guo. Secalign: Defending against prompt injection with preference optimization. InProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, pages 2833–2847, 2025

work page 2025
[13]

Shieldagent: Shielding agents via verifiable safety policy reasoning

Zhaorun Chen, Mintong Kang, and Bo Li. Shieldagent: Shielding agents via verifiable safety policy reasoning. In Forty-second International Conference on Machine Learning, 2025. URL:https://openreview.net/forum?id= DkRYImuQA9

work page 2025
[14]

Kairos: Practical intrusion detection and investigation using whole-system provenance

Zijun Cheng, Qiujian Lv, Jinyuan Liang, Yan Wang, Degang Sun, Thomas Pasquier, and Xueyuan Han. Kairos: Practical intrusion detection and investigation using whole-system provenance. In2024 IEEE Symposium on Security and Privacy (SP), pages 3533–3551. IEEE, 2024

work page 2024
[15]

Malade: Orchestration of llm-powered agents with retrieval augmented generation for pharmacovigilance

Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M Engelhard, Somesh Jha, Anivarya Kumar, and David Page. Malade: Orchestration of llm-powered agents with retrieval augmented generation for pharmacovigilance. InMachine Learning for Healthcare Conference. PMLR, 2024. 23

work page 2024
[16]

How not to detect prompt injections with an llm

Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, and Somesh Jha. How not to detect prompt injections with an llm. InProceedings of the 18th ACM Workshop on Artificial Intelligence and Security, pages 218–229, 2025

work page 2025
[17]

Systems security foundations for agentic computing.arXiv preprint arXiv:2512.01295, 2025

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda, Somesh Jha, Johann Rehberger, and Khawaja Shams. Systems security foundations for agentic computing.arXiv preprint arXiv:2512.01295, 2025

work page arXiv 2025
[18]

Securing AI Agents with Information-Flow Control

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Securing ai agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[19]

Cedar: A new language for expressive, fast, safe, and analyzable authorization

Joseph Cutler, Craig Disselkoen, Aaron Eline, Shaobo He, Kyle Headley, Mike Hicks, Kesha Hietala, Eleft- erios Ioannidis, John Kastner, Anwar Mamat, Darin McAdams, Matt McCutchen, Neha Rungta, Emina Tor- lak, and Andrew Wells. Cedar: A new language for expressive, fast, safe, and analyzable authorization

work page
[20]

URL:https://www.amazon.science/publications/cedar-a-new-language-for-expressive-fast-safe-and- analyzable-authorization

work page
[21]

Defeating Prompt Injections by Design

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[22]

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895–82920, 2024

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895–82920, 2024

work page 2024
[23]

Ai agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57(7):1–36, 2025

Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. Ai agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57(7):1–36, 2025

work page 2025
[24]

Denning and Peter J

Dorothy E. Denning and Peter J. Denning. Certification of programs for secure information flow.Commun. ACM, 20(7):504–513, July 1977.doi:10.1145/359636.359712

work page doi:10.1145/359636.359712 1977
[25]

Orange Book

Department of Defense. Trusted computer system evaluation criteria. Technical Report DoD 5200.28-STD, National Computer Security Center, 1985. “Orange Book”

work page 1985
[26]

DeTreville

J. DeTreville. Binder, a logic-based security language. InProceedings 2002 IEEE Symposium on Security and Privacy, pages 105–113, 2002.doi:10.1109/SECPRI.2002.1004365

work page doi:10.1109/secpri.2002.1004365 2002
[27]

Proposed nist standard for role-based access control.ACM Transactions on Information and System Security (TISSEC), 4(3):224–274, 2001

David F Ferraiolo, Ravi Sandhu, Serban Gavrila, D Richard Kuhn, and Ramaswamy Chandramouli. Proposed nist standard for role-based access control.ACM Transactions on Information and System Security (TISSEC), 4(3):224–274, 2001

work page 2001
[28]

Timestamps in message-passing systems that preserve the partial ordering

Colin J Fidge. Timestamps in message-passing systems that preserve the partial ordering. 1987

work page 1987
[29]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

work page 2023
[30]

Planning anything with rigor: General-purpose zero-shot planning with llm-based formalized programming.arXiv preprint arXiv:2410.12112, 2024

Yilun Hao, Yang Zhang, and Chuchu Fan. Planning anything with rigor: General-purpose zero-shot planning with llm-based formalized programming.arXiv preprint arXiv:2410.12112, 2024

work page arXiv 2024
[31]

A logical specification and analysis for selinux mls policy

Boniface Hicks, Sandra Rueda, Luke St.Clair, Trent Jaeger, and Patrick McDaniel. A logical specification and analysis for selinux mls policy. InProceedings of the 12th ACM Symposium on Access Control Models and Technologies, SACMAT ’07, page 91–100, New York, NY , USA, 2007. Association for Computing Machinery. doi:10.1145/1266840.1266854

work page doi:10.1145/1266840.1266854 2007
[32]

A multi-agent llm defense pipeline against prompt injection attacks.arXiv preprint arXiv:2509.14285, 2025

SM Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, MF Mridha, and Jungpil Shin. A multi-agent llm defense pipeline against prompt injection attacks.arXiv preprint arXiv:2509.14285, 2025. 24

work page arXiv 2025
[33]

Attribute-based access control.Computer, 48(2):85–88, 2015

Vincent C Hu, D Richard Kuhn, David F Ferraiolo, and Jeffrey V oas. Attribute-based access control.Computer, 48(2):85–88, 2015

work page 2015
[34]

Trustagent: Towards safe and trustworthy llm-based agents through agent constitution

Wenyue Hua, Xianjun Yang, Mingyu Jin, Zelong Li, Wei Cheng, Ruixiang Tang, and Yongfeng Zhang. Trustagent: Towards safe and trustworthy llm-based agents through agent constitution. InTrustworthy Multi-modal Foundation Models and AI Agents (TiFA), 2024

work page 2024
[35]

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. Llama guard: Llm-based input-output safeguard for human-ai conversations.arXiv preprint arXiv:2312.06674, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[36]

A critical evaluation of defenses against prompt injection attacks.arXiv preprint arXiv:2505.18333, 2025

Yuqi Jia, Zedian Shao, Yupei Liu, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. A critical evaluation of defenses against prompt injection attacks.arXiv preprint arXiv:2505.18333, 2025

work page arXiv 2025
[37]

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. Swe-bench: Can language models resolve real-world github issues?arXiv preprint arXiv:2310.06770, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[38]

Policy-as-prompt: Turning ai governance rules into guardrails for ai agents,

Gauri Kholkar and Ratinder Ahuja. Policy-as-prompt: Turning ai governance rules into guardrails for ai agents,

work page
[39]

URL:https://arxiv.org/abs/2509.23994,arXiv:2509.23994

work page arXiv
[40]

Invariant Labs.https://github.com/invariantlabs-ai/invariant, 2024

work page 2024
[41]

Time, clocks, and the ordering of events in a distributed system

Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. InConcurrency: the Works of Leslie Lamport, pages 179–196. 2019

work page 2019
[42]

Datalog with constraints: A foundation for trust management languages

Ninghui Li and John C Mitchell. Datalog with constraints: A foundation for trust management languages. In International Symposium on Practical Aspects of Declarative Languages, pages 58–73. Springer, 2003

work page 2003
[43]

Datasentinel: A game-theoretic detection of prompt injection attacks

Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. Datasentinel: A game-theoretic detection of prompt injection attacks. In2025 IEEE Symposium on Security and Privacy (SP), pages 2190–2208. IEEE, 2025

work page 2025
[44]

Univ., Department of Computer Science, 1988

Friedemann Mattern et al.Virtual time and global states of distributed systems. Univ., Department of Computer Science, 1988

work page 1988
[45]

Veriguard: Enhancing llm agent safety via verified code generation.arXiv preprint arXiv:2510.05156, 2025

Lesly Miculicich, Mihir Parmar, Hamid Palangi, Krishnamurthy Dj Dvijotham, Mirko Montanari, Tomas Pfis- ter, and Long T Le. Veriguard: Enhancing llm agent safety via verified code generation.arXiv preprint arXiv:2510.05156, 2025

work page arXiv 2025
[46]

The attacker moves second: Stronger adaptive attacks bypass defenses against llm jailbreaks and prompt injections.arXiv preprint arXiv:2510.09023, 2025

Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, et al. The attacker moves second: Stronger adaptive attacks bypass defenses against llm jailbreaks and prompt injections.arXiv preprint arXiv:2510.09023, 2025

work page arXiv 2025
[47]

NeMo Guardrails: Open-source toolkit for adding programmable guardrails to LLM-based conversa- tional systems.https://github.com/NVIDIA/NeMo-Guardrails, 2024

NVIDIA. NeMo Guardrails: Open-source toolkit for adding programmable guardrails to LLM-based conversa- tional systems.https://github.com/NVIDIA/NeMo-Guardrails, 2024

work page 2024
[48]

OW ASP Top 10 for Large Language Model Applications.https://owasp.org/www-project- top-10-for-large-language-model-applications/, 2025

OW ASP Foundation. OW ASP Top 10 for Large Language Model Applications.https://owasp.org/www-project- top-10-for-large-language-model-applications/, 2025

work page 2025
[49]

Zanzibar:{Google’s} consistent, global authorization system

Ruoming Pang, Ramon Caceres, Mike Burrows, Zhifeng Chen, Pratik Dave, Nathan Germer, Alexander Golynski, Kevin Graney, Nina Kang, Lea Kissner, et al. Zanzibar:{Google’s} consistent, global authorization system. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 33–46, 2019

work page 2019
[50]

The temporal logic of programs

Amir Pnueli. The temporal logic of programs. In18th annual symposium on foundations of computer science (sfcs 1977), pages 46–57. ieee, 1977

work page 1977
[51]

The transport layer security (TLS) protocol version 1.3

Eric Rescorla. The transport layer security (TLS) protocol version 1.3. RFC 8446, Internet Engineering Task Force, August 2018.doi:10.17487/RFC8446. 25

work page doi:10.17487/rfc8446 2018
[52]

Differential Datalog

Leonid Ryzhyk and Mihai Budiu. Differential Datalog. InDatalog 2.0 2019 - 3rd International Workshop on the Resurgence of Datalog in Academia and Industry, pages 56–67, 2019. URL:https://ceur-ws.org/Vol-2368/ paper6.pdf

work page 2019
[53]

OpenID Connect Core 1.0

Nat Sakimura, John Bradley, Mike Jones, Breno de Medeiros, and Chuck Mortimore. OpenID Connect Core 1.0. OpenID Foundation, 2014. URL:https://openid.net/specs/openid-connect-core-1_0.html

work page 2014
[54]

A survey of challenges for runtime verification from advanced application domains (beyond software).Formal Methods in System Design, 54(3):279– 335, 2019

César Sánchez, Gerardo Schneider, Wolfgang Ahrendt, Ezio Bartocci, Domenico Bianculli, Christian Colombo, Yliès Falcone, Adrian Francalanza, Sr ¯dan Krsti´c, João M Lourenço, et al. A survey of challenges for runtime verification from advanced application domains (beyond software).Formal Methods in System Design, 54(3):279– 335, 2019

work page 2019
[55]

Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems, 36:68539–68551, 2023

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems, 36:68539–68551, 2023

work page 2023
[56]

Progent: Securing AI Agents with Privilege Control

Tianneng Shi, Jingxuan He, Zhun Wang, Hongwei Li, Linyu Wu, Wenbo Guo, and Dawn Song. Progent: Pro- grammable privilege control for llm agents.arXiv preprint arXiv:2504.11703, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[57]

Implementing SELinux as a Linux security module

Stephen Smalley, Chris Vance, and Wayne Salamon. Implementing SELinux as a Linux security module. Technical Report 01-043, NAI Labs, 2001. URL:https://www.nsa.gov/portals/75/documents/resources/everyone/ digital-media-center/publications/research-papers/implementing-selinux-as-linux-security-module-report. pdf

work page 2001
[58]

The Flask security architecture: System support for diverse security policies

Ray Spencer, Stephen Smalley, Peter Loscocco, Mike Hibler, David Andersen, and Jay Lepreau. The Flask security architecture: System support for diverse security policies. In8th USENIX Security Symposium, pages 123–139,

work page
[59]

URL:https://www.usenix.org/conference/8th-usenix-security-symposium/flask-security-architecture- system-support-diverse-security

work page
[60]

Open policy agent

Styra. Open policy agent. 2024

work page 2024
[61]

Revolutionizing customer service: The impact of large language models on chatbot perfor- mance.INTERNATIONAL JOURNAL, 10(5):721–730, 2024

MEDURI SUDEEP. Revolutionizing customer service: The impact of large language models on chatbot perfor- mance.INTERNATIONAL JOURNAL, 10(5):721–730, 2024

work page 2024
[62]

CIL: Common intermediate language for SELinux policy.https://github.com/ SELinuxProject/selinux/tree/main/libsepol/cil, 2011

Tresys Technology. CIL: Common intermediate language for SELinux policy.https://github.com/ SELinuxProject/selinux/tree/main/libsepol/cil, 2011

work page 2011
[63]

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, and Alex Beutel. The instruction hierarchy: Training llms to prioritize privileged instructions.arXiv preprint arXiv:2404.13208, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[64]

AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

Haoyu Wang, Christopher M Poskitt, and Jun Sun. Agentspec: Customizable runtime enforcement for safe and reliable llm agents.arXiv preprint arXiv:2503.18666, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[65]

Guardagent: Safeguard llm agents by a guard agent via knowledge-enabled reasoning.arXiv preprint arXiv:2406.09187, 2024

Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, et al. Guardagent: Safeguard llm agents by a guard agent via knowledge-enabled reasoning.arXiv preprint arXiv:2406.09187, 2024

work page arXiv 2024
[66]

React: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InThe eleventh international conference on learning representations, 2022

work page 2022
[67]

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents.arXiv preprint arXiv:2403.02691, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[68]

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (asb): Formalizing and benchmarking attacks and defenses in llm-based agents. arXiv preprint arXiv:2410.02644, 2024. 26

work page internal anchor Pith review Pith/arXiv arXiv 2024
[69]

On prompt-driven safeguarding for large language models

Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, and Nanyun Peng. On prompt-driven safeguarding for large language models. InProceedings of the 41st International Conference on Machine Learning, pages 61593–61613, 2024

work page 2024
[70]

helpfully

Michael Zipperle, Florian Gottwalt, Elizabeth Chang, and Tharam Dillon. Provenance-based intrusion detection systems: A survey.ACM Computing Surveys, 55(7):1–36, 2022. 27 A Policy Rules forτ 2-bench Case Study This appendix provides the complete Datalog policy rules used in our τ2-bench evaluation (§5.3). We present rules for both the airline and retail d...

work page 2022