Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents
Pith reviewed 2026-05-13 17:16 UTC · model grok-4.3
The pith
Tool-calling LLM agents can leak private data through the outcome of denied actions, which standard provenance tracking misses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Causality laundering is the attack in which probing a protected action produces a denial that leaks information, which the adversary then carries forward through a subsequent tool call that appears safe. The Agentic Reference Monitor counters this by maintaining a provenance graph over tool calls, returned data, field-level tags, and denied actions, augmented with counterfactual edges that represent the causal influence of each denial. Trust propagates through an integrity lattice, so any tool call whose inputs carry influence from a denied action can be rejected according to policy.
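To make the mechanism concrete, here is a minimal sketch of denial-aware provenance tracking as the claim describes it. The two-point lattice, the node kinds, and the reject-on-LOW policy are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of denial-aware provenance, assuming a two-point
# integrity lattice (LOW ⊑ HIGH) and a reject-on-LOW policy.
from dataclasses import dataclass, field

HIGH, LOW = "high", "low"

@dataclass
class Node:
    kind: str                              # "tool_call", "data", or "denial"
    integrity: str = HIGH
    parents: list = field(default_factory=list)

def meet(a: str, b: str) -> str:
    """Combining inputs takes the lattice meet: the lower integrity wins."""
    return LOW if LOW in (a, b) else HIGH

def propagate(node: Node) -> str:
    """Fold integrity over all incoming edges, including the counterfactual
    edges that attach denial nodes to whatever the agent produces next."""
    level = node.integrity
    for parent in node.parents:
        level = meet(level, propagate(parent))
    return level

# The attack trace: probe -> denial -> inference -> "benign" exfiltration.
denial = Node(kind="denial", integrity=LOW)        # denied-action node
inferred = Node(kind="data", parents=[denial])     # counterfactual edge
send = Node(kind="tool_call", parents=[inferred])  # looks safe in isolation

assert propagate(send) == LOW  # policy can now reject the tainted call
```

The counterfactual edge is what distinguishes this from flat provenance: no data flowed out of the denied action, so a flat tracker would see `send` as clean.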
What carries the argument
The Agentic Reference Monitor (ARM), a runtime mediator that consults a provenance graph augmented with counterfactual edges from denied-action nodes to enforce integrity policies over both data dependencies and denial-induced causal influence.
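A hedged sketch of that per-invocation check, reusing `Node`, `meet`, and `propagate` from the block above; the `PROTECTED` set and the Allow/Deny verdicts are hypothetical stand-ins for the paper's policy language.

```python
PROTECTED = {"read_payroll"}  # hypothetical protected action

def mediate(call_name: str, arg_nodes: list, graph: list):
    """Runs before every tool call. A denied action is recorded as a
    LOW-integrity node so its influence on later calls stays visible."""
    if call_name in PROTECTED:
        denial = Node(kind="denial", integrity=LOW)
        graph.append(denial)  # the denial itself re-enters the provenance graph
        return "Deny", denial
    level = HIGH
    for node in arg_nodes:
        level = meet(level, propagate(node))
    return ("Allow" if level == HIGH else "Deny"), None
```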
If this is right
- ARM blocks causality laundering, transitive taint, and mixed-provenance field misuse that flat provenance tracking misses.
- Policy checks add only sub-millisecond overhead in the evaluated scenarios.
- Denial-aware causal provenance becomes a necessary abstraction for any tool-calling agent that must respect action restrictions.
- Enforcement must occur at every tool invocation rather than only at data return points.
Where Pith is reading between the lines
- The same denial-feedback channel could appear in multi-agent workflows where one agent's denial affects another's later decisions.
- Tool designers would need explicit policies that limit the information content of denial messages themselves.
- Production deployments would need lightweight ways to approximate the full counterfactual graph without storing every possible denial outcome (one such approximation is sketched after this list).
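One hedged sketch of such an approximation, which is our extrapolation rather than anything the paper proposes: skip materializing counterfactual edges and instead conservatively taint everything the agent produces within a fixed window of steps after any denial.

```python
# Windowed denial-taint heuristic (assumed, not from the paper): every value
# produced within WINDOW steps of a denial is treated as denial-influenced.
WINDOW = 3  # hypothetical influence horizon

def tag_stream(events):
    """events: iterable of ("denial",) or ("value", payload) tuples.
    Yields (payload, tainted) pairs under the windowed heuristic."""
    cooldown = 0
    for event in events:
        if event[0] == "denial":
            cooldown = WINDOW              # open an influence window
        else:
            yield event[1], cooldown > 0
            cooldown = max(0, cooldown - 1)

trace = [("value", "a"), ("denial",), ("value", "b"), ("value", "c")]
print(list(tag_stream(trace)))  # [('a', False), ('b', True), ('c', True)]
```

The trade-off is familiar from taint tracking: a short window misses slow-burn laundering, a long one over-blocks benign calls.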
Load-bearing premise
A runtime system can accurately build and consult a provenance graph that includes all counterfactual edges from denied actions without missing causal links or introducing new attack surfaces.
What would settle it
A controlled run of one of the three evaluated attack scenarios in which the Agentic Reference Monitor allows an exfiltration call that carries information inferred solely from a prior denial would show the monitor does not block causality laundering.
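Under the assumptions of the first sketch above, that decisive run reduces to a single check: a tool call downstream of a denial that nevertheless propagates as HIGH would refute the blocking claim. A hypothetical harness, not the paper's evaluation code:

```python
def laundering_blocked() -> bool:
    """Reuses Node, propagate, HIGH, LOW from the first sketch."""
    denial = Node(kind="denial", integrity=LOW)
    inferred = Node(kind="data", parents=[denial])
    exfil = Node(kind="tool_call", parents=[inferred])
    return propagate(exfil) == LOW  # False here would settle it against ARM

assert laundering_blocked()
```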
Original abstract
Tool-calling LLM agents can read private data, invoke external services, and trigger real-world actions, creating a security problem at the point of tool execution. We identify a denial-feedback leakage pattern, which we term causality laundering, in which an adversary probes a protected action, learns from the denial outcome, and exfiltrates the inferred information through a later seemingly benign tool call. This attack is not captured by flat provenance tracking alone because the leaked information arises from causal influence of the denied action, not direct data flow. We present the Agentic Reference Monitor (ARM), a runtime enforcement layer that mediates every tool invocation by consulting a provenance graph over tool calls, returned data, field-level provenance, and denied actions. ARM propagates trust through an integrity lattice and augments the graph with counterfactual edges from denied-action nodes, enabling enforcement over both transitive data dependencies and denial-induced causal influence. In a controlled evaluation on three representative attack scenarios, ARM blocks causality laundering, transitive taint propagation, and mixed-provenance field misuse that a flat provenance baseline misses, while adding sub-millisecond policy evaluation overhead. These results suggest that denial-aware causal provenance is a useful abstraction for securing tool-calling agent systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies a denial-feedback leakage pattern termed 'causality laundering' in tool-calling LLM agents, where adversaries infer protected information from denial outcomes and exfiltrate it via subsequent benign tool calls. This is not captured by flat provenance tracking because the flow arises from causal influence rather than direct data dependencies. The authors propose the Agentic Reference Monitor (ARM), a runtime layer that consults an augmented provenance graph including field-level data, tool calls, and counterfactual edges from denied actions, then enforces policies over an integrity lattice. In a controlled evaluation on three representative scenarios, ARM blocks causality laundering, transitive taint, and mixed-provenance misuse missed by a flat baseline, while incurring sub-millisecond overhead.
Significance. If the counterfactual-edge construction holds in practice, the work introduces a useful extension of provenance tracking to denial-induced causal effects in agentic systems, addressing a gap between static data-flow analysis and runtime LLM behavior. The low-overhead result and explicit comparison to flat provenance strengthen the case for adoption in reference-monitor designs for tool-calling agents.
major comments (3)
- [Evaluation] Evaluation section: the paper states that ARM blocks the attack in three scenarios but provides no description of the algorithm or heuristics used to compute counterfactual edges from denial messages, nor any measurement of missed links or spurious edges when LLM reasoning adapts after receiving feedback.
- [§3] §3 (ARM design): the claim that the runtime can accurately augment the graph with counterfactual edges from denied-action nodes rests on the untested assumption that denial feedback produces predictable, enumerable changes in subsequent tool calls; this is load-bearing for the integrity-lattice enforcement but is only asserted rather than demonstrated against adaptive agents.
- [Abstract and §4] Abstract and §4: the controlled evaluation is described only at high level (three scenarios, sub-millisecond overhead) with no data, code, or verification steps supplied, preventing independent assessment of whether the reported blocking results depend on oracle-level edge inference.
minor comments (2)
- [§3] Notation for the integrity lattice and counterfactual propagation is introduced without a compact formal definition or small example that could be checked independently (a minimal sketch of such a definition follows this list).
- [Evaluation] The baseline 'flat provenance' implementation is not described in sufficient detail (e.g., which taint rules or graph traversal it uses) to allow exact reproduction of the comparison.
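For reference, a minimal sketch of the kind of compact definition the first minor comment asks for, written as a taint lattice (the dual of a trust lattice); the symbols $E_{\mathrm{cf}}$ (counterfactual edges), $\ell_0$ (base labels), and the threshold $\theta$ are illustrative notation, not the paper's.

```latex
% Illustrative Denning-style taint lattice with counterfactual edges.
\[
  (L,\ \sqsubseteq,\ \sqcup), \qquad \ell_0 : V \to L, \qquad
  \ell_0(d) = \top \ \ \text{for every denied-action node } d
\]
\[
  \ell(v) \;=\; \ell_0(v) \ \sqcup \bigsqcup_{(u,v)\,\in\, E \,\cup\, E_{\mathrm{cf}}} \ell(u)
\]
\[
  \mathrm{Allow}(t) \iff \bigsqcup_{a \,\in\, \mathrm{args}(t)} \ell(a) \ \sqsubseteq\ \theta
\]
```

Dualizing to trust levels recovers the form of the paper's Theorem 1, in which no node with trust below θ may causally reach an Allow verdict.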
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our identification of causality laundering in tool-calling LLM agents and the proposed Agentic Reference Monitor. We address each major comment point by point below, indicating planned revisions to improve clarity, reproducibility, and validation.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: the paper states that ARM blocks the attack in three scenarios but provides no description of the algorithm or heuristics used to compute counterfactual edges from denial messages, nor any measurement of missed links or spurious edges when LLM reasoning adapts after receiving feedback.
  Authors: We agree that the evaluation would be strengthened by explicit details on counterfactual edge construction. In the revised manuscript we will add a precise description of the algorithm and heuristics used to derive counterfactual edges from denial messages, together with quantitative measurements of missed links and spurious edges under adaptive LLM reasoning. These additions will be placed in an expanded evaluation subsection. revision: yes
- Referee: [§3] §3 (ARM design): the claim that the runtime can accurately augment the graph with counterfactual edges from denied-action nodes rests on the untested assumption that denial feedback produces predictable, enumerable changes in subsequent tool calls; this is load-bearing for the integrity-lattice enforcement but is only asserted rather than demonstrated against adaptive agents.
  Authors: The referee correctly notes that the design assumes predictable causal effects from denial feedback. While our controlled scenarios support this in practice, we acknowledge the assumption requires explicit testing against adaptive agents. We will revise §3 to articulate the assumption more clearly and will add targeted experiments in the evaluation section that measure ARM’s effectiveness when the LLM adapts its subsequent tool calls after receiving denial feedback. revision: yes
- Referee: [Abstract and §4] Abstract and §4: the controlled evaluation is described only at high level (three scenarios, sub-millisecond overhead) with no data, code, or verification steps supplied, preventing independent assessment of whether the reported blocking results depend on oracle-level edge inference.
  Authors: We will expand both the abstract and §4 with additional concrete details on the three scenarios, verification steps, and overhead measurements. Full experimental data and code will be released as supplementary material (or via a public repository) to allow independent reproduction and to clarify that edge inference is performed by the documented heuristics rather than oracle access. revision: partial
Circularity Check
No significant circularity detected in the derivation
Full rationale
The paper introduces the causality-laundering attack pattern and the ARM runtime system as a new abstraction that augments provenance graphs with counterfactual edges from denied actions. The central claims rest on explicit definitions of the attack (information flow via causal influence rather than direct data) and the enforcement mechanism, supported by a controlled evaluation on three scenarios. No equations, fitted parameters, or self-citations are invoked to force the result; the proposal does not reduce to renaming known results, self-definitional loops, or load-bearing self-citations. The derivation remains self-contained against the stated assumptions and evaluation.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: a provenance graph can be constructed over tool calls, returned data, field-level provenance, and denied actions.
- Domain assumption: an integrity lattice can propagate trust through the graph, including counterfactual edges from denied-action nodes.
invented entities (2)
- causality laundering: no independent evidence
- Agentic Reference Monitor (ARM): no independent evidence
Reference graph
Works this paper leans on
- [1] Nadya Abaev, Denis Klimov, Gerard Levinov, David Mimran, Yuval Elovici, and Asaf Shabtai. AgentGuardian: Learned access-control policies for LLM agents. arXiv preprint arXiv:2601.10440, 2026.
- [2]
- [3] Anthropic. Model context protocol. https://modelcontextprotocol.io, 2024. Specification v1.0.
- [4] Arnar Birgisson, Joe Gibbs Politz, Úlfar Erlingsson, Ankur Taly, Michael Vrable, and Mark Lentczner. Macaroons: Cookies with contextual caveats for decentralized authorization in the cloud. In Proceedings of the 21st Network and Distributed System Security Symposium (NDSS), 2014.
- [5] James Cheney, Laura Chiticariu, and Wang-Chiew Tan. Provenance in databases: Why, how, and where. Foundations and Trends in Databases, 1(4):379–474, 2009.
- [6] Manuel Costa, Ahmed Salem, Aashish Kolluri, Boris Kopf, Shruti Tople, Andrew Paverd, Lukas Wutschitz, Mark Russinovich, and Santiago Zanella-Beguelin. Securing AI agents with information-flow control. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2026. arXiv:2505.23643.
- [7] Edoardo Debenedetti et al. AgentDojo: A dynamic environment to evaluate attacks and defenses for LLM agents. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024.
- [8] Edoardo Debenedetti et al. Defeating prompt injection by design: Building secure LLM applications with CaMeL. arXiv preprint arXiv:2503.18813, 2025.
- [9] Dorothy E. Denning. A lattice model of secure information flow. Communications of the ACM, 19(5):236–243, 1976.
- [10] Jack B. Dennis and Earl C. Van Horn. Programming generality: A fundamental problem of the one-processor computer. Communications of the ACM, 9(3):143–147, 1966.
- [11] Todd J. Green, Grigoris Karvounarakis, and Val Tannen. Provenance semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 31–40, 2007.
- [12] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec), 2023.
- [13] Keegan Hines et al. Defending against indirect prompt injection attacks with spotlighting. arXiv preprint arXiv:2403.14720, 2024.
- [14] Invariant Labs. Tool poisoning attacks in MCP. https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks, 2024.
- [15] Thomas Jiralerspong, Flemming Kondrup, and Yoshua Bengio. Noticing the watcher: LLM agents can infer CoT monitoring from blocking feedback. arXiv preprint arXiv:2603.16928, 2026.
- [16] Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Miculicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, and Tomas Pfister. CausalArmor: Efficient indirect prompt injection guardrails via causal attribution. arXiv preprint arXiv:2602.07918, 2026.
- [17] Butler W. Lampson. A note on the confinement problem. Communications of the ACM, 16(10):613–615, 1973.
- [18] Frank Li. PRISM: Zero-fork defense-in-depth runtime layer for agent security. arXiv preprint arXiv:2603.11853, 2026.
- [19] Yutao Mou, Zhangchi Xue, Lijun Li, Peiyang Liu, Shikun Zhang, Wei Ye, and Jing Shao. ToolSafe: Step-level pre-execution detection for LLM agent safety. arXiv preprint arXiv:2601.10156, 2026.
- [20] NVIDIA. NeMo Guardrails: A toolkit for controllable and safe LLM applications. https://github.com/NVIDIA/NeMo-Guardrails, 2024.
- [21] Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Mihai Christodorescu, and Somesh Jha. Policy compiler for secure agentic systems. arXiv preprint arXiv:2602.16708, 2026.
- [22] Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334, 2023.
- [23] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009.
- [24] Qiskit Contributors. rustworkx: A high-performance graph library for Python. https://github.com/Qiskit/rustworkx, 2024.
- [25] Jerome H. Saltzer and Michael D. Schroeder. The protection of information in computer systems. Proceedings of the IEEE, 63(9):1278–1308, 1975.
- [26] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023.
- [27] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In Proceedings of the IEEE Symposium on Security and Privacy, pages 317–331, 2010.
- [28] Eric Wallace et al. The instruction hierarchy: Training LLMs to prioritize privileged instructions. arXiv preprint arXiv:2404.13208, 2024.
- [29] Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, and Ye Wu. AgentArmor: Enforcing program analysis on agent runtime trace to defend against prompt injection. arXiv preprint arXiv:2508.01249, 2025.
- [30] Qiusi Zhan et al. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. arXiv preprint arXiv:2403.02691, 2024.
- [31] Tian Zhang, Yiwei Xu, Juan Wang, Keyan Guo, Xiaoyang Xu, Bowen Xiao, Quanlong Guan, Jinlin Fan, Jiawei Liu, Zhiquan Liu, and Hongxin Hu. AgentSentry: Mitigating indirect prompt injection in LLM agents via temporal causal diagnostics and context purification. arXiv preprint arXiv:2602.22724, 2026.
- [32] Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L. Titzer, Heather Miller, and Phillip B. Gibbons. RTBAS: Defending LLM agents against prompt injection and privacy leakage. arXiv preprint arXiv:2502.08966, 2025.
- [33] Genliang Zhu, Chu Wang, Ziyuan Wang, Zhida Li, and Qiang Li. OPP: OpenPort protocol for agent-to-tool governance. arXiv preprint arXiv:2602.20196, 2026.
From the paper's Appendix A (Formal Proofs): Theorem 1 (Mediated Integrity). In an ARM-protected system, there exists no causal path in the provenance graph G from a node with trust level below threshold θ to an Allow verdict on a tool call t…