pith. machine review for the scientific record.

arxiv: 2604.04035 · v1 · submitted 2026-04-05 · 💻 cs.CR · cs.AI


Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents

Mohammad Hossein Chinaei

Pith reviewed 2026-05-13 17:16 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords causality laundering · denial-feedback leakage · tool-calling LLM agents · provenance tracking · runtime enforcement · causal influence · LLM security · integrity lattice

The pith

Tool-calling LLM agents can leak private data through the outcome of denied actions, which standard provenance tracking misses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an adversary can probe a protected tool call, infer private information from the denial response, and then exfiltrate it through a later benign tool call. This pattern, called causality laundering, arises from causal influence created by the denial rather than direct data movement. Flat provenance systems fail to catch it because they track only explicit data flows and ignore the counterfactual effects of denials. The authors introduce the Agentic Reference Monitor, which builds a provenance graph that includes edges from denied actions and uses an integrity lattice to enforce policies at each tool invocation. Evaluation on three attack scenarios shows the monitor blocks the leaks while keeping policy checks under a millisecond.

Core claim

Causality laundering is the attack in which probing a protected action produces a denial that leaks information, which the adversary then carries forward through a subsequent tool call that appears safe. The Agentic Reference Monitor counters this by maintaining a provenance graph over tool calls, returned data, field-level tags, and denied actions, augmented with counterfactual edges that represent the causal influence of each denial. Trust propagates through an integrity lattice, so any tool call whose inputs carry influence from a denied action can be rejected according to policy.
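The bookkeeping described above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the names `Integrity`, `ProvenanceGraph`, and the `deny:` node convention are assumptions, and the real ARM presumably tracks far richer field-level tags.

```python
from enum import IntEnum

class Integrity(IntEnum):
    UNTRUSTED = 0   # influenced by a denied action
    TAINTED = 1     # derived from external, untrusted data
    TRUSTED = 2     # user- or system-originated

class ProvenanceGraph:
    def __init__(self):
        self.level = {}     # node id -> Integrity
        self.parents = {}   # node id -> parent node ids

    def add_node(self, node, level, parents=()):
        # A node's integrity is the meet (minimum) of its own level and its
        # parents' levels, so influence from any denial propagates forward.
        inherited = min((self.level[p] for p in parents), default=level)
        self.level[node] = min(level, inherited)
        self.parents[node] = list(parents)

    def record_denial(self, denial_node):
        # Denied actions enter the graph as UNTRUSTED nodes; any later node
        # that depends on one (a counterfactual edge) inherits that level.
        self.add_node(denial_node, Integrity.UNTRUSTED)

    def allows(self, inputs, threshold=Integrity.TAINTED):
        # Policy check at a tool invocation: reject if any input carries
        # influence below the threshold.
        return all(self.level[i] >= threshold for i in inputs)

g = ProvenanceGraph()
g.add_node("user_prompt", Integrity.TRUSTED)
g.record_denial("deny:read_payroll")   # the probe is denied
g.add_node("inference", Integrity.TRUSTED,
           parents=["user_prompt", "deny:read_payroll"])  # reasons over the denial

# The later "benign" exfiltration call is rejected because its input
# carries influence from the denied action.
print(g.allows(["user_prompt"]))  # True
print(g.allows(["inference"]))    # False
```

The key design choice the paper argues for is visible in `record_denial`: the denial itself becomes a graph node, so a flat taint tracker that only follows returned data would see nothing to propagate.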

What carries the argument

The Agentic Reference Monitor (ARM), a runtime mediator that consults a provenance graph augmented with counterfactual edges from denied-action nodes to enforce integrity policies over both data dependencies and denial-induced causal influence.

If this is right

  • ARM blocks causality laundering, transitive taint, and mixed-provenance field misuse that flat provenance tracking misses.
  • Policy checks add only sub-millisecond overhead in the evaluated scenarios.
  • Denial-aware causal provenance becomes a necessary abstraction for any tool-calling agent that must respect action restrictions.
  • Enforcement must occur at every tool invocation rather than only at data return points.
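The last point, enforcement at every tool invocation rather than only at data return, amounts to wrapping each call in a mediator. A minimal sketch follows; the wrapper, the `taint_of` callable, and the taint labels are hypothetical stand-ins, not the paper's API.

```python
def mediated_call(tool, args, taint_of, policy, denial_log):
    """Run `tool` only if every argument's taint passes `policy`;
    log denials so later calls can be checked against them."""
    taints = [taint_of(a) for a in args]
    if not all(policy(t) for t in taints):
        denial_log.append((tool.__name__, args))
        # The denial itself is feedback the agent can learn from, so a full
        # monitor must also record it as a provenance node (not shown here).
        return {"status": "denied"}
    return {"status": "ok", "result": tool(*args)}

def send_email(to, body):  # stand-in benign tool
    return f"sent to {to}"

denials = []
taint = {"alice@example.com": "trusted", "secret-derived": "denial-influenced"}
policy = lambda t: t != "denial-influenced"

print(mediated_call(send_email, ["alice@example.com", "hello"],
                    taint.get, policy, denials)["status"])  # ok
print(mediated_call(send_email, ["secret-derived", "hi"],
                    taint.get, policy, denials)["status"])  # denied
```

Checking only at data return points would miss the second call entirely: `send_email` returns nothing sensitive, so the leak is visible only at invocation time, when its arguments can be inspected.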

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same denial-feedback channel could appear in multi-agent workflows where one agent's denial affects another's later decisions.
  • Tool designers would need explicit policies that limit the information content of denial messages themselves.
  • Production deployments would need lightweight ways to approximate the full counterfactual graph without storing every possible denial outcome.
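On the last point, one conservative approximation (our extrapolation, not anything the paper proposes) is to skip materializing counterfactual edges and instead treat everything the agent does within a fixed window after any denial as denial-influenced. The function and trace format below are invented for illustration.

```python
def approximate_influence(events, window=3):
    """events: list of ("call" | "denial", name) pairs in execution order.
    Returns the set of calls conservatively treated as denial-influenced."""
    influenced, countdown = set(), 0
    for kind, name in events:
        if kind == "denial":
            countdown = window          # open (or refresh) the influence window
        elif countdown > 0:
            influenced.add(name)        # over-approximate: taint every call inside it
            countdown -= 1
    return influenced

trace = [("call", "search"), ("denial", "read_payroll"),
         ("call", "summarize"), ("call", "send_email"),
         ("call", "fetch_news"), ("call", "log")]
print(sorted(approximate_influence(trace)))
# → ['fetch_news', 'send_email', 'summarize']
```

The trade-off is the usual one for over-approximation: no laundering path inside the window escapes, but benign calls (`fetch_news` above) are flagged too, so the window size becomes a precision/recall knob.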

Load-bearing premise

A runtime system can accurately build and consult a provenance graph that includes all counterfactual edges from denied actions without missing causal links or introducing new attack surfaces.

What would settle it

A controlled rerun of any of the three evaluated attack scenarios in which the Agentic Reference Monitor permits an exfiltration call carrying information inferred solely from a prior denial. Such a run would show that the monitor does not in fact block causality laundering.
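That settling condition can be phrased as a falsification harness. Everything below is hypothetical scaffolding around the probe → denial → exfiltration pattern; `TaintMonitor` is a toy stand-in for ARM, not its real interface.

```python
def replay(monitor):
    """Replay the laundering pattern and return the final verdict."""
    monitor.call("read_payroll")                          # probe; gets denied
    note = monitor.derive("note", ["deny:read_payroll"])  # reason over the denial
    return monitor.call("send_email", inputs=[note])      # attempted exfiltration

class TaintMonitor:
    """Toy stand-in: anything derived from a denial is blocked downstream."""
    def __init__(self):
        self.denial_influenced = set()

    def call(self, tool, inputs=()):
        if any(i in self.denial_influenced for i in inputs):
            return "denied"
        if tool == "read_payroll":          # the protected action
            self.denial_influenced.add("deny:read_payroll")
            return "denied"
        return "allowed"

    def derive(self, name, sources):
        if any(s in self.denial_influenced for s in sources):
            self.denial_influenced.add(name)
        return name

# A monitor for which this assertion fails does not block causality laundering.
assert replay(TaintMonitor()) == "denied"
```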

Figures

Figures reproduced from arXiv: 2604.04035 by Mohammad Hossein Chinaei.

Figure 1. The ARM layered policy pipeline. Each layer can independently deny a tool call.
read the original abstract

Tool-calling LLM agents can read private data, invoke external services, and trigger real-world actions, creating a security problem at the point of tool execution. We identify a denial-feedback leakage pattern, which we term causality laundering, in which an adversary probes a protected action, learns from the denial outcome, and exfiltrates the inferred information through a later seemingly benign tool call. This attack is not captured by flat provenance tracking alone because the leaked information arises from causal influence of the denied action, not direct data flow. We present the Agentic Reference Monitor (ARM), a runtime enforcement layer that mediates every tool invocation by consulting a provenance graph over tool calls, returned data, field-level provenance, and denied actions. ARM propagates trust through an integrity lattice and augments the graph with counterfactual edges from denied-action nodes, enabling enforcement over both transitive data dependencies and denial-induced causal influence. In a controlled evaluation on three representative attack scenarios, ARM blocks causality laundering, transitive taint propagation, and mixed-provenance field misuse that a flat provenance baseline misses, while adding sub-millisecond policy evaluation overhead. These results suggest that denial-aware causal provenance is a useful abstraction for securing tool-calling agent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper identifies a denial-feedback leakage pattern termed 'causality laundering' in tool-calling LLM agents, where adversaries infer protected information from denial outcomes and exfiltrate it via subsequent benign tool calls. This is not captured by flat provenance tracking because the flow arises from causal influence rather than direct data dependencies. The authors propose the Agentic Reference Monitor (ARM), a runtime layer that consults an augmented provenance graph including field-level data, tool calls, and counterfactual edges from denied actions, then enforces policies over an integrity lattice. In controlled evaluation on three representative scenarios, ARM blocks causality laundering, transitive taint, and mixed-provenance misuse missed by a flat baseline, while incurring sub-millisecond overhead.

Significance. If the counterfactual-edge construction holds in practice, the work introduces a useful extension of provenance tracking to denial-induced causal effects in agentic systems, addressing a gap between static data-flow analysis and runtime LLM behavior. The low-overhead result and explicit comparison to flat provenance strengthen the case for adoption in reference-monitor designs for tool-calling agents.

major comments (3)
  1. [Evaluation] Evaluation section: the paper states that ARM blocks the attack in three scenarios but provides no description of the algorithm or heuristics used to compute counterfactual edges from denial messages, nor any measurement of missed links or spurious edges when LLM reasoning adapts after receiving feedback.
  2. [§3] §3 (ARM design): the claim that the runtime can accurately augment the graph with counterfactual edges from denied-action nodes rests on the untested assumption that denial feedback produces predictable, enumerable changes in subsequent tool calls; this is load-bearing for the integrity-lattice enforcement but is only asserted rather than demonstrated against adaptive agents.
  3. [Abstract and §4] Abstract and §4: the controlled evaluation is described only at high level (three scenarios, sub-millisecond overhead) with no data, code, or verification steps supplied, preventing independent assessment of whether the reported blocking results depend on oracle-level edge inference.
minor comments (2)
  1. [§3] Notation for the integrity lattice and counterfactual propagation is introduced without a compact formal definition or small example that could be checked independently.
  2. [Evaluation] The baseline 'flat provenance' implementation is not described in sufficient detail (e.g., which taint rules or graph traversal it uses) to allow exact reproduction of the comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our identification of causality laundering in tool-calling LLM agents and the proposed Agentic Reference Monitor. We address each major comment point by point below, indicating planned revisions to improve clarity, reproducibility, and validation.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the paper states that ARM blocks the attack in three scenarios but provides no description of the algorithm or heuristics used to compute counterfactual edges from denial messages, nor any measurement of missed links or spurious edges when LLM reasoning adapts after receiving feedback.

    Authors: We agree that the evaluation would be strengthened by explicit details on counterfactual edge construction. In the revised manuscript we will add a precise description of the algorithm and heuristics used to derive counterfactual edges from denial messages, together with quantitative measurements of missed links and spurious edges under adaptive LLM reasoning. These additions will be placed in an expanded evaluation subsection. revision: yes

  2. Referee: [§3] §3 (ARM design): the claim that the runtime can accurately augment the graph with counterfactual edges from denied-action nodes rests on the untested assumption that denial feedback produces predictable, enumerable changes in subsequent tool calls; this is load-bearing for the integrity-lattice enforcement but is only asserted rather than demonstrated against adaptive agents.

    Authors: The referee correctly notes that the design assumes predictable causal effects from denial feedback. While our controlled scenarios support this in practice, we acknowledge the assumption requires explicit testing against adaptive agents. We will revise §3 to articulate the assumption more clearly and will add targeted experiments in the evaluation section that measure ARM’s effectiveness when the LLM adapts its subsequent tool calls after receiving denial feedback. revision: yes

  3. Referee: [Abstract and §4] Abstract and §4: the controlled evaluation is described only at high level (three scenarios, sub-millisecond overhead) with no data, code, or verification steps supplied, preventing independent assessment of whether the reported blocking results depend on oracle-level edge inference.

    Authors: We will expand both the abstract and §4 with additional concrete details on the three scenarios, verification steps, and overhead measurements. Full experimental data and code will be released as supplementary material (or via a public repository) to allow independent reproduction and to clarify that edge inference is performed by the documented heuristics rather than oracle access. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in the derivation

full rationale

The paper introduces the causality-laundering attack pattern and the ARM runtime system as a new abstraction that augments provenance graphs with counterfactual edges from denied actions. The central claims rest on explicit definitions of the attack (information flow via causal influence rather than direct data) and the enforcement mechanism, supported by a controlled evaluation on three scenarios. No equations, fitted parameters, or self-citations are invoked to force the result; the proposal does not reduce to renaming known results, self-definitional loops, or load-bearing self-citations. The derivation remains self-contained against the stated assumptions and evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on domain assumptions about constructing provenance graphs that include denied actions and using an integrity lattice for enforcement.

axioms (2)
  • domain assumption A provenance graph can be constructed over tool calls, returned data, field-level provenance, and denied actions
    Fundamental to the ARM design as described in the abstract.
  • domain assumption An integrity lattice can propagate trust through the graph including counterfactual edges from denied-action nodes
    Core mechanism enabling enforcement over causal influence from denials.
invented entities (2)
  • causality laundering no independent evidence
    purpose: To name and conceptualize the denial-feedback leakage attack
    Newly coined term for the identified pattern.
  • Agentic Reference Monitor (ARM) no independent evidence
    purpose: Runtime enforcement layer mediating tool invocations using the provenance graph
    Proposed new component for securing agents.

pith-pipeline@v0.9.0 · 5506 in / 1366 out tokens · 39825 ms · 2026-05-13T17:16:11.636515+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 5 internal anchors

  1. [1]

    AgentGuardian: Learned access-control policies for LLM agents

    Nadya Abaev, Denis Klimov, Gerard Levinov, David Mimran, Yuval Elovici, and Asaf Shabtai. AgentGuardian: Learned access-control policies for LLM agents. arXiv preprint arXiv:2601.10440, 2026

  2. [2]

    Computer security technology planning study

    James P. Anderson. Computer security technology planning study. Technical Report ESD-TR-73-51, Air Force Electronic Systems Division, 1972. Volume II

  3. [3]

    Model context protocol

    Anthropic. Model context protocol. https://modelcontextprotocol.io, 2024. Specification v1.0

  4. [4]

    Macaroons: Cookies with contextual caveats for decentralized authorization in the cloud

    Arnar Birgisson, Joe Gibbs Politz, Úlfar Erlingsson, Ankur Taly, Michael Vrable, and Mark Lentczner. Macaroons: Cookies with contextual caveats for decentralized authorization in the cloud. In Proceedings of the 21st Network and Distributed System Security Symposium (NDSS), 2014

  5. [5]

    Provenance in databases: Why, how, and where

    James Cheney, Laura Chiticariu, and Wang-Chiew Tan. Provenance in databases: Why, how, and where. Foundations and Trends in Databases, 1(4):379–474, 2009

  6. [6]

    Securing AI agents with information-flow control

    Manuel Costa, Ahmed Salem, Aashish Kolluri, Boris Kopf, Shruti Tople, Andrew Paverd, Lukas Wutschitz, Mark Russinovich, and Santiago Zanella-Beguelin. Securing AI agents with information-flow control. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2026. arXiv:2505.23643

  7. [7]

    AgentDojo: A dynamic environment to evaluate attacks and defenses for LLM agents

    Edoardo Debenedetti et al. AgentDojo: A dynamic environment to evaluate attacks and defenses for LLM agents. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024

  8. [8]

    Defeating prompt injections by design

    Edoardo Debenedetti et al. Defeating prompt injection by design: Building secure LLM applications with CaMeL. arXiv preprint arXiv:2503.18813, 2025

  9. [9]

    A lattice model of secure information flow

    Dorothy E. Denning. A lattice model of secure information flow. Communications of the ACM, 19(5):236–243, 1976

  10. [10]

    Programming generality: A fundamental problem of the one-processor computer

    Jack B. Dennis and Earl C. Van Horn. Programming generality: A fundamental problem of the one-processor computer. Communications of the ACM, 9(3):143–147, 1966

  11. [11]

    Provenance semirings

    Todd J. Green, Grigoris Karvounarakis, and Val Tannen. Provenance semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 31–40, 2007

  12. [12]

    Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec), 2023

  13. [13]

    Defending against indirect prompt injection attacks with spotlighting

    Keegan Hines et al. Defending against indirect prompt injection attacks with spotlighting. arXiv preprint arXiv:2403.14720, 2024

  14. [14]

    Tool poisoning attacks in MCP

    Invariant Labs. Tool poisoning attacks in MCP. https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks, 2024

  15. [15]

    Noticing the watcher: LLM agents can infer CoT monitoring from blocking feedback

    Thomas Jiralerspong, Flemming Kondrup, and Yoshua Bengio. Noticing the watcher: LLM agents can infer CoT monitoring from blocking feedback. arXiv preprint arXiv:2603.16928, 2026

  16. [16]

    CausalArmor: Efficient indirect prompt injection guardrails via causal attribution

    Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Miculicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, and Tomas Pfister. CausalArmor: Efficient indirect prompt injection guardrails via causal attribution. arXiv preprint arXiv:2602.07918, 2026

  17. [17]

    A note on the confinement problem

    Butler W. Lampson. A note on the confinement problem. Communications of the ACM, 16(10):613–615, 1973

  18. [18]

    PRISM: Zero-fork defense-in-depth runtime layer for agent security

    Frank Li. PRISM: Zero-fork defense-in-depth runtime layer for agent security. arXiv preprint arXiv:2603.11853, 2026

  19. [19]

    ToolSafe: Step-level pre-execution detection for LLM agent safety

    Yutao Mou, Zhangchi Xue, Lijun Li, Peiyang Liu, Shikun Zhang, Wei Ye, and Jing Shao. ToolSafe: Step-level pre-execution detection for LLM agent safety. arXiv preprint arXiv:2601.10156, 2026

  20. [20]

    NeMo Guardrails: A toolkit for controllable and safe LLM applications

    NVIDIA. NeMo Guardrails: A toolkit for controllable and safe LLM applications. https://github.com/NVIDIA/NeMo-Guardrails, 2024

  21. [21]

    Formal policy enforcement for real-world agentic systems

    Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Mihai Christodorescu, and Somesh Jha. Policy compiler for secure agentic systems. arXiv preprint arXiv:2602.16708, 2026

  22. [22]

    Gorilla: Large Language Model Connected with Massive APIs

    Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334, 2023

  23. [23]

    Causality: Models, Reasoning, and Inference

    Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009

  24. [24]

    rustworkx: A high-performance graph library for Python

    Qiskit Contributors. rustworkx: A high-performance graph library for Python. https://github.com/Qiskit/rustworkx, 2024

  25. [25]

    The protection of information in computer systems

    Jerome H. Saltzer and Michael D. Schroeder. The protection of information in computer systems. Proceedings of the IEEE, 63(9):1278–1308, 1975

  26. [26]

    Toolformer: Language models can teach themselves to use tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

  27. [27]

    All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask)

    Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). Proceedings of the IEEE Symposium on Security and Privacy, pages 317–331, 2010

  28. [28]

    The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

    Eric Wallace et al. The instruction hierarchy: Training LLMs to prioritize privileged instructions. arXiv preprint arXiv:2404.13208, 2024

  29. [29]

    AgentArmor: Enforcing program analysis on agent runtime trace to defend against prompt injection

    Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, and Ye Wu. AgentArmor: Enforcing program analysis on agent runtime trace to defend against prompt injection. arXiv preprint arXiv:2508.01249, 2025

  30. [30]

    InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents

    Qiusi Zhan et al. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. arXiv preprint arXiv:2403.02691, 2024

  31. [31]

    AgentSentry: Mitigating indirect prompt injection in LLM agents via temporal causal diagnostics and context purification

    Tian Zhang, Yiwei Xu, Juan Wang, Keyan Guo, Xiaoyang Xu, Bowen Xiao, Quanlong Guan, Jinlin Fan, Jiawei Liu, Zhiquan Liu, and Hongxin Hu. AgentSentry: Mitigating indirect prompt injection in LLM agents via temporal causal diagnostics and context purification. arXiv preprint arXiv:2602.22724, 2026

  32. [32]

    RTBAS: Defending LLM agents against prompt injection and privacy leakage

    Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L. Titzer, Heather Miller, and Phillip B. Gibbons. RTBAS: Defending LLM agents against prompt injection and privacy leakage. arXiv preprint arXiv:2502.08966, 2025

  33. [33]

    OPP: OpenPort protocol for agent-to-tool governance

    Genliang Zhu, Chu Wang, Ziyuan Wang, Zhida Li, and Qiang Li. OPP: OpenPort protocol for agent-to-tool governance. arXiv preprint arXiv:2602.20196, 2026