Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents
Pith reviewed 2026-05-13 17:16 UTC · model grok-4.3
The pith
Tool-calling LLM agents can leak private data through the outcome of denied actions, which standard provenance tracking misses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Causality laundering is the attack in which probing a protected action produces a denial that leaks information, which the adversary then carries forward through a subsequent tool call that appears safe. The Agentic Reference Monitor counters this by maintaining a provenance graph over tool calls, returned data, field-level tags, and denied actions, augmented with counterfactual edges that represent the causal influence of each denial. Trust propagates through an integrity lattice, so any tool call whose inputs carry influence from a denied action can be rejected according to policy.
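To make the mechanism concrete, here is a minimal sketch of denial-aware provenance tracking as the claim describes it. The two-point lattice, the node kinds, and the reject-on-LOW policy are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of denial-aware provenance, assuming a two-point
# integrity lattice (LOW ⊑ HIGH) and a reject-on-LOW policy.
from dataclasses import dataclass, field

HIGH, LOW = "high", "low"

@dataclass
class Node:
    kind: str                              # "tool_call", "data", or "denial"
    integrity: str = HIGH
    parents: list = field(default_factory=list)

def meet(a: str, b: str) -> str:
    """Combining inputs takes the lattice meet: the lower integrity wins."""
    return LOW if LOW in (a, b) else HIGH

def propagate(node: Node) -> str:
    """Fold integrity over all incoming edges, including the counterfactual
    edges that attach denial nodes to whatever the agent produces next."""
    level = node.integrity
    for parent in node.parents:
        level = meet(level, propagate(parent))
    return level

# The attack trace: probe -> denial -> inference -> "benign" exfiltration.
denial = Node(kind="denial", integrity=LOW)        # denied-action node
inferred = Node(kind="data", parents=[denial])     # counterfactual edge
send = Node(kind="tool_call", parents=[inferred])  # looks safe in isolation

assert propagate(send) == LOW  # policy can now reject the tainted call
```

The counterfactual edge is what distinguishes this from flat provenance: no data flowed out of the denied action, so a flat tracker would see `send` as clean.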
What carries the argument
The Agentic Reference Monitor (ARM), a runtime mediator that consults a provenance graph augmented with counterfactual edges from denied-action nodes to enforce integrity policies over both data dependencies and denial-induced causal influence.
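A hedged sketch of that per-invocation check, reusing `Node`, `meet`, and `propagate` from the block above; the `PROTECTED` set and the Allow/Deny verdicts are hypothetical stand-ins for the paper's policy language.

```python
PROTECTED = {"read_payroll"}  # hypothetical protected action

def mediate(call_name: str, arg_nodes: list, graph: list):
    """Runs before every tool call. A denied action is recorded as a
    LOW-integrity node so its influence on later calls stays visible."""
    if call_name in PROTECTED:
        denial = Node(kind="denial", integrity=LOW)
        graph.append(denial)  # the denial itself re-enters the provenance graph
        return "Deny", denial
    level = HIGH
    for node in arg_nodes:
        level = meet(level, propagate(node))
    return ("Allow" if level == HIGH else "Deny"), None
```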
If this is right
- ARM blocks causality laundering, transitive taint, and mixed-provenance field misuse that flat provenance tracking misses.
- Policy checks add only sub-millisecond overhead in the evaluated scenarios.
- Denial-aware causal provenance becomes a necessary abstraction for any tool-calling agent that must respect action restrictions.
- Enforcement must occur at every tool invocation rather than only at data return points.
Where Pith is reading between the lines
- The same denial-feedback channel could appear in multi-agent workflows where one agent's denial affects another's later decisions.
- Tool designers would need explicit policies that limit the information content of denial messages themselves.
- Production deployments would need lightweight ways to approximate the full counterfactual graph without storing every possible denial outcome (one such approximation is sketched after this list).
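One hedged sketch of such an approximation, which is our extrapolation rather than anything the paper proposes: skip materializing counterfactual edges and instead conservatively taint everything the agent produces within a fixed window of steps after any denial.

```python
# Windowed denial-taint heuristic (assumed, not from the paper): every value
# produced within WINDOW steps of a denial is treated as denial-influenced.
WINDOW = 3  # hypothetical influence horizon

def tag_stream(events):
    """events: iterable of ("denial",) or ("value", payload) tuples.
    Yields (payload, tainted) pairs under the windowed heuristic."""
    cooldown = 0
    for event in events:
        if event[0] == "denial":
            cooldown = WINDOW              # open an influence window
        else:
            yield event[1], cooldown > 0
            cooldown = max(0, cooldown - 1)

trace = [("value", "a"), ("denial",), ("value", "b"), ("value", "c")]
print(list(tag_stream(trace)))  # [('a', False), ('b', True), ('c', True)]
```

The trade-off is familiar from taint tracking: a short window misses slow-burn laundering, a long one over-blocks benign calls.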
Load-bearing premise
A runtime system can accurately build and consult a provenance graph that includes all counterfactual edges from denied actions without missing causal links or introducing new attack surfaces.
What would settle it
A controlled run of one of the three evaluated attack scenarios in which the Agentic Reference Monitor allows an exfiltration call that carries information inferred solely from a prior denial would show the monitor does not block causality laundering.
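Under the assumptions of the first sketch above, that decisive run reduces to a single check: a tool call downstream of a denial that nevertheless propagates as HIGH would refute the blocking claim. A hypothetical harness, not the paper's evaluation code:

```python
def laundering_blocked() -> bool:
    """Reuses Node, propagate, HIGH, LOW from the first sketch."""
    denial = Node(kind="denial", integrity=LOW)
    inferred = Node(kind="data", parents=[denial])
    exfil = Node(kind="tool_call", parents=[inferred])
    return propagate(exfil) == LOW  # False here would settle it against ARM

assert laundering_blocked()
```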
Original abstract
Tool-calling LLM agents can read private data, invoke external services, and trigger real-world actions, creating a security problem at the point of tool execution. We identify a denial-feedback leakage pattern, which we term causality laundering, in which an adversary probes a protected action, learns from the denial outcome, and exfiltrates the inferred information through a later seemingly benign tool call. This attack is not captured by flat provenance tracking alone because the leaked information arises from causal influence of the denied action, not direct data flow. We present the Agentic Reference Monitor (ARM), a runtime enforcement layer that mediates every tool invocation by consulting a provenance graph over tool calls, returned data, field-level provenance, and denied actions. ARM propagates trust through an integrity lattice and augments the graph with counterfactual edges from denied-action nodes, enabling enforcement over both transitive data dependencies and denial-induced causal influence. In a controlled evaluation on three representative attack scenarios, ARM blocks causality laundering, transitive taint propagation, and mixed-provenance field misuse that a flat provenance baseline misses, while adding sub-millisecond policy evaluation overhead. These results suggest that denial-aware causal provenance is a useful abstraction for securing tool-calling agent systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies a denial-feedback leakage pattern termed 'causality laundering' in tool-calling LLM agents, where adversaries infer protected information from denial outcomes and exfiltrate it via subsequent benign tool calls. This is not captured by flat provenance tracking because the flow arises from causal influence rather than direct data dependencies. The authors propose the Agentic Reference Monitor (ARM), a runtime layer that consults an augmented provenance graph including field-level data, tool calls, and counterfactual edges from denied actions, then enforces policies over an integrity lattice. In a controlled evaluation on three representative scenarios, ARM blocks causality laundering, transitive taint, and mixed-provenance misuse missed by a flat baseline, while incurring sub-millisecond overhead.
Significance. If the counterfactual-edge construction holds in practice, the work introduces a useful extension of provenance tracking to denial-induced causal effects in agentic systems, addressing a gap between static data-flow analysis and runtime LLM behavior. The low-overhead result and explicit comparison to flat provenance strengthen the case for adoption in reference-monitor designs for tool-calling agents.
major comments (3)
- [Evaluation] Evaluation section: the paper states that ARM blocks the attack in three scenarios but provides no description of the algorithm or heuristics used to compute counterfactual edges from denial messages, nor any measurement of missed links or spurious edges when LLM reasoning adapts after receiving feedback.
- [§3] §3 (ARM design): the claim that the runtime can accurately augment the graph with counterfactual edges from denied-action nodes rests on the untested assumption that denial feedback produces predictable, enumerable changes in subsequent tool calls; this is load-bearing for the integrity-lattice enforcement but is only asserted rather than demonstrated against adaptive agents.
- [Abstract and §4] Abstract and §4: the controlled evaluation is described only at high level (three scenarios, sub-millisecond overhead) with no data, code, or verification steps supplied, preventing independent assessment of whether the reported blocking results depend on oracle-level edge inference.
minor comments (2)
- [§3] Notation for the integrity lattice and counterfactual propagation is introduced without a compact formal definition or small example that could be checked independently (a minimal sketch of such a definition follows this list).
- [Evaluation] The baseline 'flat provenance' implementation is not described in sufficient detail (e.g., which taint rules or graph traversal it uses) to allow exact reproduction of the comparison.
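For reference, a minimal sketch of the kind of compact definition the first minor comment asks for, written as a taint lattice (the dual of a trust lattice); the symbols $E_{\mathrm{cf}}$ (counterfactual edges), $\ell_0$ (base labels), and the threshold $\theta$ are illustrative notation, not the paper's.

```latex
% Illustrative Denning-style taint lattice with counterfactual edges.
\[
  (L,\ \sqsubseteq,\ \sqcup), \qquad \ell_0 : V \to L, \qquad
  \ell_0(d) = \top \ \ \text{for every denied-action node } d
\]
\[
  \ell(v) \;=\; \ell_0(v) \ \sqcup \bigsqcup_{(u,v)\,\in\, E \,\cup\, E_{\mathrm{cf}}} \ell(u)
\]
\[
  \mathrm{Allow}(t) \iff \bigsqcup_{a \,\in\, \mathrm{args}(t)} \ell(a) \ \sqsubseteq\ \theta
\]
```

Dualizing to trust levels recovers the form of the paper's Theorem 1, in which no node with trust below θ may causally reach an Allow verdict.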
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our identification of causality laundering in tool-calling LLM agents and the proposed Agentic Reference Monitor. We address each major comment point by point below, indicating planned revisions to improve clarity, reproducibility, and validation.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: the paper states that ARM blocks the attack in three scenarios but provides no description of the algorithm or heuristics used to compute counterfactual edges from denial messages, nor any measurement of missed links or spurious edges when LLM reasoning adapts after receiving feedback.
  Authors: We agree that the evaluation would be strengthened by explicit details on counterfactual edge construction. In the revised manuscript we will add a precise description of the algorithm and heuristics used to derive counterfactual edges from denial messages, together with quantitative measurements of missed links and spurious edges under adaptive LLM reasoning. These additions will be placed in an expanded evaluation subsection. revision: yes
- Referee: [§3] §3 (ARM design): the claim that the runtime can accurately augment the graph with counterfactual edges from denied-action nodes rests on the untested assumption that denial feedback produces predictable, enumerable changes in subsequent tool calls; this is load-bearing for the integrity-lattice enforcement but is only asserted rather than demonstrated against adaptive agents.
  Authors: The referee correctly notes that the design assumes predictable causal effects from denial feedback. While our controlled scenarios support this in practice, we acknowledge the assumption requires explicit testing against adaptive agents. We will revise §3 to articulate the assumption more clearly and will add targeted experiments in the evaluation section that measure ARM’s effectiveness when the LLM adapts its subsequent tool calls after receiving denial feedback. revision: yes
- Referee: [Abstract and §4] Abstract and §4: the controlled evaluation is described only at high level (three scenarios, sub-millisecond overhead) with no data, code, or verification steps supplied, preventing independent assessment of whether the reported blocking results depend on oracle-level edge inference.
  Authors: We will expand both the abstract and §4 with additional concrete details on the three scenarios, verification steps, and overhead measurements. Full experimental data and code will be released as supplementary material (or via a public repository) to allow independent reproduction and to clarify that edge inference is performed by the documented heuristics rather than oracle access. revision: partial
Circularity Check
No significant circularity detected in the derivation
Full rationale
The paper introduces the causality-laundering attack pattern and the ARM runtime system as a new abstraction that augments provenance graphs with counterfactual edges from denied actions. The central claims rest on explicit definitions of the attack (information flow via causal influence rather than direct data) and the enforcement mechanism, supported by a controlled evaluation on three scenarios. No equations, fitted parameters, or self-citations are invoked to force the result; the proposal does not reduce to renaming known results, self-definitional loops, or load-bearing self-citations. The derivation remains self-contained against the stated assumptions and evaluation.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: a provenance graph can be constructed over tool calls, returned data, field-level provenance, and denied actions.
- Domain assumption: an integrity lattice can propagate trust through the graph, including counterfactual edges from denied-action nodes.
invented entities (2)
- causality laundering: no independent evidence
- Agentic Reference Monitor (ARM): no independent evidence
Reference graph
Works this paper leans on
- [1] Nadya Abaev, Denis Klimov, Gerard Levinov, David Mimran, Yuval Elovici, and Asaf Shabtai. AgentGuardian: Learned access-control policies for LLM agents. arXiv preprint arXiv:2601.10440, 2026.
- [2]
- [3] Anthropic. Model context protocol. https://modelcontextprotocol.io, 2024. Specification v1.0.
- [4] Arnar Birgisson, Joe Gibbs Politz, Úlfar Erlingsson, Ankur Taly, Michael Vrable, and Mark Lentczner. Macaroons: Cookies with contextual caveats for decentralized authorization in the cloud. In Proceedings of the 21st Network and Distributed System Security Symposium (NDSS), 2014.
- [5] James Cheney, Laura Chiticariu, and Wang-Chiew Tan. Provenance in databases: Why, how, and where. Foundations and Trends in Databases, 1(4):379–474, 2009.
- [6] Manuel Costa, Ahmed Salem, Aashish Kolluri, Boris Kopf, Shruti Tople, Andrew Paverd, Lukas Wutschitz, Mark Russinovich, and Santiago Zanella-Beguelin. Securing AI agents with information-flow control. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2026. arXiv:2505.23643.
- [7] Edoardo Debenedetti et al. AgentDojo: A dynamic environment to evaluate attacks and defenses for LLM agents. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024.
- [8] Edoardo Debenedetti et al. Defeating prompt injection by design: Building secure LLM applications with CaMeL. arXiv preprint arXiv:2503.18813, 2025.
- [9] Dorothy E. Denning. A lattice model of secure information flow. Communications of the ACM, 19(5):236–243, 1976.
- [10] Jack B. Dennis and Earl C. Van Horn. Programming generality: A fundamental problem of the one-processor computer. Communications of the ACM, 9(3):143–147, 1966.
- [11] Todd J. Green, Grigoris Karvounarakis, and Val Tannen. Provenance semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 31–40, 2007.
- [12] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec), 2023.
- [13] Keegan Hines et al. Defending against indirect prompt injection attacks with spotlighting. arXiv preprint arXiv:2403.14720, 2024.
- [14] Invariant Labs. Tool poisoning attacks in MCP. https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks, 2024.
- [15] Thomas Jiralerspong, Flemming Kondrup, and Yoshua Bengio. Noticing the watcher: LLM agents can infer CoT monitoring from blocking feedback. arXiv preprint arXiv:2603.16928, 2026.
- [16] Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Miculicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, and Tomas Pfister. CausalArmor: Efficient indirect prompt injection guardrails via causal attribution. arXiv preprint arXiv:2602.07918, 2026.
- [17] Butler W. Lampson. A note on the confinement problem. Communications of the ACM, 16(10):613–615, 1973.
- [18] Frank Li. PRISM: Zero-fork defense-in-depth runtime layer for agent security. arXiv preprint arXiv:2603.11853, 2026.
- [19] Yutao Mou, Zhangchi Xue, Lijun Li, Peiyang Liu, Shikun Zhang, Wei Ye, and Jing Shao. ToolSafe: Step-level pre-execution detection for LLM agent safety. arXiv preprint arXiv:2601.10156, 2026.
- [20] NVIDIA. NeMo Guardrails: A toolkit for controllable and safe LLM applications. https://github.com/NVIDIA/NeMo-Guardrails, 2024.
- [21] Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Mihai Christodorescu, and Somesh Jha. Policy compiler for secure agentic systems. arXiv preprint arXiv:2602.16708, 2026.
- [22] Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334, 2023.
- [23] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009.
- [24] Qiskit Contributors. rustworkx: A high-performance graph library for Python. https://github.com/Qiskit/rustworkx, 2024.
- [25] Jerome H. Saltzer and Michael D. Schroeder. The protection of information in computer systems. Proceedings of the IEEE, 63(9):1278–1308, 1975.
- [26] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023.
- [27] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In Proceedings of the IEEE Symposium on Security and Privacy, pages 317–331, 2010.
- [28] Eric Wallace et al. The instruction hierarchy: Training LLMs to prioritize privileged instructions. arXiv preprint arXiv:2404.13208, 2024.
- [29] Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, and Ye Wu. AgentArmor: Enforcing program analysis on agent runtime trace to defend against prompt injection. arXiv preprint arXiv:2508.01249, 2025.
- [30] Qiusi Zhan et al. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. arXiv preprint arXiv:2403.02691, 2024.
- [31] Tian Zhang, Yiwei Xu, Juan Wang, Keyan Guo, Xiaoyang Xu, Bowen Xiao, Quanlong Guan, Jinlin Fan, Jiawei Liu, Zhiquan Liu, and Hongxin Hu. AgentSentry: Mitigating indirect prompt injection in LLM agents via temporal causal diagnostics and context purification. arXiv preprint arXiv:2602.22724, 2026.
- [32] Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L. Titzer, Heather Miller, and Phillip B. Gibbons. RTBAS: Defending LLM agents against prompt injection and privacy leakage. arXiv preprint arXiv:2502.08966, 2025.
- [33] Genliang Zhu, Chu Wang, Ziyuan Wang, Zhida Li, and Qiang Li. OPP: OpenPort protocol for agent-to-tool governance. arXiv preprint arXiv:2602.20196, 2026.
From the paper's Appendix A (Formal Proofs): Theorem 1 (Mediated Integrity). In an ARM-protected system, there exists no causal path in the provenance graph G from a node with trust level below threshold θ to an Allow verdict on a tool call t…