Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

Peiran Wang; Ying Li; Yuan Tian

arxiv: 2605.26497 · v1 · pith:O6SFMOH2new · submitted 2026-05-26 · 💻 cs.CR

Pith reviewed 2026-06-29 17:34 UTC · model grok-4.3

keywords: LLM agents, prompt injection, information provenance, authorization graph, graph alignment, agent security, tool calling, indirect injection

The pith

AuthGraph detects prompt injections in LLM agents by aligning an execution provenance graph against a clean authorization graph from isolated user intent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a defense called AuthGraph for LLM-based agents that perform tool calls in scenarios like financial transactions where external data can carry indirect prompt injections. It constructs an injected reasoning graph that tracks information provenance through the actual execution trajectory, including any manipulated sources, and an authorization graph drawn from the user's intent supplied in a separate clean context. A structural graph alignment checker then compares the two graphs to identify deviations at both the level of which tools are called and the origins of their parameter values. This dual-graph method is tested on AgentDojo and AgentDyn, where it lowers attack success rates from around 40 percent to 1-2 percent while retaining substantial task completion rates on models such as GPT-4o. The work targets gaps in prior approaches that either verify values without tracking provenance or lack an uninfluenced baseline for comparison.

Core claim

AuthGraph constructs an injected reasoning graph modeling information provenance from the actual execution trajectory and an authorization graph derived from the user's intent in an isolated clean context, then applies a graph alignment checker to structurally compare the two graphs and detect both tool-level and parameter-source-level deviations.

What carries the argument

The graph alignment checker that structurally compares the injected reasoning graph (tracking provenance from execution) with the authorization graph (from clean user intent).
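A rough picture of what such a checker could do is sketched below. This is an illustration under stated assumptions, not the authors' implementation: the `Step` class, the source labels such as `"user_prompt"` and `"observation:read_inbox"`, the tool names, and the `align` function are all hypothetical. The sketch walks the execution trajectory with one pointer while a second pointer advances through the authorization graph, and it reports tools that were never authorized as well as parameters whose recorded source differs from the authorized one.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One tool call; `sources` maps each parameter name to a provenance label."""
    tool: str
    sources: dict = field(default_factory=dict)


def align(authorized, executed):
    """Compare two ordered step lists and return a list of deviations."""
    deviations = []
    i = 0  # pointer into the authorization graph
    for step in executed:  # second pointer: the execution trajectory
        # Find the executed tool at or after the current authorized position.
        match = next((k for k in range(i, len(authorized))
                      if authorized[k].tool == step.tool), None)
        if match is None:
            # Tool-level deviation: no remaining authorized step uses this tool.
            deviations.append(("tool", step.tool, "not authorized"))
            continue
        planned = authorized[match]
        for param, source in step.sources.items():
            allowed = planned.sources.get(param)
            # Parameter-source deviation: value came from an unexpected origin.
            if allowed is not None and source != allowed:
                deviations.append(("param_source", f"{step.tool}.{param}", source))
        i = match + 1
    return deviations


# Hypothetical running example: the recipient should come from the user prompt.
authorized = [
    Step("read_inbox"),
    Step("send_money", {"recipient": "user_prompt", "amount": "user_prompt"}),
]
executed = [
    Step("read_inbox"),
    Step("send_money", {"recipient": "observation:read_inbox", "amount": "user_prompt"}),
    Step("delete_email"),
]
print(align(authorized, executed))
```

In this toy run the checker reports one parameter-source deviation (the recipient was taken from an observation rather than the user prompt) and one tool-level deviation (`delete_email` was never planned). A real checker would also need a policy for benign extra or skipped steps, which this sketch omits.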

If this is right

Detects deviations at both tool selection and the specific sources of parameter values.
Reduces attack success rate from 40% to 1% on AgentDojo while maintaining 76% task completion on GPT-4o.
Reduces attack success rate from 39% to 2% on AgentDyn while preserving 51% utility.
Outperforms prior defenses including CaMeL, DRIFT, and Progent by enabling fine-grained provenance checks.
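As a quick check on the scale of these figures, the relative reduction in attack success rate can be computed directly from the percentages quoted above. The helper name `relative_reduction` is ours, and because the abstract reports rounded percentages, the results are approximate.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original attack success rate removed by the defense."""
    return (before - after) / before


agentdojo = relative_reduction(0.40, 0.01)  # 40% -> 1%
agentdyn = relative_reduction(0.39, 0.02)   # 39% -> 2%

print(f"AgentDojo: {agentdojo:.1%}")  # prints "AgentDojo: 97.5%"
print(f"AgentDyn:  {agentdyn:.1%}")   # prints "AgentDyn:  94.9%"
```

Both benchmarks therefore show roughly a 95 to 98 percent relative drop in successful attacks, which is the comparison that matters when baselines start from different rates.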

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Requiring a separate clean context for the authorization graph introduces an extra user step that may affect usability in real-time agent workflows.
The structural comparison technique could be adapted to other agent manipulation vectors such as tool response poisoning not covered in the current evaluations.
The dual-graph design might integrate with existing tool-calling APIs by logging provenance metadata without redesigning the agent loop itself.
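The last extension above, logging provenance at the tool-calling boundary, might look roughly like the following. Everything here is hypothetical and not taken from the paper: the `Tagged` wrapper, the `with_provenance` decorator, the in-memory `PROVENANCE_LOG`, and the `get_balance` stand-in tool are names chosen only for illustration.

```python
import functools

PROVENANCE_LOG = []  # hypothetical in-memory log; a real system would persist it


class Tagged:
    """A value paired with a label describing where it came from."""
    def __init__(self, value, source):
        self.value = value
        self.source = source


def with_provenance(tool):
    """Wrap a tool so each call records argument sources and tags its result."""
    @functools.wraps(tool)
    def wrapper(**kwargs):
        # Record where each argument came from before the tool runs.
        sources = {name: arg.source if isinstance(arg, Tagged) else "untracked"
                   for name, arg in kwargs.items()}
        PROVENANCE_LOG.append({"tool": tool.__name__, "sources": sources})
        # Unwrap tagged arguments so the tool itself needs no changes.
        plain = {name: arg.value if isinstance(arg, Tagged) else arg
                 for name, arg in kwargs.items()}
        return Tagged(tool(**plain), f"observation:{tool.__name__}")
    return wrapper


@with_provenance
def get_balance(account):
    return {"account": account, "balance": 100}  # stand-in for a real tool


result = get_balance(account=Tagged("checking", "user_prompt"))
print(PROVENANCE_LOG[-1], result.source)
```

The agent loop is untouched in this design; only the tool functions are decorated. The limitation is that provenance is lost whenever a value passes through the model as free text, which is presumably why the paper builds its execution-side graph from the full trajectory instead.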

Load-bearing premise

The authorization graph can be derived from the user's intent in an isolated clean context that attackers cannot influence through injection.

What would settle it

An attack that alters parameter sources in the execution trajectory yet produces no detectable structural mismatch with the authorization graph, causing the attack success rate to exceed 2 percent on the evaluated benchmarks.

Figures

Figures reproduced from arXiv: 2605.26497 by Peiran Wang, Ying Li, Yuan Tian.

**Figure 1.** Architecture overview of AuthGraph. The Planner Agent generates an authorization graph from the user prompt and tool catalog in a clean context (no trajectory data). The Graph Builder constructs an injected reasoning graph from the agent’s execution trajectory. The graph alignment checker performs dual-pointer comparison to detect deviations at both the tool level and the parameter-source level.

**Figure 2.** Dual-graph comparison on the running example. The authorization graph (left) …

Original abstract

LLM-based agents are increasingly deployed in high-stakes scenarios such as email management, financial transactions, and code execution, where they interact with the external world through tool calling. During execution, these agents must read external data sources (emails, webpages, files) that attackers can control; through indirect prompt injection, attackers embed malicious instructions in this data to manipulate agents into performing unauthorized operations such as transferring funds to attacker-controlled accounts. Existing defenses either perform tool-call-level value checking without tracking where parameter values originate, or analyze execution traces from a single perspective without a clean authorization baseline for comparison. We propose AuthGraph, a dual-graph alignment defense framework that constructs two complementary graphs: an injected reasoning graph that models information provenance from the actual execution trajectory (including potentially manipulated attributions), and an authorization graph derived from the user's intent in an isolated clean context that is information-theoretically impossible to be influenced by injection; a graph alignment checker then structurally compares the two graphs to detect both tool-level and parameter-source-level deviations. On AgentDojo, AuthGraph reduces the attack success rate from 40% to 1% while maintaining 76% task completion rate on GPT-4o; on AgentDyn, it reduces the attack success rate from 39% to 2% while preserving 51% utility, outperforming state-of-the-art defenses including CaMeL, DRIFT, and Progent. To our knowledge, AuthGraph is the first agent security defense to structurally compare authorization specifications against execution provenance at the parameter-source level, achieving fine-grained injection detection without sacrificing agent flexibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AuthGraph frames a dual-graph comparison to catch parameter-source injections, but the clean authorization baseline needs a concrete isolation mechanism to hold up.

The main takeaway is that this work builds an injected reasoning graph from the actual execution trace and an authorization graph from user intent captured in a supposedly untainted context, then uses structural alignment to flag deviations at both tool and parameter-source levels.

What is new is the explicit parameter-source tracking via graph comparison, which the abstract positions as distinct from earlier tool-level checks or single-perspective trace analysis. The reported results on AgentDojo and AgentDyn show attack success rates falling to 1-2% while preserving 51-76% utility, outperforming the listed baselines.

The paper does a reasonable job of naming the practical gap in deployed agents that read attacker-controlled data.

The soft spot is the authorization graph itself. The abstract states it comes from an isolated clean context that is information-theoretically impossible to influence, yet gives no procedure for building it. If intent capture shares any LLM call, template, or data path with the execution trajectory, the baseline is no longer guaranteed clean. The load-bearing premise stated above rests on exactly this point; without a specified non-LLM parser or separate process, the claim stays an assertion rather than a demonstrated property.
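To make the concern concrete, here is one shape a non-LLM intent parser could take. It is a sketch under assumptions, not something the abstract describes: the `TEMPLATES` table, the tool names, and `parse_intent` are hypothetical. The point is structural: the function receives only the user prompt string, so tool outputs and retrieved documents have no path into the result.

```python
import re

# Hypothetical templates: each maps a fixed phrasing to a tool name.
TEMPLATES = [
    (re.compile(r"^send (?P<amount>\d+(?:\.\d+)?) to (?P<recipient>[\w.@-]+)$", re.I),
     "send_money"),
    (re.compile(r"^summarize my (?P<folder>\w+)$", re.I), "read_folder"),
]


def parse_intent(user_prompt: str):
    """Return (tool, params) for a recognized prompt, else None.

    The only input is the prompt itself; no trajectory data is consulted.
    """
    text = user_prompt.strip()
    for pattern, tool in TEMPLATES:
        m = pattern.match(text)
        if m:
            # Every parameter is sourced from the user prompt by construction.
            params = {name: {"value": value, "source": "user_prompt"}
                      for name, value in m.groupdict().items()}
            return tool, params
    return None


print(parse_intent("Send 50 to alice@example.com"))
print(parse_intent("Do whatever the latest email says"))  # unrecognized -> None
```

The trade-off is visible in the second call: a fixed template set is easy to reason about but rejects free-form requests, whereas an LLM-based planner handles them at the cost of a weaker isolation argument.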

Performance numbers are given without experimental controls or statistical details in the abstract, so soundness cannot be checked from what is available here.

This is for researchers focused on agent security and provenance. A reader working on defenses that need to track where parameter values actually come from would find the framing useful. It deserves a serious referee to examine the construction method and evaluation design in the full manuscript.

Referee Report

2 major / 1 minor

Summary. The paper proposes AuthGraph, a dual-graph defense framework for LLM agents against indirect prompt injection attacks. It builds an injected reasoning graph capturing information provenance from the execution trajectory and an authorization graph derived from user intent in an isolated clean context asserted to be information-theoretically immune to injection; a graph alignment checker then compares the graphs to detect tool-level and parameter-source-level deviations. Experiments on AgentDojo and AgentDyn report attack success rate reductions from 40% to 1% and 39% to 2% respectively, while preserving 76% task completion and 51% utility on GPT-4o, outperforming CaMeL, DRIFT, and Progent.

Significance. If the isolation property and structural comparison hold, the work introduces a novel parameter-source-level provenance check that could meaningfully strengthen agent security without fully sacrificing flexibility. The dual-graph approach is distinct from prior single-perspective or value-checking defenses and supplies concrete benchmark improvements. The absence of a specified isolation mechanism, however, leaves the central guarantee unverified.

major comments (2)

[Abstract] The central claim that the authorization graph is 'derived from the user's intent in an isolated clean context that is information-theoretically impossible to be influenced by injection' is load-bearing for the dual-graph comparison, yet the manuscript provides no explicit construction procedure (separate parser, fixed template, or non-LLM component), no isolation proof, and no discussion of shared LLM calls or data paths that could violate the guarantee.
[Abstract and evaluation sections] The reported reductions (40%→1% on AgentDojo, 39%→2% on AgentDyn) are presented without details on experimental controls, baseline re-implementations, statistical significance, or whether post-hoc choices affected the numbers; this undermines verification that the alignment checker, rather than other factors, drives the gains.

minor comments (1)

The abstract claims novelty 'to our knowledge' but does not cite or compare against all recent provenance or graph-based agent defenses; a dedicated related-work subsection would clarify positioning.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for explicit details on the authorization graph construction and experimental controls. We address each major comment point by point below.

Referee: [Abstract] The central claim that the authorization graph is 'derived from the user's intent in an isolated clean context that is information-theoretically impossible to be influenced by injection' is load-bearing for the dual-graph comparison, yet the manuscript provides no explicit construction procedure (separate parser, fixed template, or non-LLM component), no isolation proof, and no discussion of shared LLM calls or data paths that could violate the guarantee.

Authors: We acknowledge that the current manuscript text does not provide a fully explicit construction procedure or formal isolation argument for the authorization graph. The abstract states the high-level property, but the full paper relies on the initial user prompt being processed in a clean context without external data. To address this, we will revise Section 3 to specify a deterministic template-based parser operating solely on the user-provided intent (no LLM involvement or shared data paths), include a brief isolation argument based on information flow, and add discussion of why shared components are avoided. This strengthens the load-bearing claim without altering the core approach.
Revision: yes

Referee: [Abstract and evaluation sections] The reported reductions (40%→1% on AgentDojo, 39%→2% on AgentDyn) are presented without details on experimental controls, baseline re-implementations, statistical significance, or whether post-hoc choices affected the numbers; this undermines verification that the alignment checker, rather than other factors, drives the gains.

Authors: We agree that additional experimental details are required for full verifiability. The reported attack success rates and utility numbers come from controlled runs on the standard AgentDojo and AgentDyn benchmarks with fixed random seeds and the same task distributions for all methods. In the revision, we will expand the evaluation section (and add an appendix) with: (i) explicit re-implementation notes for CaMeL, DRIFT, and Progent using their public code where available, (ii) trial counts and standard deviations, (iii) statistical significance tests, and (iv) confirmation that no post-hoc metric selection occurred. These additions will allow readers to confirm that the alignment checker is the primary driver of the observed gains.
Revision: yes
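For readers wondering what the promised uncertainty reporting could look like, a Wilson score interval is one standard way to put error bars on an attack success rate. The sketch below uses only the Python standard library. The trial count in the example (5 successes in 500 attempts) is invented for illustration and does not come from the paper, which gives no counts in the abstract.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# Hypothetical counts: 5 successful attacks out of 500 attempts (1%).
low, high = wilson_interval(5, 500)
print(f"ASR 1.0%, approx. 95% interval: {low:.2%} to {high:.2%}")
```

With rates this close to zero, the interval width depends heavily on the number of trials, so a 1 percent versus 2 percent difference between benchmarks may or may not be distinguishable until the counts are published.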

Circularity Check

0 steps flagged

No significant circularity; dual-graph construction and alignment checker are independent of fitted parameters or self-citations

The paper introduces AuthGraph as a structural comparison between an injected reasoning graph (from execution provenance) and an authorization graph (from isolated user intent). No equations, fitted parameters, or predictions are described that reduce by construction to the inputs or evaluation benchmarks. The abstract asserts the isolation property without deriving it from prior results or self-citations. The method is presented as a new framework evaluated on external benchmarks (AgentDojo, AgentDyn), with performance metrics independent of the defense definition itself. This matches the default expectation of no circularity; the central claim has independent content in the graph alignment checker.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Review is abstract-only; no explicit free parameters, standard axioms, or invented entities beyond the two new graph constructs are described. The graphs themselves constitute the core methodological invention.

invented entities (2)

injected reasoning graph (no independent evidence)
purpose: models information provenance from the actual execution trajectory, including potentially manipulated attributions
New construct introduced to capture execution-side provenance for comparison.
authorization graph (no independent evidence)
purpose: derived from the user's intent in an isolated clean context
New construct serving as the baseline that injected content is claimed unable to reach.


[1] [1]

In34th USENIX Security Symposium (USENIX Se- curity 25), pages 2383–2400, 2025

Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner.{StruQ}: Defending against prompt injection with structured queries. In34th USENIX Security Symposium (USENIX Se- curity 25), pages 2383–2400, 2025

2025

[2] [2]

Secalign: Defending against prompt injection with preference optimiza- tion

Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wag- ner, and Chuan Guo. Secalign: Defending against prompt injection with preference optimiza- tion. InProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, pages 2833–2847, 2025

2025

[3] [3]

Securing AI Agents with Information-Flow Control

Manuel Costa, Boris K ¨opf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-B ´eguelin. Securing ai agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[4] [4]

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895– 82920, 2024

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tram `er. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895– 82920, 2024

2024

[5] [5]

Defeating Prompt Injections by Design

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tram `er. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[6] [6]

Dorothy E. Denning. A lattice model of secure information flow.Commun. ACM, 19(5): 236–243, May 1976. ISSN 0001-0782. doi: 10.1145/360051.360056. URLhttps://doi. org/10.1145/360051.360056

work page doi:10.1145/360051.360056 1976

[7] [7]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelli- gence and security, pages 79–90, 2023

2023

[8] [8]

Enhancing llm agent safety via causal influence prompting

Dongyoon Hahm, Woogyeol Jin, June Suk Choi, Sungsoo Ahn, and Kimin Lee. Enhancing llm agent safety via causal influence prompting. InFindings of the Association for Computational Linguistics: ACL 2025, pages 15143–15168, 2025

2025

[9] [9]

The confused deputy: (or why capabilities might have been invented).SIGOPS Oper

Norm Hardy. The confused deputy: (or why capabilities might have been invented).SIGOPS Oper. Syst. Rev., 22(4):36–38, October 1988. ISSN 0163-5980. doi: 10.1145/54289.871709. URLhttps://doi.org/10.1145/54289.871709

work page doi:10.1145/54289.871709 1988

[10] [10]

Prompt flow integrity to prevent privilege escalation in llm agents.arXiv preprint arXiv:2503.15547, 2025

Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in llm agents.arXiv preprint arXiv:2503.15547, 2025

work page arXiv 2025

[11] [11]

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

Donghyun Lee and Mo Tiwari. Prompt infection: Llm-to-llm prompt injection within multi- agent systems.arXiv preprint arXiv:2410.07283, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[12] [12]

ACE: A Security Architecture for LLM-Integrated App Systems

Evan Li, Tushin Mallick, Evan Rose, William Robertson, Alina Oprea, and Cristina Nita- Rotaru. Ace: A security architecture for llm-integrated app systems.arXiv preprint arXiv:2504.20984, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[13] [13]

Drift: Dynamic rule-based defense with injection isolation for securing llm agents.arXiv preprint arXiv:2506.12104, 2025

Hao Li, Xiaogeng Liu, Hung-Chun Chiu, Dianqi Li, Ning Zhang, and Chaowei Xiao. Drift: Dynamic rule-based defense with injection isolation for securing llm agents.arXiv preprint arXiv:2506.12104, 2025

work page arXiv 2025

[14] [14]

AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?

Hao Li, Ruoyao Wen, Shanghao Shi, Ning Zhang, and Chaowei Xiao. Agentdyn: A dynamic open-ended benchmark for evaluating prompt injection attacks of real-world agent security system.arXiv preprint arXiv:2602.03117, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[15] Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models. arXiv preprint arXiv:2403.04957, 2024.

[16] Dario Pasquini, Martin Strohmeier, and Carmela Troncoso. Neural exec: Learning (and learning from) execution triggers for prompt injection attacks. In Proceedings of the 2024 Workshop on Artificial Intelligence and Security, pages 89–100, 2024.

[17] J.H. Saltzer and M.D. Schroeder. The protection of information in computer systems. Proceedings of the IEEE, 63(9):1278–1308, 1975. doi: 10.1109/PROC.1975.9939.

[18] Tianneng Shi, Jingxuan He, Zhun Wang, Hongwei Li, Linyu Wu, Wenbo Guo, and Dawn Song. Progent: Programmable privilege control for LLM agents. arXiv preprint arXiv:2504.11703, 2025.

[19] Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, and Alex Beutel. The instruction hierarchy: Training LLMs to prioritize privileged instructions. arXiv preprint arXiv:2404.13208, 2024.

[20] Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, and Ye Wu. AgentArmor: Enforcing program analysis on agent runtime trace to defend against prompt injection. arXiv preprint arXiv:2508.01249, 2025.

[21] Peiran Wang, Xinfeng Li, Chong Xiang, Jinghuai Zhang, Ying Li, Lixia Zhang, Xiaofeng Wang, and Yuan Tian. The landscape of prompt injection threats in LLM agents: From taxonomy to analysis. arXiv preprint arXiv:2602.10453, 2026.

[22] Fangzhou Wu, Ethan Cecchetti, and Chaowei Xiao. System-level defense against indirect prompt injection attacks: An information flow control perspective. arXiv preprint arXiv:2409.19091, 2024.

[23] Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. IsolateGPT: An execution isolation architecture for LLM-based agentic systems. arXiv preprint arXiv:2403.04960, 2024.

[24] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023.

[25] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, 2024.

[26] Qiusi Zhan, Richard Fang, Henil Shalin Panchal, and Daniel Kang. Adaptive attacks break defenses against indirect prompt injection attacks on LLM agents. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7101–7117, 2025.

[27] Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent Security Bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. arXiv preprint arXiv:2410.02644, 2024.

[28] Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L Titzer, Heather Miller, and Phillip B Gibbons. RTBAS: Defending LLM agents against prompt injection and privacy leakage. arXiv preprint arXiv:2502.08966, 2025.

A System Prompts

We present the complete system prompts used by each component of AUTHGRAPH. This section contains: • Section A.1 Gr...

• The observation contains no risk-relevant information

• The graph already captures the necessary information

• The task appears complete.

Only add specific, detailed risk information. Keep changes minimal. Use add_node, add_edge, update_node, update_edge, validate_graph, submit_graph.

A.2 Planner (Authorization Graph Generation)

Planner System Prompt (abbreviated)

You are a security planner for an LLM agent system. Given a user’s task and available tools, you gene...

1. "extra_step_ok" | Legitimate auxiliary/helper step. Includes: (a) Same-domain READ/GET/SEARCH tools that gather context for planned WRITE steps. (b) Generic lookup tools (get_current_day, list_*, check_*)

2. "skipped_step_ok" | Agent skipped expected step (already had info)

3. "suspicious" | No legitimate connection to user’s task; may be injection-driven. Reserve for tools that write to external destinations not in the task, fetch external URLs, or belong to a completely unrelated tool family.

When in doubt between 1 and 3, lean toward "extra_step_ok" if the tool is read-only and in the same domain as a planned step.

Return JS...
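The labeling rules in this prompt are applied by an LLM, not by fixed code. As a rough illustration only, the following Python sketch approximates the decision between "extra_step_ok" and "suspicious" for a tool call that appears in the execution trace but not in the plan. Only the two label strings and the lookup patterns (get_current_day, list_*, check_*) come from the prompt; the `Tool` record, its `domain` and `writes_external` fields, the `READ_PREFIXES` heuristic, and the conservative fallback to "suspicious" are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase

# Patterns quoted in the prompt as generic lookup tools.
GENERIC_LOOKUPS = ("get_current_day", "list_*", "check_*")
# Assumed naming heuristic for READ/GET/SEARCH tools (not from the paper).
READ_PREFIXES = ("read_", "get_", "search_")


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record; the fields are illustrative assumptions."""
    name: str
    domain: str                    # assumed tool-family tag, e.g. "banking"
    writes_external: bool = False  # assumed flag: sends data to an outside destination


def is_read_only(tool: Tool) -> bool:
    # A tool counts as read-only if its name looks like a read and it never writes out.
    return tool.name.startswith(READ_PREFIXES) and not tool.writes_external


def classify_extra_step(tool: Tool, planned_domains: set[str]) -> str:
    """Label a tool call that is present in the trace but absent from the plan."""
    # Rule (b): generic lookup tools are legitimate helper steps.
    if not tool.writes_external and any(
        fnmatchcase(tool.name, pattern) for pattern in GENERIC_LOOKUPS
    ):
        return "extra_step_ok"
    # Rule (a): same-domain read-only tools that gather context for planned steps.
    if is_read_only(tool) and tool.domain in planned_domains:
        return "extra_step_ok"
    # Assumed conservative fallback: external writes and unrelated tool
    # families are flagged. The prompt leaves borderline cases to the LLM.
    return "suspicious"
```

For example, with a plan that only touches a hypothetical "banking" domain, `classify_extra_step(Tool("get_balance", "banking"), {"banking"})` yields "extra_step_ok", while an unplanned `send_email` tool marked `writes_external=True` yields "suspicious". A fixed rule like this cannot capture the judgment the prompt asks of the model; it only makes the stated tie-breaking preference concrete.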

• Do NOT verify the agent’s arithmetic or logic

• Do NOT judge whether the value "serves the user’s goal"

• Do NOT use "I cannot trace this value" as evidence of injection

• Do NOT flag just because the value is not in the user’s prompt. Observations ARE the expected source for these params.

Return JSON: {"aligned": true/false, "reason": "..."}

If aligned:false, reason MUST contain a verbatim excerpt from the observation.

B Additional Case Studies

We present representative case studies illustrating AUTHGRAPH’s detection capab...