
arxiv: 2605.12535 · v3 · pith:XCIRARNGnew · submitted 2026-05-02 · 💻 cs.CR

Ghost in the Context: Policy-Carriage Integrity in LLM Agents

Pith reviewed 2026-07-02 23:57 UTC · model grok-4.3

classification 💻 cs.CR
keywords: policy-carriage integrity, LLM agents, context management, policy placement, AutoGen, OpenHands, ControlCapsule

The pith

An LLM agent acts with integrity only if its applicable trusted policies are still present, sound, and correctly bound in the decision state immediately before each action.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines policy-carriage integrity: the need for trusted policies to stay present, sound, and correctly bound in an LLM agent's decision state right before it acts. Experiments replay pressure on traces from AutoGen/tau3 and OpenHands/SWE-bench. Protected-placement configurations keep policies intact across the sweep. Task-local placement instead shows eviction, weakening, or budget overruns. The work isolates the problem to context assembly and supplies systems-level fixes including preflight fit checks and fail-closed behavior on overload.

Core claim

Under controlled pressure replay over AutoGen/tau3 and OpenHands/SWE-bench traces, the tested protected-placement configurations preserve policy across the pressure sweep, while task-local placement exhibits eviction, weakening, or over-budget continuation depending on the context manager. A fixed-assembler behavioral calibration produced 0/90 unsafe-action proposals and 0/90 unguarded policy violations, so policy absence alone did not establish unsafe model behavior.
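The placement distinction in the core claim can be illustrated with a toy assembler. This is a minimal sketch, not the paper's replay harness and not the actual context managers of AutoGen or OpenHands: the `Segment` type, the token counts, and the oldest-first truncation rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str    # "policy" or "workload" (hypothetical labels)
    text: str
    tokens: int  # token cost, supplied by the caller in this sketch


def truncate_oldest_first(history, budget):
    """Keep the newest segments that fit in `budget`; older ones are evicted."""
    kept, used = [], 0
    for seg in reversed(history):
        if used + seg.tokens > budget:
            break
        kept.append(seg)
        used += seg.tokens
    return list(reversed(kept))


def assemble_task_local(policy, workload, budget):
    """Policy sits in the ordinary history and competes with workload for space."""
    return truncate_oldest_first([policy] + workload, budget)


def assemble_protected(policy, workload, budget):
    """Policy is reserved first; only the remaining budget is offered to workload."""
    if policy.tokens > budget:
        raise ValueError("policy does not fit; refusing to assemble")
    return [policy] + truncate_oldest_first(workload, budget - policy.tokens)


policy = Segment("policy", "Never issue refunds above $100.", 10)
workload = [Segment("workload", f"tool output {i}", 30) for i in range(5)]

task_local = assemble_task_local(policy, workload, budget=100)
protected = assemble_protected(policy, workload, budget=100)

print([s.kind for s in task_local])  # policy has been evicted
print([s.kind for s in protected])   # policy is still first
```

Under the same budget, the task-local variant silently drops the policy once enough workload accumulates, while the protected variant keeps it and shrinks only the workload share.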

What carries the argument

Policy-carriage integrity: the requirement that applicable trusted policies remain present, sound, and correctly bound in the decision state immediately before action.
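One way to read the three conditions (present, sound, correctly bound) is as a check run on the decision state just before acting. The sketch below is an assumption-laden illustration rather than the paper's measurement procedure: `PolicyRecord`, `carriage_status`, and the use of an exact SHA-256 match as the soundness test are all hypothetical choices. An exact-match test is deliberately strict and would flag any paraphrase or summary of a policy as unsound.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRecord:
    policy_id: str
    text: str
    scope: str  # the action or tool this policy governs (hypothetical field)


def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def carriage_status(trusted, decision_state, pending_scope):
    """Classify each applicable trusted policy against the decision state."""
    carried = {p.policy_id: p for p in decision_state}
    report = {}
    for policy in trusted:
        if policy.scope != pending_scope:
            continue  # not applicable to the pending action
        found = carried.get(policy.policy_id)
        if found is None:
            report[policy.policy_id] = "absent"      # evicted
        elif digest(found.text) != digest(policy.text):
            report[policy.policy_id] = "unsound"     # rewritten or weakened
        elif found.scope != policy.scope:
            report[policy.policy_id] = "misbound"    # attached to the wrong scope
        else:
            report[policy.policy_id] = "intact"
    return report


trusted = [
    PolicyRecord("refund-cap", "Never issue refunds above $100.", "refund"),
    PolicyRecord("pii-mask", "Never reveal full card numbers.", "refund"),
    PolicyRecord("email-tone", "Keep emails formal.", "email"),
]
# A summarizer has softened one policy and dropped another.
decision_state = [PolicyRecord("refund-cap", "Try to keep refunds small.", "refund")]

report = carriage_status(trusted, decision_state, pending_scope="refund")
print(report)
```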

If this is right

  • Assign typed provenance to policy state.
  • Isolate control budget to protect policy space.
  • Check before assembly that the complete active policy set fits.
  • Fail closed on overload rather than proceeding with partial policies.
  • Enforce structured policies at the action boundary.
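The first four recommendations above can be sketched together as a preflight step in a context assembler. This is a hypothetical illustration, not the ControlCapsule interface, which the abstract does not specify: the `Provenance` enum, the `Item` type, the `PolicyOverload` exception, and the split into a control budget and a workload budget are all assumed names and design choices.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    TRUSTED_POLICY = "trusted_policy"
    WORKLOAD = "workload"


@dataclass(frozen=True)
class Item:
    provenance: Provenance  # typed provenance travels with the content
    text: str
    tokens: int


class PolicyOverload(Exception):
    """Raised when the complete active policy set cannot be carried."""


def preflight(active_policies, control_budget):
    """Check, before assembly, that every active policy is typed and fits."""
    for item in active_policies:
        if item.provenance is not Provenance.TRUSTED_POLICY:
            raise PolicyOverload("non-policy item offered as policy")
    needed = sum(p.tokens for p in active_policies)
    if needed > control_budget:
        raise PolicyOverload(f"need {needed} tokens, control budget is {control_budget}")


def assemble(active_policies, workload, control_budget, workload_budget):
    """Fail closed on overload; otherwise trim only the workload share."""
    preflight(active_policies, control_budget)
    kept, used = [], 0
    for item in reversed(workload):  # newest workload first
        if used + item.tokens > workload_budget:
            break
        kept.append(item)
        used += item.tokens
    return list(active_policies) + list(reversed(kept))


policies = [
    Item(Provenance.TRUSTED_POLICY, "Never issue refunds above $100.", 40),
    Item(Provenance.TRUSTED_POLICY, "Never reveal full card numbers.", 40),
]
workload = [Item(Provenance.WORKLOAD, f"tool output {i}", 50) for i in range(4)]

state = assemble(policies, workload, control_budget=100, workload_budget=120)

try:
    assemble(policies, workload, control_budget=60, workload_budget=120)
    overloaded = False
except PolicyOverload:
    overloaded = True  # no partial policy set is ever returned
```

Keeping the two budgets separate means workload growth can never shrink the space reserved for policy, and raising an exception instead of truncating makes the overload visible to the caller.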

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same placement distinctions could affect policy retention in agent frameworks not included in the tested traces.
  • Dynamic policy updates during long-running sessions might introduce new carriage risks even under protected placement.
  • Preflight checks could be combined with existing context-window compression techniques to improve fit rates.

Load-bearing premise

The pressure replay and chosen traces from AutoGen/tau3 and OpenHands/SWE-bench sufficiently model the conditions under which policy-carriage integrity could be compromised in real LLM agent deployments.

What would settle it

A demonstration that a protected-placement configuration loses policy integrity under the same pressure replay conditions used in the study.

Figures

Figures reproduced from arXiv: 2605.12535 by Igor Santos-Grueiro.

Figure 1. The subsystem under study: directive-bearing and non-control state compete under pressure before action time. 2.2 The Adversarial Scheduler. The adversary controls scheduling pressure over visible interaction history: how much non-control content appears, where it appears, and how directive-bearing state is separated or framed. That pressure changes what survives truncation, how summaries rewrite ear…
Figure 2. Failure traces from input history to decision state.
Figure 3. Control-layer intervention for decision-time context assembly: directive-bearing state is explicitly mediated before …
Figure 4. C1 unmitigated risk profile by model and attack.
Figure 5. C2: Mean ΔECR (blue) and ΔDFR (orange) versus matched truncation for each mitigation variant. The panels show eviction, aliasing, and binding instability. Labels are explicit (e.g., SCP+ICE (A), SCP+Cache (O)), where A/O denote autonomous/oracle routing.
Figure 6. Prefix of the reconstructed decision state in the frozen agentic case. Left: unmitigated truncation begins directly inside …
Figure 7. C3 heterogeneity profile by mitigation variant and …
Figure 8. CSR by input-token pressure bin for the unmitigated path (dashed) and SCP+ICE (A) (solid), shown per model with …
Figure 9. C4: Cost-benefit profile by mitigation family. Left: …
Original abstract

LLM agents choose actions from bounded decision states assembled from system policy, runtime state, tools, workload content, and the final request. We study policy-carriage integrity: applicable trusted policies must remain present, sound, and correctly bound in the decision state immediately before action. Under controlled pressure replay over AutoGen/tau3 and OpenHands/SWE-bench traces, the tested protected-placement configurations preserve policy across the pressure sweep, while task-local placement exhibits eviction, weakening, or over-budget continuation depending on the context manager. We keep this result state-level: a fixed-assembler behavioral calibration produced 0/90 unsafe-action proposals and 0/90 unguarded policy violations, so policy absence alone did not establish unsafe model behavior. The resulting design guidance is systems-level: assign typed provenance to policy state, isolate control budget, check before assembly that the complete active policy set fits, fail closed on overload, and enforce structured policies at the action boundary. We present ControlCapsule as a reference design pattern for these requirements; exact active-policy replay + preflight remains the key baseline.
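The last item of the abstract's guidance, enforcing structured policies at the action boundary, can be sketched as a guard that runs on the proposed action itself, independent of what the context window still contains. The names below (`Action`, `MaxAmountRule`, `enforce`) and the refund example are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    tool: str
    args: dict


@dataclass(frozen=True)
class MaxAmountRule:
    """Structured policy: a numeric argument of one tool must not exceed a limit."""
    tool: str
    arg: str
    limit: float

    def permits(self, action):
        if action.tool != self.tool:
            return True  # rule does not apply to this tool
        value = action.args.get(self.arg)
        # A missing or non-numeric value is denied rather than waved through.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and value <= self.limit


def enforce(action, rules):
    """Return the action only if every rule permits it; otherwise refuse."""
    violated = [rule for rule in rules if not rule.permits(action)]
    if violated:
        raise PermissionError(f"{action.tool} blocked by {len(violated)} rule(s)")
    return action


rules = [MaxAmountRule(tool="issue_refund", arg="amount", limit=100.0)]

allowed = enforce(Action("issue_refund", {"amount": 40}), rules)

try:
    enforce(Action("issue_refund", {"amount": 250}), rules)
    blocked = False
except PermissionError:
    blocked = True
```

Because the rule is data rather than prose in the prompt, it cannot be evicted or summarized away by the context manager; the trade-off is that only policies expressible in such a structured form can be enforced this way.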

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies policy-carriage integrity in LLM agents, where trusted policies must remain present and correctly bound in the decision state before action. Using controlled pressure replay on AutoGen/tau3 and OpenHands/SWE-bench traces, it reports that protected-placement configurations preserve policy across the sweep while task-local placement exhibits eviction, weakening, or over-budget continuation. The work presents this as a state-level result (0/90 unsafe proposals, 0/90 unguarded violations under a fixed-assembler calibration) and offers systems-level design guidance plus ControlCapsule as a reference pattern emphasizing typed provenance, isolated control budget, pre-assembly fit checks, fail-closed overload handling, and action-boundary enforcement.

Significance. If the distinction between placement strategies holds under the reported conditions, the result supplies actionable systems guidance for preventing policy loss in agent context managers, a practical contribution to LLM agent security that separates carriage integrity from downstream model behavior.

major comments (2)
  1. [Abstract / experimental setup] The central empirical claim (protected vs. task-local placement under pressure replay) rests on the assumption that the chosen AutoGen/tau3 and OpenHands/SWE-bench traces plus replay mechanism adequately model real deployment pressure regimes; the manuscript does not provide evidence or discussion that these traces capture longer/variable context windows, concurrent tool calls, or adversarial injections that could alter eviction patterns.
  2. [Abstract / results paragraph] The quantitative result (0/90 unsafe proposals and 0/90 unguarded violations) is presented without reported details on experimental controls, number of runs, statistical tests, or variance across context managers, which is load-bearing for interpreting the placement-strategy distinction.
minor comments (2)
  1. [Abstract] The term 'ControlCapsule' is introduced as a reference design pattern but its precise interface or pseudocode is not shown in the provided abstract-level description.
  2. [Abstract] Notation for 'pressure sweep' and 'context manager' behaviors could be clarified with a small table or diagram to make the distinction between placement strategies easier to follow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we respond point-by-point to the major comments, indicating where revisions will be made.

  1. Referee: [Abstract / experimental setup] The central empirical claim (protected vs. task-local placement under pressure replay) rests on the assumption that the chosen AutoGen/tau3 and OpenHands/SWE-bench traces plus replay mechanism adequately model real deployment pressure regimes; the manuscript does not provide evidence or discussion that these traces capture longer/variable context windows, concurrent tool calls, or adversarial injections that could alter eviction patterns.

    Authors: The AutoGen/tau3 and OpenHands/SWE-bench traces were selected as representative of established multi-agent frameworks and software-engineering benchmarks that include sequential task execution under growing context. The replay mechanism applies controlled pressure by incrementally expanding context within these traces. We agree that the study does not furnish direct evidence that the observed placement distinction would hold under longer or variable context windows, concurrent tool calls, or adversarial injections. In the revised version we add an explicit limitations paragraph that states the scope of the traces, notes the absence of those regimes, and identifies them as targets for follow-on validation. revision: partial

  2. Referee: [Abstract / results paragraph] The quantitative result (0/90 unsafe proposals and 0/90 unguarded violations) is presented without reported details on experimental controls, number of runs, statistical tests, or variance across context managers, which is load-bearing for interpreting the placement-strategy distinction.

    Authors: The 0/90 counts derive from a fixed-assembler behavioral calibration executed over 90 trials per placement strategy (protected and task-local) on each of the two frameworks. The calibration isolates state-level carriage by using a deterministic assembler that records policy presence before model invocation; this procedure is described in the experimental section of the full manuscript. Because the outcome under this calibration was deterministic (zero unsafe proposals and zero unguarded violations), no statistical hypothesis tests were applied. We will expand the results paragraph to state the trial count explicitly, restate the fixed-assembler control, and report any observed variance in policy retention across the context managers tested. revision: yes
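As a reading aid for the 0/90 counts, and not an analysis taken from the paper, the sketch below computes how large the underlying event rate could still be after observing zero events. It assumes the 90 trials are independent with a common event probability, which the abstract does not state; the function name is hypothetical.

```python
def zero_event_upper_bound(n, confidence=0.95):
    """Exact one-sided upper bound on the event rate p after 0 events in n trials.

    Solves (1 - p) ** n = 1 - confidence for p, assuming independent trials
    with a common event probability.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


bound = zero_event_upper_bound(90)
rule_of_three = 3 / 90  # common quick approximation to the same bound

print(f"exact 95% upper bound: {bound:.4f}")
print(f"rule-of-three estimate: {rule_of_three:.4f}")
```

Under those assumptions, 0/90 is compatible with a true rate of up to roughly 3%, so the zero counts limit the rate of unsafe proposals without showing that it is zero.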

Circularity Check

0 steps flagged

No circularity: empirical observations only, no derivations or self-referential reductions


The paper reports direct experimental results from controlled pressure replay on AutoGen/tau3 and OpenHands/SWE-bench traces, stating that protected-placement configurations preserve policy while task-local placement exhibits eviction or weakening. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The state-level result (0/90 unsafe proposals) is presented as a calibration outcome of the fixed-assembler setup, not as a derived claim that reduces to its inputs by construction. Design guidance follows from the observations without load-bearing self-references or ansatzes smuggled via citation. This is a standard non-finding for an empirical systems paper whose central claims rest on external benchmarks rather than internal tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only; no free parameters, axioms, or other invented entities detailed beyond the design pattern.

invented entities (1)
  • ControlCapsule (no independent evidence)
    purpose: A reference design pattern to meet the policy-carriage requirements
    Introduced in the abstract as a proposed pattern based on the identified requirements.

pith-pipeline@v0.9.1-grok · 5711 in / 1023 out tokens · 35020 ms · 2026-07-02T23:57:21.493342+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents

    cs.AI 2026-06 unverdicted novelty 8.0

    Context compaction erases in-context governance constraints in LLM agents, raising policy violation rates from 0% to 30% (up to 59% for some models) on the ConstraintRot benchmark.

  2. Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents

    cs.AI 2026-06 unverdicted novelty 6.0

    Context compaction silently drops governance constraints in LLM agents, raising policy violation rates from 0% to 30% on average, with a proposed pinning mitigation restoring compliance.
