Ghost in the Context: Policy-Carriage Integrity in LLM Agents
Pith reviewed 2026-07-02 23:57 UTC · model grok-4.3
The pith
LLM agents need every applicable trusted policy to remain fully present, sound, and correctly bound in the decision state immediately before each action; otherwise policy-carriage integrity is lost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under controlled pressure replay over AutoGen/tau3 and OpenHands/SWE-bench traces, the tested protected-placement configurations preserve policy across the pressure sweep, while task-local placement exhibits eviction, weakening, or over-budget continuation depending on the context manager. A fixed-assembler behavioral calibration produced 0/90 unsafe-action proposals and 0/90 unguarded policy violations, so policy absence alone did not establish unsafe model behavior.
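To make the placement distinction concrete, here is a toy pressure sweep. It is a minimal sketch under loud assumptions, not the paper's replay harness. Tokens are approximated by whitespace word counts. The "task-local" context manager is assumed to evict the oldest history entry first, and the policy is assumed to be that oldest entry. The "protected" assembler reserves room for the policy and request and evicts only workload. All names (`Segment`, `assemble_task_local`, `assemble_protected`, `sweep`) are hypothetical. The sketch reproduces only the eviction failure mode. Weakening and over-budget continuation, which the study also reports, are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str  # "policy", "workload", or "request"
    text: str

    @property
    def tokens(self) -> int:
        # Crude whitespace count; a real harness would use the model's tokenizer.
        return len(self.text.split())


def fits(segments, budget):
    return sum(s.tokens for s in segments) <= budget


def assemble_task_local(policy, workload, request, budget):
    """Policy is an ordinary history entry; evict oldest-first until the state fits."""
    history = [policy, *workload]
    while history and not fits(history + [request], budget):
        history.pop(0)  # the policy is the oldest entry, so it is evicted first
    return history + [request]


def assemble_protected(policy, workload, request, budget):
    """Policy and request sit in a reserved region; only workload is evicted."""
    if not fits([policy, request], budget):
        raise RuntimeError("reserved region does not fit; failing closed")
    history = list(workload)
    while history and not fits([policy, *history, request], budget):
        history.pop(0)
    return [policy, *history, request]


def policy_present(state):
    return any(s.kind == "policy" for s in state)


def sweep(sizes, budget=40):
    """For each pressure level (number of workload segments), report policy presence."""
    policy = Segment("policy", "never delete files outside the workspace")
    request = Segment("request", "clean up the build directory")
    rows = []
    for n in sizes:
        workload = [Segment("workload", f"tool output {i} " * 5) for i in range(n)]
        rows.append((
            n,
            policy_present(assemble_task_local(policy, workload, request, budget)),
            policy_present(assemble_protected(policy, workload, request, budget)),
        ))
    return rows


for n, task_local, protected in sweep([0, 1, 2, 4]):
    print(f"workload={n}: task-local policy present={task_local}, protected={protected}")
```

In this toy, the task-local state silently loses the policy once two workload segments push the total past the budget. The protected state keeps the policy at every pressure level and gives up workload instead.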
What carries the argument
Policy-carriage integrity: the requirement that applicable trusted policies remain present, sound, and correctly bound in the decision state immediately before action.
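The three conditions in this definition (present, sound, correctly bound) can be read as separate checks on the assembled state. The sketch below is one hypothetical way to express them. It assumes that a trusted registry stores a SHA-256 digest and a scope label per policy, and it treats "sound" as "the carried text hashes to the trusted digest" and "bound" as "the carried scope label matches the registry". The `TrustedPolicy`, `CarriedPolicy`, and `carriage_violations` names, the scope labels, and the refund policy text are illustrative assumptions, not the paper's definitions.

```python
import hashlib
from dataclasses import dataclass


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class TrustedPolicy:
    """Registry entry: identity, digest of the trusted text, and the scope it governs."""
    policy_id: str
    digest: str
    scope: str

    @classmethod
    def from_text(cls, policy_id: str, text: str, scope: str) -> "TrustedPolicy":
        return cls(policy_id, sha256(text), scope)


@dataclass(frozen=True)
class CarriedPolicy:
    """What the assembled decision state actually contains for a policy."""
    policy_id: str
    text: str
    scope: str


def carriage_violations(applicable, decision_state):
    """Return (policy_id, reason) pairs; an empty list means integrity holds."""
    carried = {p.policy_id: p for p in decision_state}
    problems = []
    for trusted in applicable:
        got = carried.get(trusted.policy_id)
        if got is None:
            problems.append((trusted.policy_id, "absent"))
        elif sha256(got.text) != trusted.digest:
            problems.append((trusted.policy_id, "unsound"))  # edited, weakened, or truncated
        elif got.scope != trusted.scope:
            problems.append((trusted.policy_id, "misbound"))  # attached to the wrong action
    return problems


TEXT = "Refunds over $500 require manager approval."
refund = TrustedPolicy.from_text("P1", TEXT, scope="refund")
print(carriage_violations([refund], [CarriedPolicy("P1", TEXT, "refund")]))
print(carriage_violations([refund], [CarriedPolicy("P1", "Refunds require approval.", "refund")]))
print(carriage_violations([refund], [CarriedPolicy("P1", TEXT, "shipping")]))
print(carriage_violations([refund], []))
```

Each print exercises one failure mode in turn: intact, weakened text, wrong binding, and eviction.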
If this is right
- Assign typed provenance to policy state.
- Isolate control budget to protect policy space.
- Check before assembly that the complete active policy set fits.
- Fail closed on overload rather than proceeding with partial policies.
- Enforce structured policies at the action boundary.
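The first four recommendations in the list above can be combined into a single assembly step. The sketch below shows one possible arrangement under stated assumptions: every item carries a provenance tag, the control region has its own fixed budget, the complete active policy set is checked against that budget before anything is assembled, and overload raises instead of dropping policies. `Provenance`, `Item`, `PolicyOverload`, `preflight`, and `assemble` are hypothetical names. This is not the ControlCapsule interface, which the review notes is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    TRUSTED_POLICY = "trusted_policy"
    RUNTIME = "runtime"
    WORKLOAD = "workload"
    REQUEST = "request"


@dataclass(frozen=True)
class Item:
    provenance: Provenance
    text: str
    tokens: int


class PolicyOverload(Exception):
    """Raised when the full active policy set cannot be carried; the agent must not act."""


def preflight(policies, control_budget):
    # All-or-nothing: the complete active set must fit, never a prefix of it.
    needed = sum(p.tokens for p in policies)
    if needed > control_budget:
        raise PolicyOverload(f"policies need {needed} tokens; control budget is {control_budget}")


def assemble(policies, workload, request, control_budget, total_budget):
    # Typed provenance: only trusted-policy items may occupy the control region.
    if any(p.provenance is not Provenance.TRUSTED_POLICY for p in policies):
        raise ValueError("control region accepts trusted-policy items only")
    preflight(policies, control_budget)
    # Isolated budget: workload competes only for what is left after the control reservation.
    remaining = total_budget - control_budget - request.tokens
    if remaining < 0:
        raise PolicyOverload("request does not fit beside the reserved control budget")
    kept = []
    for item in reversed(workload):  # keep the most recent contiguous workload suffix
        if item.tokens > remaining:
            break
        kept.append(item)
        remaining -= item.tokens
    kept.reverse()
    return [*policies, *kept, request]


policies = [Item(Provenance.TRUSTED_POLICY, "a", 30), Item(Provenance.TRUSTED_POLICY, "b", 20)]
workload = [Item(Provenance.WORKLOAD, f"w{i}", 40) for i in range(5)]
request = Item(Provenance.REQUEST, "req", 10)
print([i.text for i in assemble(policies, workload, request, control_budget=64, total_budget=200)])
```

The design choice worth noting is that pressure is absorbed entirely by workload. The only thing that can happen to policies is a refusal to proceed, never partial carriage.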
Where Pith is reading between the lines
- The same placement distinctions could affect policy retention in agent frameworks not included in the tested traces.
- Dynamic policy updates during long-running sessions might introduce new carriage risks even under protected placement.
- Preflight checks could be combined with existing context-window compression techniques to improve fit rates.
Load-bearing premise
The pressure replay and chosen traces from AutoGen/tau3 and OpenHands/SWE-bench sufficiently model the conditions under which policy-carriage integrity could be compromised in real LLM agent deployments.
What would settle it
A demonstration that a protected-placement configuration loses policy integrity under the same pressure replay conditions used in the study.
Original abstract
LLM agents choose actions from bounded decision states assembled from system policy, runtime state, tools, workload content, and the final request. We study policy-carriage integrity: applicable trusted policies must remain present, sound, and correctly bound in the decision state immediately before action. Under controlled pressure replay over AutoGen/tau3 and OpenHands/SWE-bench traces, the tested protected-placement configurations preserve policy across the pressure sweep, while task-local placement exhibits eviction, weakening, or over-budget continuation depending on the context manager. We keep this result state-level: a fixed-assembler behavioral calibration produced 0/90 unsafe-action proposals and 0/90 unguarded policy violations, so policy absence alone did not establish unsafe model behavior. The resulting design guidance is systems-level: assign typed provenance to policy state, isolate control budget, check before assembly that the complete active policy set fits, fail closed on overload, and enforce structured policies at the action boundary. We present ControlCapsule as a reference design pattern for these requirements; exact active-policy replay + preflight remains the key baseline.
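The last recommendation, enforcing structured policies at the action boundary, does not depend on whether the policy text survived in context. The sketch below assumes an illustrative policy shape (a per-tool amount limit with an approval flag) and a proposed action represented as a dict with `tool` and `args` keys. `MaxAmountRule`, `Decision`, and `enforce` are hypothetical names. Passing `None` for the rule set stands in for "the active policy set could not be loaded", and the guard fails closed in that case.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass(frozen=True)
class MaxAmountRule:
    """Hypothetical structured policy: amounts above a limit need an approval flag."""
    tool: str
    limit: float
    approval_flag: str

    def violated_by(self, action: Mapping[str, Any]) -> bool:
        if action.get("tool") != self.tool:
            return False
        args = action.get("args", {})
        return args.get("amount", 0) > self.limit and not args.get(self.approval_flag, False)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def enforce(action: Mapping[str, Any], rules: Optional[Sequence[MaxAmountRule]]) -> Decision:
    if rules is None:
        # Fail closed: without the active policy set, no action is executed.
        return Decision(False, "policy set unavailable")
    for rule in rules:
        if rule.violated_by(action):
            return Decision(False, f"violates {rule.tool} limit {rule.limit}")
    return Decision(True, "ok")


rules = [MaxAmountRule("refund", 500, "manager_approved")]
print(enforce({"tool": "refund", "args": {"amount": 800}}, rules))
print(enforce({"tool": "refund", "args": {"amount": 800, "manager_approved": True}}, rules))
```

Because the check runs on the proposed action rather than on the prompt, it still blocks a violating refund even if the policy text had been evicted from the decision state.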
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies policy-carriage integrity in LLM agents, where trusted policies must remain present and correctly bound in the decision state before action. Using controlled pressure replay on AutoGen/tau3 and OpenHands/SWE-bench traces, it reports that protected-placement configurations preserve policy across the sweep while task-local placement exhibits eviction, weakening, or over-budget continuation. The work presents this as a state-level result (0/90 unsafe proposals, 0/90 unguarded violations under a fixed-assembler calibration) and offers systems-level design guidance plus ControlCapsule as a reference pattern emphasizing typed provenance, isolated control budget, pre-assembly fit checks, fail-closed overload handling, and action-boundary enforcement.
Significance. If the distinction between placement strategies holds under the reported conditions, the result supplies actionable systems guidance for preventing policy loss in agent context managers, a practical contribution to LLM agent security that separates carriage integrity from downstream model behavior.
major comments (2)
- [Abstract / experimental setup] The central empirical claim (protected vs. task-local placement under pressure replay) rests on the assumption that the chosen AutoGen/tau3 and OpenHands/SWE-bench traces plus replay mechanism adequately model real deployment pressure regimes; the manuscript does not provide evidence or discussion that these traces capture longer/variable context windows, concurrent tool calls, or adversarial injections that could alter eviction patterns.
- [Abstract / results paragraph] The quantitative result (0/90 unsafe proposals and 0/90 unguarded violations) is presented without reported details on experimental controls, number of runs, statistical tests, or variance across context managers, which is load-bearing for interpreting the placement-strategy distinction.
minor comments (2)
- [Abstract] The term 'ControlCapsule' is introduced as a reference design pattern but its precise interface or pseudocode is not shown in the provided abstract-level description.
- [Abstract] Notation for 'pressure sweep' and 'context manager' behaviors could be clarified with a small table or diagram to make the distinction between placement strategies easier to follow.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. Below we respond point-by-point to the major comments, indicating where revisions will be made.
Point-by-point responses
Referee: [Abstract / experimental setup] The central empirical claim (protected vs. task-local placement under pressure replay) rests on the assumption that the chosen AutoGen/tau3 and OpenHands/SWE-bench traces plus replay mechanism adequately model real deployment pressure regimes; the manuscript does not provide evidence or discussion that these traces capture longer/variable context windows, concurrent tool calls, or adversarial injections that could alter eviction patterns.
Authors: The AutoGen/tau3 and OpenHands/SWE-bench traces were selected as representative of established multi-agent frameworks and software-engineering benchmarks that include sequential task execution under growing context. The replay mechanism applies controlled pressure by incrementally expanding context within these traces. We agree that the study does not furnish direct evidence that the observed placement distinction would hold under longer or variable context windows, concurrent tool calls, or adversarial injections. In the revised version we add an explicit limitations paragraph that states the scope of the traces, notes the absence of those regimes, and identifies them as targets for follow-on validation. revision: partial
Referee: [Abstract / results paragraph] The quantitative result (0/90 unsafe proposals and 0/90 unguarded violations) is presented without reported details on experimental controls, number of runs, statistical tests, or variance across context managers, which is load-bearing for interpreting the placement-strategy distinction.
Authors: The 0/90 counts derive from a fixed-assembler behavioral calibration executed over 90 trials per placement strategy (protected and task-local) on each of the two frameworks. The calibration isolates state-level carriage by using a deterministic assembler that records policy presence before model invocation; this procedure is described in the experimental section of the full manuscript. Because the outcome under this calibration was deterministic (zero unsafe proposals and zero unguarded violations), no statistical hypothesis tests were applied. We will expand the results paragraph to state the trial count explicitly, restate the fixed-assembler control, and report any observed variance in policy retention across the context managers tested. revision: yes
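The calibration described in this response separates a state-level record (was the policy carried?) from behavior-level counts (did the model propose an unsafe action, and did anything stop it?). The tally below is a hypothetical sketch of that bookkeeping. It assumes that decision states are dicts produced by a fixed assembler, that "unguarded violation" means an unsafe proposal the guard did not block, and that `propose`, `is_unsafe`, and `guard_blocks` are caller-supplied stubs. None of these names come from the manuscript, and the stub below produces made-up states, not the reported results.

```python
from dataclasses import dataclass


@dataclass
class CalibrationTally:
    trials: int = 0
    policy_absent: int = 0
    unsafe_proposals: int = 0
    unguarded_violations: int = 0

    def summary(self) -> str:
        return (f"{self.unsafe_proposals}/{self.trials} unsafe-action proposals, "
                f"{self.unguarded_violations}/{self.trials} unguarded policy violations, "
                f"policy absent in {self.policy_absent}/{self.trials} states")


def calibrate(states, propose, is_unsafe, guard_blocks):
    tally = CalibrationTally()
    for state in states:  # states come from a fixed, deterministic assembler
        tally.trials += 1
        if "policy" not in state:  # carriage is recorded before the model is invoked
            tally.policy_absent += 1
        action = propose(state)
        if is_unsafe(action):
            tally.unsafe_proposals += 1
            if not guard_blocks(action):
                tally.unguarded_violations += 1
    return tally


# Stub run: half the states lack the policy, and the stub model only ever proposes a no-op.
states = [{"policy": "p", "request": "r"}] * 5 + [{"request": "r"}] * 5
print(calibrate(states, lambda s: {"tool": "noop"}, lambda a: a["tool"] == "delete", lambda a: True).summary())
```

Keeping `policy_absent` separate from `unsafe_proposals` is what lets a result like the one above be stated at the state level: absence is observed, but it is not counted as unsafe behavior unless an unsafe action is actually proposed.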
Circularity Check
No circularity: empirical observations only, no derivations or self-referential reductions
full rationale
The paper reports direct experimental results from controlled pressure replay on AutoGen/tau3 and OpenHands/SWE-bench traces, stating that protected-placement configurations preserve policy while task-local placement exhibits eviction or weakening. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The state-level result (0/90 unsafe proposals) is presented as a calibration outcome of the fixed-assembler setup, not as a derived claim that reduces to its inputs by construction. Design guidance follows from the observations without load-bearing self-references or ansatzes smuggled via citation. This is a standard non-finding for an empirical systems paper whose central claims rest on external benchmarks rather than internal tautology.
Axiom & Free-Parameter Ledger
invented entities (1)
- ControlCapsule: no independent evidence
Forward citations
Cited by 2 Pith papers
- Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents
  Context compaction erases in-context governance constraints in LLM agents, raising policy violation rates from 0% to 30% (up to 59% for some models) on the ConstraintRot benchmark.
- Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents
  Context compaction silently drops governance constraints in LLM agents, raising policy violation rates from 0% to 30% on average, with a proposed pinning mitigation restoring compliance.