Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly
Pith reviewed 2026-05-20 23:25 UTC · model grok-4.3
The pith
Decision-time context assembly is a measurable part of the LLM control path that can be partially hardened.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM agents do not act on raw interaction history; they act on a bounded decision state assembled by truncation, summarization, reordering, and rewriting. If directive-bearing state is dropped, weakened, or rebound during that step, an agent can cross a policy boundary without prompt override, model changes, or persistent-memory compromise. The study measures this over several models using exact constraint respect judgments and audits of state visibility. SafeContext, which pins control state and reuses prefixes while keeping weights fixed, yields partial mitigation that is stronger against simple truncation than against structured compaction. The same assembly failure pattern appears in the
What carries the argument
SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed.
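To make the pinning idea concrete, the following is a minimal sketch and not the paper's implementation. It contrasts plain tail truncation with an assembler that reserves budget for control messages first. The `Message` type, the `"control"` role label, the word-count cost proxy, and both function names are assumptions introduced for illustration; prefix reuse and reminder injection are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str  # hypothetical labels: "control" for directive-bearing state
    text: str


def cost(msg: Message) -> int:
    # Crude token proxy (assumption): count whitespace-separated words.
    return len(msg.text.split())


def assemble_truncate(history, budget):
    """Tail truncation: keep the most recent messages that fit the budget."""
    kept, used = [], 0
    for msg in reversed(history):
        c = cost(msg)
        if used + c > budget:
            break
        kept.append(msg)
        used += c
    return list(reversed(kept))


def assemble_pinned(history, budget):
    """Pin control messages first, then fill what is left with recent history."""
    pinned = [m for m in history if m.role == "control"]
    remaining = budget - sum(cost(m) for m in pinned)
    if remaining < 0:
        raise ValueError("control state alone exceeds the budget")
    rest = [m for m in history if m.role != "control"]
    return pinned + assemble_truncate(rest, remaining)


history = [
    Message("control", "Never send email to external addresses"),
    Message("user", "Summarize the quarterly report for me"),
    Message("tool", "Report text: revenue up, costs flat, churn down"),
    Message("user", "Now email the summary to bob@example.com"),
]
BUDGET = 16

truncated = assemble_truncate(history, BUDGET)
pinned = assemble_pinned(history, BUDGET)
print([m.role for m in truncated])  # the directive is no longer present
print([m.role for m in pinned])     # the directive survives at the front
```

Under this toy budget the truncating assembler silently drops the directive while the pinned one keeps it, at the price of less room for recent history. That trade-off is one reason a pinning layer can only partially harden the path.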
Load-bearing premise
The judged exact constraint respect and direct audits of assembled-state visibility accurately isolate failures caused by assembly rather than other model behaviors or prompt effects.
What would settle it
A test where policy violations occur at the same rate despite perfect retention of all directive state in the assembled context would show that assembly is not the source of the failures.
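The contrast described above can be sketched as a comparison of violation rates between trials where the directive was fully retained and trials where it was not. This is an illustrative outline under stated assumptions: the record fields `directive_retained` and `violated`, and the function names, are hypothetical and do not come from the paper.

```python
def violation_rate(trials):
    """Fraction of trials in which the agent crossed the policy boundary."""
    trials = list(trials)
    if not trials:
        raise ValueError("no trials in this stratum")
    return sum(1 for t in trials if t["violated"]) / len(trials)


def retention_contrast(trials):
    """Compare violation rates with and without full directive retention."""
    retained = [t for t in trials if t["directive_retained"]]
    dropped = [t for t in trials if not t["directive_retained"]]
    r_rate = violation_rate(retained)
    d_rate = violation_rate(dropped)
    # A gap near zero would suggest assembly is not the source of failures.
    return {"retained": r_rate, "dropped": d_rate, "gap": d_rate - r_rate}


# Toy data (fabricated for illustration only, not results from the paper).
toy_trials = (
    [{"directive_retained": True, "violated": v} for v in (True, False, False, False)]
    + [{"directive_retained": False, "violated": v} for v in (True, True, True, False)]
)
contrast = retention_contrast(toy_trials)
print(contrast)
```

A real version of this check would also need matched tasks and uncertainty estimates before a near-zero gap could be read as evidence against the assembly hypothesis.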
Original abstract
LLM agents do not act on raw interaction history; they act on a bounded decision state assembled by truncation, summarization, reordering, and rewriting. If directive-bearing state is dropped, weakened, or rebound during that step, an agent can cross a policy boundary without prompt override, model changes, or persistent-memory compromise. We study this failure mode over local Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B using judged exact constraint respect and direct audits of assembled-state visibility. We evaluate SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed. Unmitigated risk is systematic, but absolute exact compliance remains low. Against truncation, SafeContext yields small gains; against a strong structured-compaction policy, most aggregate lift disappears, leaving residual benefit mainly in overflow eviction and selected aliasing slices. Replay-only does not explain the effect. A larger-model extension on Qwen 14B and Llama 70B shows the same failure object under larger models, although sign and magnitude remain policy-conditional. Decision-time context assembly is therefore a measurable part of the control path that can be partially hardened.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLM agents can violate policies due to failures in decision-time context assembly (truncation, summarization, reordering, rewriting) even without prompt overrides or model changes. It measures this risk via judged exact constraint respect and direct audits on Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B, evaluates SafeContext as a mitigation that pins control state and injects reminders, reports systematic unmitigated risk with low absolute compliance, small gains against truncation that largely vanish under strong structured-compaction, and similar patterns on larger models (Qwen 14B, Llama 70B). The conclusion is that decision-time assembly is a measurable and partially hardenable part of the control path.
Significance. If the empirical measurements correctly isolate assembly-induced carriage failures, the work identifies a previously under-examined control surface in agentic LLM systems and demonstrates that lightweight, weight-fixed interventions can yield policy-conditional improvements. The larger-model extension and comparison across compaction policies add breadth, though the absence of quantitative results, baselines, and error bars in the provided abstract constrains evaluation of effect sizes and robustness.
major comments (2)
- [§4] §4 (Evaluation Methodology): The central attribution of observed policy-carriage failures to assembly operations (truncation, summarization, etc.) rests on 'judged exact constraint respect' and 'direct audits of assembled-state visibility' without an explicit baseline that supplies the identical directive set in full, unassembled history under otherwise identical conditions. This leaves open the possibility that measured drops reflect base-model non-compliance or prompt sensitivity rather than assembly-specific effects, directly undermining the claim that assembly is a distinct measurable part of the control path.
- [Abstract, §5] Abstract and §5 (Results): Absolute exact compliance remains low even with SafeContext, and most aggregate lift disappears under strong structured-compaction. Without reported quantitative values, error bars, or baseline comparisons, it is unclear whether the residual benefits in overflow eviction and aliasing slices are statistically meaningful or sufficient to support the 'partially hardened' conclusion.
minor comments (2)
- Clarify the exact prompting and judgment protocol used for 'judged exact constraint respect' (e.g., judge model, few-shot examples, inter-annotator agreement) so readers can assess reliability.
- Provide the precise definitions and examples of the 'strong structured-compaction policy' versus truncation to allow replication of the policy-conditional results.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed report. We address each major comment below, providing clarifications on our methodology and indicating the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [§4] §4 (Evaluation Methodology): The central attribution of observed policy-carriage failures to assembly operations (truncation, summarization, etc.) rests on 'judged exact constraint respect' and 'direct audits of assembled-state visibility' without an explicit baseline that supplies the identical directive set in full, unassembled history under otherwise identical conditions. This leaves open the possibility that measured drops reflect base-model non-compliance or prompt sensitivity rather than assembly-specific effects, directly undermining the claim that assembly is a distinct measurable part of the control path.
Authors: We appreciate the referee's emphasis on isolating assembly-specific effects. Our direct audits of assembled-state visibility explicitly check for the presence, binding, and integrity of directive tokens in the final context passed to the model. Violations where the constraint remains visible are attributed to model behavior, while absences or alterations are attributed to assembly. This provides a per-instance decomposition rather than relying solely on aggregate compliance. That said, we agree that an explicit no-assembly baseline (full directive history without truncation or compaction) would offer a cleaner contrast. We will add a supplementary experiment in revised §4 using shorter histories that fit without assembly to quantify the incremental effect of assembly operations. revision: partial
- Referee: [Abstract, §5] Abstract and §5 (Results): Absolute exact compliance remains low even with SafeContext, and most aggregate lift disappears under strong structured-compaction. Without reported quantitative values, error bars, or baseline comparisons, it is unclear whether the residual benefits in overflow eviction and aliasing slices are statistically meaningful or sufficient to support the 'partially hardened' conclusion.
Authors: We agree that the abstract and §5 would benefit from explicit numerical reporting. While §5 contains per-policy compliance figures and comparisons, we will revise the abstract to include key quantitative results (e.g., exact compliance rates for baseline vs. SafeContext under truncation and structured-compaction) together with standard errors from repeated runs. We will also add a summary table of effect sizes and baseline contrasts. These changes will make clear that residual gains are concentrated in specific slices such as overflow eviction and remain consistent though modest, supporting the 'partially hardened' characterization without overstating absolute performance. revision: yes
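The two responses above describe a per-instance decomposition and standard errors over repeated runs. A minimal sketch of both follows, assuming each judged instance is a record with hypothetical boolean fields `violated`, `present`, `bound`, and `intact` (mirroring the presence, binding, and integrity checks the authors mention). The labels and function names are illustrative and are not the paper's protocol.

```python
import math
import statistics
from collections import Counter


def attribute(instance):
    """Label one judged instance as compliant, a model failure, or an assembly failure."""
    if not instance["violated"]:
        return "compliant"
    visible = instance["present"] and instance["bound"] and instance["intact"]
    # Directive fully visible yet violated: attribute to model behavior.
    # Directive absent, rebound, or altered: attribute to assembly.
    return "model" if visible else "assembly"


def mean_and_stderr(rates):
    """Mean and standard error of per-run compliance rates."""
    if len(rates) < 2:
        raise ValueError("need at least two runs for a standard error")
    return statistics.mean(rates), statistics.stdev(rates) / math.sqrt(len(rates))


# Toy instances (fabricated for illustration only).
instances = [
    {"violated": False, "present": True, "bound": True, "intact": True},
    {"violated": True, "present": True, "bound": True, "intact": True},
    {"violated": True, "present": False, "bound": False, "intact": False},
    {"violated": True, "present": True, "bound": False, "intact": True},
]
breakdown = Counter(attribute(i) for i in instances)
mean, stderr = mean_and_stderr([0.2, 0.4])
print(dict(breakdown), mean, stderr)
```

This decomposition attributes by visibility alone, so it cannot separate base-model non-compliance from prompt sensitivity. That gap is what the no-assembly baseline the authors promise is meant to address.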
Circularity Check
No significant circularity in empirical measurement study
The paper presents an empirical evaluation of policy-carriage failures during decision-time context assembly in LLMs, using judged exact constraint respect and direct audits of assembled-state visibility across models such as Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B. It compares unmitigated risk against the SafeContext mitigation layer under truncation and structured-compaction policies, with extensions to larger models. No derivation chain, equations, fitted parameters renamed as predictions, or self-referential definitions are present; the central claim that decision-time assembly is a measurable and partially hardenable part of the control path follows directly from the reported experimental observations and comparisons rather than reducing to inputs by construction. Any self-citations are not load-bearing for the core findings, and the work remains self-contained as a standard measurement study without circular reduction.