Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

Igor Santos-Grueiro

arxiv: 2605.12535 · v2 · pith:XCIRARNGnew · submitted 2026-05-02 · 💻 cs.CR

Pith reviewed 2026-05-20 23:25 UTC · model grok-4.3

keywords: LLM agents, context assembly, policy-carriage failures, SafeContext, decision-time control, truncation, summarization, agent security

The pith

Decision-time context assembly is a measurable part of the LLM control path that can be partially hardened.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LLM agents act on an assembled decision state rather than raw history; steps such as truncation and summarization bound the input. The paper demonstrates that losing or weakening policy directives during this assembly can cause policy violations without any external tampering. By testing several open-weight models and proposing SafeContext, which pins and reuses control information, it shows that these failures are systematic and can be only partially reduced. The gains are modest and depend on the compaction policy applied to the context. This positions context assembly as a controllable element in maintaining agent policy adherence.

Core claim

What carries the argument

SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed.
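The pinning idea can be illustrated with a small sketch. It is not the paper's SafeContext implementation: the `Message` type, the `is_control` flag, the whitespace token count, and the function names `truncate_recent` and `assemble_pinned` are all assumptions made for illustration. The sketch contrasts keep-the-most-recent truncation, which can evict a directive, with an assembler that reserves budget for control messages first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    is_control: bool = False  # hypothetical flag marking directive-bearing state


def n_tokens(msg: Message) -> int:
    # Crude whitespace token count; a real system would use the model tokenizer.
    return len(msg.text.split())


def truncate_recent(history: list[Message], budget: int) -> list[Message]:
    """Unmitigated path: keep the most recent contiguous messages that fit."""
    kept: list[Message] = []
    used = 0
    for msg in reversed(history):
        cost = n_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))


def assemble_pinned(history: list[Message], budget: int) -> list[Message]:
    """Pinned path: admit control messages first, then fill with recent content."""
    pinned = [m for m in history if m.is_control]
    remaining = budget - sum(n_tokens(m) for m in pinned)
    if remaining < 0:
        raise ValueError("budget too small to carry pinned control state")
    rest = truncate_recent([m for m in history if not m.is_control], remaining)
    # Control state goes first as a prefix, ahead of the retained history.
    return pinned + rest


def directive_visible(state: list[Message]) -> bool:
    return any(m.is_control for m in state)


history = [
    Message("Never send files outside the workspace", is_control=True),
    Message("tool output " * 20),
    Message("user asks for a summary of the logs"),
]
```

With a budget of 20 tokens, `truncate_recent(history, 20)` keeps only the final user message, so the directive is no longer visible at decision time, while `assemble_pinned(history, 20)` returns the directive followed by that same message.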

Load-bearing premise

The judged exact constraint respect and direct audits of assembled-state visibility accurately isolate failures caused by assembly rather than other model behaviors or prompt effects.

What would settle it

A test where policy violations occur at the same rate despite perfect retention of all directive state in the assembled context would show that assembly is not the source of the failures.
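One way to frame this test numerically is a two-condition comparison: run the same episodes once with the directive fully retained in the decision state and once through the assembly step, then compare violation rates. The sketch below is a hypothetical analysis helper, not something the paper describes; the function names and the example counts are invented for illustration, and the standard error uses the usual normal approximation for a difference of two independent proportions.

```python
import math


def violation_rate(violations: int, episodes: int) -> float:
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    return violations / episodes


def rate_gap(viol_assembled: int, n_assembled: int,
             viol_retained: int, n_retained: int) -> tuple[float, float]:
    """Return (gap, standard error) for assembled minus fully retained rates."""
    p_a = violation_rate(viol_assembled, n_assembled)
    p_r = violation_rate(viol_retained, n_retained)
    se = math.sqrt(p_a * (1 - p_a) / n_assembled + p_r * (1 - p_r) / n_retained)
    return p_a - p_r, se


# Invented counts, for illustration only.
gap, se = rate_gap(viol_assembled=60, n_assembled=100,
                   viol_retained=40, n_retained=100)
```

A gap close to zero relative to its standard error would be the outcome described above: violations that persist at the same rate even when nothing is lost in assembly.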

Figures

Figures reproduced from arXiv: 2605.12535 by Igor Santos-Grueiro.

**Figure 1.** The subsystem under study: directive-bearing and non-control state compete under pressure before action time. (Body text from Section 2.2, "The Adversarial Scheduler", captured with the caption: The adversary controls scheduling pressure over visible interaction history: how much non-control content appears, where it appears, and how directive-bearing state is separated or framed. That pressure changes what survives truncation, how summaries rewrite ear…)

**Figure 2.** Failure traces from input history to decision state

**Figure 3.** Control-layer intervention for decision-time context assembly: directive-bearing state is explicitly mediated before

**Figure 4.** C1 unmitigated risk profile by model and attack

**Figure 5.** C2: Mean ΔECR (blue) and ΔDFR (orange) versus matched truncation for each mitigation variant. The panels show eviction, aliasing, and binding instability. Labels are explicit (e.g., SCP+ICE (A), SCP+Cache (O)), where A/O denote autonomous/oracle routing

**Figure 6.** Prefix of the reconstructed decision state in the frozen agentic case. Left: unmitigated truncation begins directly inside

**Figure 7.** C3 heterogeneity profile by mitigation variant and

**Figure 8.** CSR by input-token pressure bin for the unmitigated path (dashed) and SCP+ICE (A) (solid), shown per model with

**Figure 9.** C4: Cost-benefit profile by mitigation family. Left:

Original abstract

LLM agents do not act on raw interaction history; they act on a bounded decision state assembled by truncation, summarization, reordering, and rewriting. If directive-bearing state is dropped, weakened, or rebound during that step, an agent can cross a policy boundary without prompt override, model changes, or persistent-memory compromise. We study this failure mode over local Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B using judged exact constraint respect and direct audits of assembled-state visibility. We evaluate SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed. Unmitigated risk is systematic, but absolute exact compliance remains low. Against truncation, SafeContext yields small gains; against a strong structured-compaction policy, most aggregate lift disappears, leaving residual benefit mainly in overflow eviction and selected aliasing slices. Replay-only does not explain the effect. A larger-model extension on Qwen 14B and Llama 70B shows the same failure object under larger models, although sign and magnitude remain policy-conditional. Decision-time context assembly is therefore a measurable part of the control path that can be partially hardened.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags context assembly as a distinct policy failure point in LLM agents, but the measurements don't isolate it from general model non-compliance.

The main takeaway is that LLM agents act on a bounded, assembled decision state rather than raw history, so truncation, summarization, or rewriting can drop or weaken policy directives and let the agent cross boundaries without any prompt change or model edit. The authors audit this on Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B using judged exact constraint respect plus direct state visibility checks, then test SafeContext, a layer that pins control prefixes and adds reminders under pressure while leaving weights fixed. They also check larger models and note that gains are policy-dependent and often small.

Referee Report

2 major / 2 minor

Summary. The paper claims that LLM agents can violate policies due to failures in decision-time context assembly (truncation, summarization, reordering, rewriting) even without prompt overrides or model changes. It measures this risk via judged exact constraint respect and direct audits on Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B, and evaluates SafeContext as a mitigation that pins control state and injects reminders. It reports systematic unmitigated risk with low absolute compliance, small gains against truncation that largely vanish under strong structured-compaction, and similar patterns on larger models (Qwen 14B, Llama 70B). The conclusion is that decision-time assembly is a measurable and partially hardenable part of the control path.

Significance. If the empirical measurements correctly isolate assembly-induced carriage failures, the work identifies a previously under-examined control surface in agentic LLM systems and demonstrates that lightweight, weight-fixed interventions can yield policy-conditional improvements. The larger-model extension and comparison across compaction policies add breadth, though the absence of quantitative results, baselines, and error bars in the provided abstract constrains evaluation of effect sizes and robustness.

major comments (2)

§4 (Evaluation Methodology): The central attribution of observed policy-carriage failures to assembly operations (truncation, summarization, etc.) rests on 'judged exact constraint respect' and 'direct audits of assembled-state visibility' without an explicit baseline that supplies the identical directive set in full, unassembled history under otherwise identical conditions. This leaves open the possibility that measured drops reflect base-model non-compliance or prompt sensitivity rather than assembly-specific effects, directly undermining the claim that assembly is a distinct measurable part of the control path.
Abstract and §5 (Results): Absolute exact compliance remains low even with SafeContext, and most aggregate lift disappears under strong structured-compaction. Without reported quantitative values, error bars, or baseline comparisons, it is unclear whether the residual benefits in overflow eviction and aliasing slices are statistically meaningful or sufficient to support the 'partially hardened' conclusion.

minor comments (2)

Clarify the exact prompting and judgment protocol used for 'judged exact constraint respect' (e.g., judge model, few-shot examples, inter-annotator agreement) so readers can assess reliability.
Provide the precise definitions and examples of the 'strong structured-compaction policy' versus truncation to allow replication of the policy-conditional results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. We address each major comment below, providing clarifications on our methodology and indicating the revisions we will make to strengthen the manuscript.

Referee: §4 (Evaluation Methodology): The central attribution of observed policy-carriage failures to assembly operations (truncation, summarization, etc.) rests on 'judged exact constraint respect' and 'direct audits of assembled-state visibility' without an explicit baseline that supplies the identical directive set in full, unassembled history under otherwise identical conditions. This leaves open the possibility that measured drops reflect base-model non-compliance or prompt sensitivity rather than assembly-specific effects, directly undermining the claim that assembly is a distinct measurable part of the control path.

Authors: We appreciate the referee's emphasis on isolating assembly-specific effects. Our direct audits of assembled-state visibility explicitly check for the presence, binding, and integrity of directive tokens in the final context passed to the model. Violations where the constraint remains visible are attributed to model behavior, while absences or alterations are attributed to assembly. This provides a per-instance decomposition rather than relying solely on aggregate compliance. That said, we agree that an explicit no-assembly baseline (full directive history without truncation or compaction) would offer a cleaner contrast. We will add a supplementary experiment in revised §4 using shorter histories that fit without assembly to quantify the incremental effect of assembly operations.

Revision: partial

Referee: Abstract and §5 (Results): Absolute exact compliance remains low even with SafeContext, and most aggregate lift disappears under strong structured-compaction. Without reported quantitative values, error bars, or baseline comparisons, it is unclear whether the residual benefits in overflow eviction and aliasing slices are statistically meaningful or sufficient to support the 'partially hardened' conclusion.

Authors: We agree that the abstract and §5 would benefit from explicit numerical reporting. While §5 contains per-policy compliance figures and comparisons, we will revise the abstract to include key quantitative results (e.g., exact compliance rates for baseline vs. SafeContext under truncation and structured-compaction) together with standard errors from repeated runs. We will also add a summary table of effect sizes and baseline contrasts. These changes will make clear that residual gains are concentrated in specific slices such as overflow eviction and remain consistent though modest, supporting the 'partially hardened' characterization without overstating absolute performance.

Revision: yes
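The per-instance decomposition described in the authors' first response can be sketched as a simple attribution rule. The record fields (`violated`, `directive_visible`, `directive_intact`) and the label strings below are hypothetical names chosen for illustration; the paper's actual audit schema is not given in this review.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    violated: bool           # judged outcome: did the action break the constraint?
    directive_visible: bool  # is the directive present in the assembled state?
    directive_intact: bool   # is it unaltered and still bound to the right scope?


def attribute(rec: AuditRecord) -> str:
    if not rec.violated:
        return "compliant"
    if rec.directive_visible and rec.directive_intact:
        # The model saw the constraint and still broke it.
        return "model"
    # The constraint was missing or altered before the model acted.
    return "assembly"


def decompose(records: list[AuditRecord]) -> Counter:
    return Counter(attribute(r) for r in records)


records = [
    AuditRecord(violated=True, directive_visible=False, directive_intact=False),
    AuditRecord(violated=True, directive_visible=True, directive_intact=True),
    AuditRecord(violated=True, directive_visible=True, directive_intact=False),
    AuditRecord(violated=False, directive_visible=True, directive_intact=True),
]
counts = decompose(records)
```

A rule like this separates the two failure sources per instance, but it does not by itself replace the no-assembly baseline the referee requests, since it says nothing about how often the model would comply when the directive is carried in full.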

Circularity Check

0 steps flagged

No significant circularity in empirical measurement study

The paper presents an empirical evaluation of policy-carriage failures during decision-time context assembly in LLMs, using judged exact constraint respect and direct audits of assembled-state visibility across models such as Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B. It compares unmitigated risk against the SafeContext mitigation layer under truncation and structured-compaction policies, with extensions to larger models. No derivation chain, equations, fitted parameters renamed as predictions, or self-referential definitions are present; the central claim that decision-time assembly is a measurable and partially hardenable part of the control path follows directly from the reported experimental observations and comparisons rather than reducing to inputs by construction. Any self-citations are not load-bearing for the core findings, and the work remains self-contained as a standard measurement study without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the contribution is framed as measurement and mitigation testing.


[1] [1]

Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, and Juanzi Li. 2024. LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). ...

work page doi:10.18653/v1/2024.acl- 2024

[2] [2]

Gormley, and Graham Neubig

Amanda Bertsch, Maor Ivgi, Emily Xiao, Uri Alon, Jonathan Berant, Matthew R. Gormley, and Graham Neubig. 2025. In-Context Learning with Long-Context Models: An In-Depth Exploration. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguis- tics: Human Language Technologies (Volume 1: Long Pap...

work page doi:10.18653/v1/2025.naacl-long.605 2025

[3] [3]

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents. InAdvances in Neural Information Processing Systems 37. Curran Associates, Inc., Vancouver, Canada, 26. https://doi.org/10.52202/079017-2636 Datase...

work page doi:10.52202/079017-2636 2024

[4] [4]

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. 2025. Memory Injection Attacks on LLM Agents via Query- Only Interaction. InAdvances in Neural Information Processing Systems 38. Curran Associates, Inc., San Diego, CA, USA, 35. https://neurips.cc/virtual/2025/poster/ 118152 Poster

work page 2025

[5] [5]

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. 2024. RULER: What’s the Real Context Size of Your Long-Context Language Models?. InProceedings of the First Conference on Language Modeling. OpenReview.net, Philadelphia, PA, USA, 27

work page 2024

[6] [6]

Jiazheng Kang, Mingming Ji, Zhe Zhao, and Ting Bai. 2025. Memory OS of AI Agent. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Suzhou, China, 25961–25970. https://doi.org/10.18653/v1/2025.emnlp-main.1318

work page doi:10.18653/v1/2025.emnlp-main.1318 2025

[7] [7]

Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang

Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts.Transactions of the Association for Computational Linguistics 12 (2024), 157–173. https://doi.org/10.1162/tacl_a_00638

work page doi:10.1162/tacl_a_00638 2024

[8] [8]

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xi- ang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. 2024. AgentBench: Evaluating LLMs as Agents. InThe Twelfth International Conference on Le...

work page 2024

[9] [9]

Zesen Liu, Zhixiang Zhang, Yuchong Xie, and Dongdong She. 2025. Com- pressionAttack: Exploiting Prompt Compression as a New Attack Sur- face in LLM-Powered Agents. https://doi.org/10.48550/arXiv.2510.22963 arXiv:cs.CR/2510.22963

work page doi:10.48550/arxiv.2510.22963 2025

[10] [10]

Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. Evaluating Very Long-Term Conversational Memory of LLM Agents. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Associa- tion for Computational Linguistics, Bangkok, Thailand, 13851–13870...

[11]

Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A. Rossi, Seunghyun Yoon, and Hinrich Schütze. 2025. NoLiMa: Long-Context Evaluation Beyond Literal Matching. In Proceedings of the 42nd International Conference on Machine Learning (Proceedings of Mac...

[12]

Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. 2023. MemGPT: Towards LLMs as Operating Systems. https://doi.org/10.48550/arXiv.2310.08560 arXiv:cs.AI/2310.08560

[13]

Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative Agents: Interactive Simulacra of Human Behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. ACM, San Francisco, CA, USA, 2:1–2:22. https://doi.org/10.1145/3586183.3606763

[14]

Fábio Perez and Ian Ribeiro. 2022. Ignore Previous Prompt: Attack Techniques for Language Models. https://doi.org/10.48550/arXiv.2211.09527 arXiv:cs.CL/2211.09527. Presented at the NeurIPS ML Safety Workshop 2022.

[15]

Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. 2024. Identifying the Risks of LM Agents with an LM-Emulated Sandbox. In The Twelfth International Conference on Learning Representations. OpenReview.net, Vienna, Austria, 68. https://proceedings.iclr.cc/paper_files/paper...

[16]

Rana Salama, Jason Cai, Michelle Yuan, Anna Currey, Monica Sunkara, Yi Zhang, and Yassine Benajiba. 2025. MemInsight: Autonomous Memory Augmentation for LLM Agents. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Suzhou, China, 33136–33152. https://doi.org/10.18653/v1/202...

[17]

Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. In Advances in Neural Information Processing Systems 36. Curran Associates, Inc., New Orleans, USA, 13. https://proceedings.neurips.cc/paper_f...

[18]

Uri Shaham, Maor Ivgi, Avia Efrat, Jonathan Berant, and Omer Levy. 2023. ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, Singapore, 7977–7989. https://doi.org/10.18653/v1/2023.findings-emnlp.536

[19]

Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, and Omer Levy. 2022. SCROLLS: Standardized CompaRison Over Long Language Sequences. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United...

[20]

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language Agents with Verbal Reinforcement Learning. In Advances in Neural Information Processing Systems 36. Curran Associates, Inc., New Orleans, USA, 19. https://proceedings.neurips.cc/paper_files/paper/2023/hash/1b44b878bb782e6954cd888628510e90-Abstrac...

[21]

Saksham Sahai Srivastava and Haoyu He. 2025. MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval. https://doi.org/10.48550/arXiv.2512.16962 arXiv:cs.CR/2512.16962

[22]

Haoran Tan, Zeyu Zhang, Chen Ma, Xu Chen, Quanyu Dai, and Zhenhua Dong. 2025. MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents. In Findings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, Vienna, Austria, 19336–19352. https://doi.org/10.18653/v1/2025.findings-acl.989

[24]

Karthik Valmeekam, Matthew Marquez, Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. 2023. PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change. In Advances in Neural Information Processing Systems 36. Curran Associates, Inc., New Orleans, USA, 13. https://proceedings.neurips.cc/paper_...

[25]

Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, and Alex Beutel. 2024. The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. https://doi.org/10.48550/arXiv.2404.13208 arXiv:cs.CR/2404.13208

[26]

Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, and Pengfei He. 2025. Unveiling Privacy Risks in LLM Agent Memory. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vienna, Austria, 25241–25260. https://doi.org/10.18653/v1/20...

[27]

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An Open-Ended Embodied Agent with Large Language Models. https://doi.org/10.48550/arXiv.2305.16291 arXiv:cs.AI/2305.16291

[28]

Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, and David Wagner. 2025. Defending Against Prompt Injection with DataFilter. https://doi.org/10.48550/arXiv.2510.19207 arXiv:cs.CR/2510.19207

[30]

Zhenting Wang, Huancheng Chen, Jiayun Wang, and Wei Wei. 2026. Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory. https://doi.org/10.48550/arXiv.2603.04257 arXiv:cs.CL/2603.04257

[31]

Qianshan Wei, Tengchao Yang, Yaochen Wang, Xinfeng Li, Lijun Li, Zhenfei Yin, Yi Zhan, Thorsten Holz, Zhiqiang Lin, and XiaoFeng Wang. 2025. A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory. https://doi.org/10.48550/arXiv.2510.02373 arXiv:cs.CR/2510.02373

[32]

Ruoyao Wen, Hao Li, Chaowei Xiao, and Ning Zhang. 2026. AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management. https://doi.org/10.48550/arXiv.2602.07398 arXiv:cs.CR/2602.07398

[33]

Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, and Mike Lewis. 2024. Efficient Streaming Language Models with Attention Sinks. In The Twelfth International Conference on Learning Representations. OpenReview.net, Vienna, Austria, 21. https://proceedings.iclr.cc/paper_files/paper/2024/hash/5e5fd18f863cbe6d8ae392a93fd271c9-Abstract-Conference.html

[35]

Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, and Jin Song Dong. 2026. Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections. https://doi.org/10.48550/arXiv.2602.15654 arXiv:cs.CR/2602.15654 Presented at Lifelong Agent @ ICLR 2026

[36]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations. OpenReview.net, Kigali, Rwanda, 33. https://iclr.cc/virtual/2023/poster/11003

[37]

Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. 2025. Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1. ACM, Toronto, ON, Canada, 1809–1820. https://doi.org/10.1145/3690624.3709179

[38]

Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, and Yu Wang. 2025. LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K. In Proceedings of the Second Conference on Language Modeling. OpenReview.net, Montreal, Canada, 26

[39]

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 10471–10506. https://doi.org/10.18653/v1/2024.findings-acl.624

[41]

Guilin Zhang, Wei Jiang, Xiejiashan Wang, Aisha Behr, Kai Zhao, Jeffrey Friedman, Xu Chu, and Amine Anoun. 2026. Adaptive Memory Admission Control for LLM Agents. https://doi.org/10.48550/arXiv.2603.04549 arXiv:cs.AI/2603.04549

[42]

Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. 2025. Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents. In The Thirteenth International Conference on Learning Representations. OpenReview.net, Singapore, 36. https://proceedings.iclr.cc/paper_...

[43]

Kaiyuan Zhang, Zian Su, Pin-Yu Chen, Elisa Bertino, Xiangyu Zhang, and Ninghui Li. 2025. LLM Agents Should Employ Security Principles. https://doi.org/10.48550/arXiv.2505.24019 arXiv:cs.CR/2505.24019

[44]

Yuxiang Zhang, Jiangming Shu, Ye Ma, Xueyuan Lin, Shangxi Wu, and Jitao Sang. 2025. Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks. https://doi.org/10.48550/arXiv.2510.12635 arXiv:cs.AI/2510.12635
