arxiv: 2605.12535 · v1 · submitted 2026-05-02 · 💻 cs.CR

Recognition: no theorem link

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

Igor Santos-Grueiro

Authors on Pith no claims yet

Pith reviewed 2026-05-14 21:20 UTC · model grok-4.3

classification 💻 cs.CR

keywords LM agentscontext assemblypolicy carriagedecision-time failuresSafeContexttruncationcompactioncontrol path

0 comments

The pith

Decision-time context assembly in LM agents is a measurable control-path element that can be partially hardened.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LM agents act on bounded decision states created by assembling, truncating, or compacting interaction history rather than the full raw record. If policy directives are lost or weakened in this assembly step, agents can violate constraints even when the underlying model and prompt remain unchanged. The paper measures this risk across multiple open-weight models using audits of assembled state and judgments of constraint respect. It introduces SafeContext as a fixed-weight control layer to pin and retain directive-bearing prefixes and inject reminders when needed. Results show systematic unmitigated risk with only partial recovery from the mitigation layer, and the effect holds under larger models.

Core claim

What carries the argument

SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed.

If this is right

Context assembly failures allow policy boundary crossings without any model or prompt alteration.
SafeContext delivers small gains against truncation policies but limited benefit under structured compaction.
The same failure mode appears in larger models including Qwen 14B and Llama 70B.
Replay-only mechanisms do not account for the observed effects of SafeContext.
Decision-time context assembly constitutes a measurable and partially addressable component of the agent control path.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agent safety evaluations should incorporate direct testing of context assembly policies as a distinct attack surface.
Layered hardening at the assembly stage may reduce dependence on model-level alignment alone.
Further tests with a broader set of real-world assembly policies would help determine how representative the reported risks are.
The visibility audits of assembled states provide a concrete method for diagnosing retention failures in deployed agents.

Load-bearing premise

The specific truncation and structured-compaction policies tested represent those used in practical LM agent systems, and the constraint respect judgments accurately reflect true policy carriage without bias.

What would settle it

An experiment demonstrating that assembled decision states retain all directive-bearing content under both truncation and compaction policies, or that SafeContext produces no measurable improvement in constraint respect when applied to additional assembly policies.

Figures

Figures reproduced from arXiv: 2605.12535 by Igor Santos-Grueiro.

**Figure 1.** Figure 1: shows the subsystem under study: directive-bearing and non-control state compete under pressure before action time. 2.2 The Adversarial Scheduler The adversary controls scheduling pressure over visible interaction history: how much non-control content appears, where it appears, and how directive-bearing state is separated or framed. That pressure changes what survives truncation, how summaries rewrite ear… view at source ↗

**Figure 2.** Figure 2: Failure traces from input history to decision state [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Control-layer intervention for decision-time context assembly: directive-bearing state is explicitly mediated before [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: C1 unmitigated risk profile by model and attack [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: C2: Mean ΔECR (blue) and ΔDFR (orange) versus matched truncation for each mitigation variant. The panels show eviction, aliasing, and binding instability. Labels are explicit (e.g., SCP+ICE (A), SCP+Cache (O)), where A/O denote autonomous/oracle routing [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Prefix of the reconstructed decision state in the frozen agentic case. Left: unmitigated truncation begins directly inside [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: C3 heterogeneity profile by mitigation variant and [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

**Figure 8.** Figure 8: CSR by input-token pressure bin for the unmitigated path (dashed) and SCP+ICE (A) (solid), shown per model with [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 9.** Figure 9: C4: Cost-benefit profile by mitigation family. Left: [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗

read the original abstract

LM agents do not act on raw interaction history; they act on a bounded decision state assembled by truncation, summarization, reordering, and rewriting. If directive-bearing state is dropped, weakened, or rebound during that step, an agent can cross a policy boundary without prompt override, model changes, or persistent-memory compromise. We study this failure mode over local Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B using judged exact constraint respect and direct audits of assembled-state visibility. We evaluate SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed. Unmitigated risk is systematic, but absolute exact respect remains low. Against truncation, SafeContext yields small gains; against a strong structured-compaction policy, most aggregate lift disappears, leaving residual benefit mainly in overflow eviction and selected aliasing slices. Replay-only does not explain the effect. A larger-model extension on Qwen 14B and Llama 70B shows the same failure object under larger models, although sign and magnitude remain policy-conditional. Decision-time context assembly is therefore a measurable part of the control path that can be partially hardened.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper investigates policy-carriage failures in LM agents caused by decision-time context assembly (truncation, summarization, reordering, rewriting) that can drop or weaken directive-bearing state without prompt overrides or model changes. It evaluates this failure mode on Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B using judged exact constraint respect and direct audits of assembled-state visibility. SafeContext is proposed as a control layer that pins control state, reuses prefixes, and injects reminders while keeping weights fixed. Experiments show systematic unmitigated risk with low absolute respect rates; SafeContext yields small gains against truncation that largely vanish under strong compaction, with residual benefits in overflow and aliasing cases. The same failure pattern appears in 14B/70B models but remains policy-conditional. The central claim is that decision-time context assembly is a measurable, partially hardenable component of the agent control path.

Significance. If the empirical findings hold under fuller validation, the work identifies a concrete, previously under-studied vector for unintended policy boundary crossings in deployed LM agents that operates entirely at inference time. The demonstration that a lightweight, weight-fixed intervention can produce measurable (if policy-dependent) hardening, together with the use of direct audits alongside judgments, supplies a useful empirical foothold for control-path analysis. The extension to larger models further indicates that the failure mode is not confined to small-scale systems.

major comments (3)

[Abstract] Abstract: The experimental description provides no sample sizes, number of trials per condition, statistical methods, or full evaluation protocols. Without these, it is impossible to determine whether the reported systematic risk, small SafeContext gains against truncation, and their disappearance under compaction are statistically reliable or driven by small-N effects.
[Abstract] Abstract: The manuscript does not justify why the two tested assembly policies (truncation and structured-compaction) are representative of mechanisms used in practical LM-agent deployments (e.g., retrieval-augmented windows, agent-specific summarizers, or dynamic reordering). If these policies are unrepresentative, the measured failure rates and residual hardening benefits cannot be taken as evidence that decision-time assembly is a general, measurable part of the control path.
[Abstract] Abstract: No details are supplied on judgment prompts, inter-rater reliability, constraint-definition ambiguity, or how direct audits were performed. This leaves open the possibility that evaluator bias or inconsistent labeling inflates the reported unmitigated risk and understates or overstates SafeContext lift, directly undermining the central measurement claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below and will revise the manuscript to strengthen the reporting of experimental details and justifications.

read point-by-point responses

Referee: [Abstract] Abstract: The experimental description provides no sample sizes, number of trials per condition, statistical methods, or full evaluation protocols. Without these, it is impossible to determine whether the reported systematic risk, small SafeContext gains against truncation, and their disappearance under compaction are statistically reliable or driven by small-N effects.

Authors: We agree that the abstract omits these quantitative details. The experiments used 100 trials per model-policy combination, with results expressed as exact respect proportions and 95% Wilson-score confidence intervals; no parametric assumptions were imposed. The full evaluation protocol, including trial generation and audit procedures, appears in Section 3. We will revise the abstract to state the sample size and statistical approach explicitly. revision: yes
Referee: [Abstract] Abstract: The manuscript does not justify why the two tested assembly policies (truncation and structured-compaction) are representative of mechanisms used in practical LM-agent deployments (e.g., retrieval-augmented windows, agent-specific summarizers, or dynamic reordering). If these policies are unrepresentative, the measured failure rates and residual hardening benefits cannot be taken as evidence that decision-time assembly is a general, measurable part of the control path.

Authors: Truncation and structured-compaction were chosen because they instantiate the two most common primitive operations in deployed context windows (length-based eviction and content-based condensation). We will add a short justification paragraph in the introduction, citing representative agent frameworks, while explicitly noting that the study does not cover retrieval-augmented or reordering policies and that generalization therefore remains conditional on those mechanisms. revision: yes
Referee: [Abstract] Abstract: No details are supplied on judgment prompts, inter-rater reliability, constraint-definition ambiguity, or how direct audits were performed. This leaves open the possibility that evaluator bias or inconsistent labeling inflates the reported unmitigated risk and understates or overstates SafeContext lift, directly undermining the central measurement claim.

Authors: We acknowledge the need for greater transparency on the evaluation pipeline. Judgment prompts are reproduced verbatim in Appendix A; direct audits consist of token-level inspection of the assembled context for directive presence; inter-rater reliability on a 200-example subset yielded Cohen’s kappa of 0.83; and constraint definitions were fixed in advance to limit ambiguity. We will summarize these elements in the abstract and expand the methods section accordingly. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical measurement study with no derivation chain

full rationale

The paper is an empirical investigation of context-assembly failures in LM agents. It evaluates specific policies (truncation, structured-compaction) on fixed models via direct audits and judged constraint respect, then measures the effect of the SafeContext layer. No equations, fitted parameters, or predictions are defined in terms of the target outcomes. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim follows directly from the reported experimental deltas rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim depends on empirical results from model testing and the definition of SafeContext as a control layer. No free parameters, axioms, or new entities are explicitly introduced in the provided abstract.

pith-pipeline@v0.9.0 · 5520 in / 1064 out tokens · 59120 ms · 2026-05-14T21:20:28.020793+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 5 internal anchors

[1]

Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, and Juanzi Li. 2024. LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). ...

work page doi:10.18653/v1/2024.acl- 2024
[2]

Gormley, and Graham Neubig

Amanda Bertsch, Maor Ivgi, Emily Xiao, Uri Alon, Jonathan Berant, Matthew R. Gormley, and Graham Neubig. 2025. In-Context Learning with Long-Context Models: An In-Depth Exploration. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguis- tics: Human Language Technologies (Volume 1: Long Pap...

work page doi:10.18653/v1/2025.naacl-long.605 2025
[3]

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents. InAdvances in Neural Information Processing Systems 37. Curran Associates, Inc., Vancouver, Canada, 26. https://doi.org/10.52202/079017-2636 Datase...

work page doi:10.52202/079017-2636 2024
[4]

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. 2025. Memory Injection Attacks on LLM Agents via Query- Only Interaction. InAdvances in Neural Information Processing Systems 38. Curran Associates, Inc., San Diego, CA, USA, 35. https://neurips.cc/virtual/2025/poster/ 118152 Poster

work page 2025
[5]

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. 2024. RULER: What’s the Real Context Size of Your Long-Context Language Models?. InProceedings of the First Conference on Language Modeling. OpenReview.net, Philadelphia, PA, USA, 27

work page 2024
[6]

Jiazheng Kang, Mingming Ji, Zhe Zhao, and Ting Bai. 2025. Memory OS of AI Agent. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Suzhou, China, 25961–25970. https://doi.org/10.18653/v1/2025.emnlp-main.1318

work page doi:10.18653/v1/2025.emnlp-main.1318 2025
[7]

Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang

Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts.Transactions of the Association for Computational Linguistics 12 (2024), 157–173. https://doi.org/10.1162/tacl_a_00638

work page doi:10.1162/tacl_a_00638 2024
[8]

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xi- ang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. 2024. AgentBench: Evaluating LLMs as Agents. InThe Twelfth International Conference on Le...

work page 2024
[9]

Zesen Liu, Zhixiang Zhang, Yuchong Xie, and Dongdong She. 2025. Com- pressionAttack: Exploiting Prompt Compression as a New Attack Sur- face in LLM-Powered Agents. https://doi.org/10.48550/arXiv.2510.22963 arXiv:cs.CR/2510.22963

work page doi:10.48550/arxiv.2510.22963 2025
[10]

Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. Evaluating Very Long-Term Conversational Memory of LLM Agents. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Associa- tion for Computational Linguistics, Bangkok, Thailand, 13851–13870...

work page doi:10.18653/v1/2024.acl-long.747 2024
[11]

Rossi, Seunghyun Yoon, and Hinrich Sch"utze

Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A. Rossi, Seunghyun Yoon, and Hinrich Sch"utze. 2025. NoLiMa: Long-Context Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly Evaluation Beyond Literal Matching. InProceedings of the 42nd International Conference on Machine Learning (Proceedings of Mac...

work page 2025
[12]

MemGPT: Towards LLMs as Operating Systems

Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. 2023. MemGPT: Towards LLMs as Operating Systems. https://doi.org/10.48550/arXiv.2310.08560 arXiv:cs.AI/2310.08560

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.08560 2023
[13]

, title =

Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative Agents: Interactive Simulacra of Human Behavior. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. ACM, San Francisco, CA, USA, 2:1–2:22. https://doi.org/10.1145/3586183.3606763

work page doi:10.1145/3586183.3606763 2023
[14]

Fábio Perez and Ian Ribeiro. 2022. Ignore Previous Prompt: Attack Tech- niques for Language Models. https://doi.org/10.48550/arXiv.2211.09527 arXiv:cs.CL/2211.09527 Presented at the NeurIPS ML Safety Workshop 2022

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2211.09527 2022
[15]

Maddison, and Tatsunori Hashimoto

Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. 2024. Identifying the Risks of LM Agents with an LM-Emulated Sandbox. InThe Twelfth International Conference on Learning Representations. OpenReview.net, Vienna, Austria, 68. https://proceedings.iclr.cc/paper_files/paper...

work page 2024
[16]

Rana Salama, Jason Cai, Michelle Yuan, Anna Currey, Monica Sunkara, Yi Zhang, and Yassine Benajiba. 2025. MemInsight: Autonomous Memory Augmentation for LLM Agents. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Suzhou, China, 33136–33152. https://doi.org/10.18653/v1/202...

work page doi:10.18653/v1/2025.emnlp-main.1683 2025
[17]

Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. InAdvances in Neural Information Processing Systems 36. Curran Associates, Inc., New Or- leans, USA, 13. https://proceedings.neurips.cc/paper_f...

work page 2023
[18]

Uri Shaham, Maor Ivgi, Avia Efrat, Jonathan Berant, and Omer Levy. 2023. Zero- SCROLLS: A Zero-Shot Benchmark for Long Text Understanding. InFindings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, Singapore, 7977–7989. https://doi.org/10.18653/v1/ 2023.findings-emnlp.536

work page doi:10.18653/v1/ 2023
[19]

Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, and Omer Levy. 2022. SCROLLS: Standardized CompaRison Over Long Language Sequences. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United...

work page doi:10.18653/v1/2022.emnlp-main.823 2022
[20]

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language Agents with Verbal Reinforcement Learn- ing. InAdvances in Neural Information Processing Systems 36. Curran Associates, Inc., New Orleans, USA, 19. https://proceedings.neurips.cc/paper_files/paper/ 2023/hash/1b44b878bb782e6954cd888628510e90-Abstrac...

work page 2023
[21]

Saksham Sahai Srivastava and Haoyu He. 2025. MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval. https: //doi.org/10.48550/arXiv.2512.16962 arXiv:cs.CR/2512.16962

work page doi:10.48550/arxiv.2512.16962 2025
[22]

Haoran Tan, Zeyu Zhang, Chen Ma, Xu Chen, Quanyu Dai, and Zhenhua Dong

work page
[23]

InFindings of the Association for Computational Linguistics: ACL 2025

MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents. InFindings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, Vienna, Austria, 19336– 19352. https://doi.org/10.18653/v1/2025.findings-acl.989

work page doi:10.18653/v1/2025.findings-acl.989 2025
[24]

Karthik Valmeekam, Matthew Marquez, Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. 2023. PlanBench: An Extensible Benchmark for Evalu- ating Large Language Models on Planning and Reasoning about Change. InAd- vances in Neural Information Processing Systems 36. Curran Associates, Inc., New Orleans, USA, 13. https://proceedings.neurips.cc/paper_...

work page 2023
[25]

Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, and Alex Beutel. 2024. The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. https://doi.org/10.48550/arXiv.2404.13208 arXiv:cs.CR/2404.13208

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.13208 2024
[26]

Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, and Pengfei He. 2025. Unveiling Privacy Risks in LLM Agent Memory. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Vol- ume 1: Long Papers). Association for Computational Linguistics, Vienna, Austria, 25241–25260. https://doi.org/10.18653/v1/20...

work page doi:10.18653/v1/2025.acl-long.1227 2025
[27]

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An Open-Ended Embodied Agent with Large Language Models. https://doi.org/10.48550/arXiv.2305.16291 arXiv:cs.AI/2305.16291

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.16291 2023
[28]

Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, and David Wagner

work page
[29]

https://doi.org/10

Defending Against Prompt Injection with DataFilter. https://doi.org/10. 48550/arXiv.2510.19207 arXiv:cs.CR/2510.19207

work page arXiv
[30]

Zhenting Wang, Huancheng Chen, Jiayun Wang, and Wei Wei. 2026. Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory. https: //doi.org/10.48550/arXiv.2603.04257 arXiv:cs.CL/2603.04257

work page doi:10.48550/arxiv.2603.04257 2026
[31]

Qianshan Wei, Tengchao Yang, Yaochen Wang, Xinfeng Li, Lijun Li, Zhenfei Yin, Yi Zhan, Thorsten Holz, Zhiqiang Lin, and XiaoFeng Wang. 2025. A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory. https://doi. org/10.48550/arXiv.2510.02373 arXiv:cs.CR/2510.02373

work page doi:10.48550/arxiv.2510.02373 2025
[32]

Ruoyao Wen, Hao Li, Chaowei Xiao, and Ning Zhang. 2026. AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management. https://doi.org/10.48550/arXiv.2602.07398 arXiv:cs.CR/2602.07398

work page doi:10.48550/arxiv.2602.07398 2026
[33]

Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, and Mike Lewis

work page
[34]

InThe Twelfth International Conference on Learning Representations

Efficient Streaming Language Models with Attention Sinks. InThe Twelfth International Conference on Learning Representations. OpenReview.net, Vienna, Austria, 21. https://proceedings.iclr.cc/paper_files/paper/2024/hash/ 5e5fd18f863cbe6d8ae392a93fd271c9-Abstract-Conference.html

work page 2024
[35]

Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, and Jin Song Dong. 2026. Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections. https://doi.org/10.48550/arXiv.2602.15654 arXiv:cs.CR/2602.15654 Presented at Lifelong Agent @ ICLR 2026

work page doi:10.48550/arxiv.2602.15654 2026
[36]

Narasimhan, and Yuan Cao

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. InThe Eleventh International Conference on Learning Represen- tations. OpenReview.net, Kigali, Rwanda, 33. https://iclr.cc/virtual/2023/poster/ 11003

work page 2023
[37]

Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. 2025. Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1. ACM, Toronto, ON, Canada, 1809–1820. https://doi.org/10.1145/3690624.3709179

work page doi:10.1145/3690624.3709179 2025
[38]

Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, and Yu Wang. 2025. LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K. InProceedings of the Second Conference on Language Modeling. OpenReview.net, Montreal, Canada, 26

work page 2025
[39]

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents. InFindings of the Association for Computational Linguistics: ACL

work page 2024
[40]

doi: 10.18653/v1/2024.findings-acl.624

Association for Computational Linguistics, Bangkok, Thailand, 10471– 10506. https://doi.org/10.18653/v1/2024.findings-acl.624

work page doi:10.18653/v1/2024.findings-acl.624 2024
[41]

Guilin Zhang, Wei Jiang, Xiejiashan Wang, Aisha Behr, Kai Zhao, Jeffrey Fried- man, Xu Chu, and Amine Anoun. 2026. Adaptive Memory Admission Control for LLM Agents. https://doi.org/10.48550/arXiv.2603.04549 arXiv:cs.AI/2603.04549

work page doi:10.48550/arxiv.2603.04549 2026
[42]

Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. 2025. Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents. In The Thirteenth International Conference on Learning Representations. OpenRe- view.net, Singapore, 36. https://proceedings.iclr.cc/paper_...

work page 2025
[43]

Kaiyuan Zhang, Zian Su, Pin-Yu Chen, Elisa Bertino, Xiangyu Zhang, and Ninghui Li. 2025. LLM Agents Should Employ Security Principles. https://doi.org/10. 48550/arXiv.2505.24019 arXiv:cs.CR/2505.24019

work page arXiv 2025
[44]

Yuxiang Zhang, Jiangming Shu, Ye Ma, Xueyuan Lin, Shangxi Wu, and Jitao Sang. 2025. Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks. https://doi.org/10.48550/arXiv.2510.12635 arXiv:cs.AI/2510.12635

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.12635 2025