arxiv: 2605.01970 · v2 · submitted 2026-05-03 · 💻 cs.CR · cs.AI

Recognition: unknown

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Darya Kaviani, David Wagner, Debeshee Das, Florian Tram\`er, Julien Piet, Luca Beurer-Kellner

Authors on Pith no claims yet

Pith reviewed 2026-05-09 17:09 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords LLM agentsmemory poisoningdata exfiltrationTrojan attacksAI securitypersistent memoryred-teamingagent defenses

0 comments

The pith

A single untrusted input can plant a dormant trigger in an AI agent's memory that later steals sensitive user data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that attackers can insert a hidden payload into an LLM agent's long-term memory through one ordinary tool call such as a crafted email. This payload stays quiet through routine use but activates when the user mentions finance, health, or identity topics, sending private information to the attacker. A reader should care because agents increasingly rely on memory to carry context across sessions, creating a lasting vulnerability that standard checks may not catch. The work demonstrates that these attacks reach high success rates across common memory designs and even after long stretches of normal activity. It also tests basic defenses that cut attack success but create different levels of interference with the agent's normal tasks.

Core claim

Trojan Hippo is a class of attacks that plant a dormant payload in agent memory via a single untrusted tool call; the payload activates only on later sensitive topics and exfiltrates high-value data. The attack succeeds at 85-100% rates on frontier models from OpenAI and Google, with activation occurring even after 100 benign sessions. The paper evaluates the attack across four memory backends and shows that four security-inspired defenses can lower success rates to 0-5% while imposing utility costs that depend on the task.

What carries the argument

The Trojan Hippo dormant payload, a planted memory entry that remains inactive until the user raises sensitive topics and then triggers data exfiltration.

If this is right

Agents that retain memory across sessions become vulnerable to data leaks triggered by early untrusted inputs.
Defenses drawn from basic security ideas can reduce attack success but create different utility losses depending on how the agent is used.
The attack works across explicit tool memory, agentic memory, RAG, and sliding-window designs, showing it is not limited to one architecture.
Real-world use of persistent memory faces an open challenge in balancing security and utility that requires ongoing adaptive testing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If these attacks spread, users may limit the personal data they share with memory-enabled agents.
Memory systems could add verification or encryption steps for stored entries to limit hidden payloads.
The same planting technique might extend to other agent features such as tool access or planning logs.
The security-utility tradeoff points to a need for memory designs that adapt defenses to specific usage patterns rather than applying them uniformly.

Load-bearing premise

The agent must accept and store the payload from one untrusted source without immediate detection or removal.

What would settle it

A test in which no planted payload activates to exfiltrate data after 100 normal sessions, or in which every tested defense leaves attack success above 50% without large drops in task performance.

Figures

Figures reproduced from arXiv: 2605.01970 by Darya Kaviani, David Wagner, Debeshee Das, Florian Tram\`er, Julien Piet, Luca Beurer-Kellner.

**Figure 1.** Figure 1: The attack success rate (ASR) of the Trojan Hippo view at source ↗

**Figure 2.** Figure 2: The Trojan Hippo attack flow.Phase 1 (Ingestion): The attacker sends an email containing hidden malicious instructions view at source ↗

**Figure 3.** Figure 3: Overview of our adaptive attack generation pipeline view at source ↗

**Figure 4.** Figure 4: Attack success rate (ASR) versus trigger-session view at source ↗

**Figure 5.** Figure 5: The Attack Success Rate (ASR) at trigger session view at source ↗

**Figure 6.** Figure 6: Utility by capability class for each (memory backend, defense), shown as a grid of Kiviat diagrams. Each cell illustrates view at source ↗

**Figure 7.** Figure 7: Recall-focused deployment profile (trigger session view at source ↗

**Figure 8.** Figure 8: Balanced (PA + Email) deployment profile (trigger session view at source ↗

**Figure 9.** Figure 9: Email-heavy deployment profile (trigger session view at source ↗

**Figure 10.** Figure 10: Regime spectrum within RAG (trigger session view at source ↗

**Figure 11.** Figure 11: Illustrative Trojan Hippocampus trace (Finance, Mem0, no defense). A single malicious email read in an earlier view at source ↗

read the original abstract

Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work: the attacker plants a dormant payload into an agent's long-term memory via a single untrusted tool call (e.g., a crafted email), which activates only when the user later discusses sensitive topics such as finance, health, or identity, and exfiltrates high-value personal data to the attacker. While anecdotal demonstrations of such attacks have appeared against deployed systems, no prior work systematically evaluates them across heterogeneous memory architectures and defenses. We introduce a dynamic evaluation framework comprising two components: (1) an OpenEvolve-based adaptive red-teaming benchmark that stress-tests defenses and memory backends against continuously refined attacks, and (2) the first capability-aware security/utility analysis for persistent memory systems, enabling principled reasoning about defense deployment across different usage profiles. Instantiated on an email assistant across four memory backends (explicit tool memory, agentic memory, RAG, and sliding-window context), Trojan Hippo achieves up to 85-100% ASR against current frontier models from OpenAI and Google, with planted memories successfully activating even after 100 benign sessions. We evaluate four memory-system defenses inspired by basic security principles, finding they substantially reduce attack success rates (to as low as 0-5%), though at utility costs that vary widely with task requirements. Because of this substantial security-utility tradeoff, the effective real-world deployment of defenses remains an open challenge, which our evaluation framework is specifically designed to address.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Trojan Hippo attack class, in which an adversary plants a dormant payload into an LLM agent's long-term memory via a single untrusted tool call (e.g., crafted email). The payload activates only on sensitive topics (finance, health, identity) to exfiltrate data. The work evaluates this attack on an email assistant across four memory backends (explicit tool memory, agentic memory, RAG, sliding-window), reporting up to 85-100% attack success rate (ASR) against OpenAI and Google frontier models with activation persisting after 100 benign sessions. It also presents a dynamic evaluation framework using OpenEvolve-based adaptive red-teaming and a capability-aware security/utility analysis, plus four defenses that reduce ASR to 0-5% at varying utility costs.

Significance. If reproducible, the results identify a realistic persistent threat to memory-enabled agents that prior poisoning work did not systematically address. The reported high ASR and 100-session persistence demonstrate that minimal-interaction planting can succeed against current architectures. The adaptive red-teaming framework and security-utility tradeoff analysis supply a concrete methodology for evaluating defenses across usage profiles, which is valuable as memory systems proliferate in deployed agents. The explicit acknowledgment that defenses entail substantial tradeoffs correctly frames deployment as an open problem.

major comments (2)

[Abstract] Abstract: The headline 85-100% ASR is presented as an end-to-end figure without a separate planting-success metric for the initial untrusted tool call. Because the threat model requires that the payload be stored without detection, refusal, or clearing by any of the four backends or the frontier models, the absence of this independent rate means the reported activation numbers may be conditional on successful planting; the manuscript must disaggregate planting success from subsequent activation to support the central claim.
[Evaluation framework] Evaluation framework and results sections: No concrete details are supplied on the OpenEvolve adaptive red-teaming procedure, the exact prompts or memory-backend implementations used for the four architectures, the simulation of 100 benign sessions, or the raw per-backend/per-model tables. Without these, the persistence claim after 100 sessions and the defense outcomes (0-5% ASR) cannot be verified or reproduced, undermining the soundness of the empirical evaluation.

minor comments (1)

[Abstract] Abstract: The phrasing 'up to 85-100%' is ambiguous; replace with a clearer range or per-condition breakdown (e.g., '85% in the weakest backend, 100% in the strongest').

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major concerns point by point below and have updated the manuscript accordingly to enhance clarity and reproducibility.

read point-by-point responses

Referee: [Abstract] Abstract: The headline 85-100% ASR is presented as an end-to-end figure without a separate planting-success metric for the initial untrusted tool call. Because the threat model requires that the payload be stored without detection, refusal, or clearing by any of the four backends or the frontier models, the absence of this independent rate means the reported activation numbers may be conditional on successful planting; the manuscript must disaggregate planting success from subsequent activation to support the central claim.

Authors: We agree with the referee that explicitly separating the planting success rate from the activation success rate strengthens the presentation of our results. The 85-100% ASR reported in the abstract refers to the rate at which the planted payload activates and exfiltrates data upon encountering sensitive topics, conditional on the payload having been successfully stored in memory. In the revised manuscript, we have updated the abstract to clarify this distinction and added a dedicated subsection in the evaluation framework reporting the planting success rates across all backends and models. These rates indicate that the initial planting via untrusted tool calls succeeds at high rates without triggering detection or clearing by the backends or models. revision: yes
Referee: [Evaluation framework] Evaluation framework and results sections: No concrete details are supplied on the OpenEvolve adaptive red-teaming procedure, the exact prompts or memory-backend implementations used for the four architectures, the simulation of 100 benign sessions, or the raw per-backend/per-model tables. Without these, the persistence claim after 100 sessions and the defense outcomes (0-5% ASR) cannot be verified or reproduced, undermining the soundness of the empirical evaluation.

Authors: We acknowledge that the original submission did not include sufficient low-level details to enable full reproduction of the experiments. This was an oversight in the presentation. In the revised manuscript, we have substantially expanded the 'Evaluation Framework' section to describe the OpenEvolve adaptive red-teaming procedure in detail, including the exact prompts employed for attack generation and the adaptation mechanism. We also provide the specific implementations for each of the four memory backends, the protocol used to simulate the 100 benign sessions (including how benign interactions were generated and interleaved), and include raw experimental results in tabular form in a new appendix. These additions should allow independent verification of the persistence and defense efficacy claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation with direct experimental outcomes

full rationale

The paper presents an empirical security evaluation of a memory poisoning attack on LLM agents. It describes the Trojan Hippo attack, introduces an evaluation framework with adaptive red-teaming and capability-aware analysis, and reports attack success rates (ASR) from experiments across four memory backends and frontier models. All claims are grounded in observed experimental results rather than any mathematical derivation chain, fitted parameters, or self-referential definitions. No equations, predictions derived from inputs, or load-bearing self-citations appear in the provided text; the central results (e.g., 85-100% ASR) are presented as direct outcomes of the described experiments, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The work is empirical and introduces a named attack class plus an evaluation framework without relying on additional free parameters, unproven axioms, or entities that require independent verification beyond the experimental demonstrations.

invented entities (1)

Trojan Hippo attack class no independent evidence
purpose: To name and characterize the specific persistent memory poisoning technique for data exfiltration
The attack class is defined within the paper based on the described threat model and demonstrations.

pith-pipeline@v0.9.0 · 5619 in / 1211 out tokens · 48248 ms · 2026-05-09T17:09:52.625590+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

106 extracted references · 53 canonical work pages · 14 internal anchors

[1]

https://openai.com/index/scaling-ai-for- everyone/

Scaling AI for everyone — openai.com. https://openai.com/index/scaling-ai-for- everyone/. [Accessed 14-04-2026]

2026
[2]

Trustworthy agentic AI systems: A cross-layer review of architectures, threat models, and governance strategies for real-world deployment.F1000Research, 2025

IBRAHIM ADABARA, Bashir Olaniyi Sadiq, Aliyu Nuhu Shuaibu, Yale Ibrahim Danjuma, and Venkateswarlu Maninti. Trustworthy agentic AI systems: A cross-layer review of architectures, threat models, and governance strategies for real-world deployment.F1000Research, 2025. doi: 10.12688/f1000research.1 69927.1. URL https://doi.org/10.12688/f1000research.169927.1

work page doi:10.12688/f1000research.1 2025
[3]

Sorokin, Dmitry Evseev, Andrey Kravchenko, Mikhail Burtsev, and Evgeny Burnaev

Petr Anokhin, Nikita Semenov, Artyom Y. Sorokin, Dmitry Evseev, Andrey Kravchenko, Mikhail Burtsev, and Evgeny Burnaev. Arigraph: Learning knowl- edge graph world models with episodic memory for llm agents. InProceedings of IJCAI, pages 12–20, 2025

2025
[4]

Claude memory

Anthropic. Claude memory. Anthropic Blog, August 2025. URL https://www.an thropic.com/news/memory

2025
[5]

Import your ChatGPT history to Claude

Anthropic. Import your ChatGPT history to Claude. Claude Import Memory Tool, March 2026. URL https://claude.com/import-memory

2026
[6]

From prompt injections to SQL injection attacks: How protected is your llm-integrated web application? CoRR abs/2308.01990 (2023)

Luca Beurer-Kellner et al. From prompt injections to SQL injection attacks: How protected is your LLM-integrated web application? InarXiv preprint arXiv:2308.01990, 2025

work page arXiv 2025
[7]

arXiv preprint arXiv:2504.14064 , year=

Louis Boisvert, Mohit Bansal, Chandra Kiran Evuru, Grace Huang, Anirudh Puri, Avishek Joey Bose, Maryam Fazel, Quentin Cappart, James Stanley, Alexandre Lacoste, Alexandre Drouin, and Krishnamurthy Dvijotham. DoomArena: A framework for testing AI agents against evolving security threats, 2025. URL https://doi.org/10.48550/arxiv.2504.14064

work page doi:10.48550/arxiv.2504.14064 2025
[8]

StruQ: Defending Against Prompt Injection with Structured Queries.arXiv preprint arXiv:2402.06363, 2024.https: //arxiv.org/abs/2402.06363

Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. Struq: Defending against prompt injection with structured queries.arXiv (Cornell University), 2024. doi: 10.48550/arxiv.2402.06363

work page doi:10.48550/arxiv.2402.06363 2024
[9]

Secalign: Defending against prompt injection with preference optimization.arXiv (Cornell University), 2024

Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo, and Chuan Guo. Secalign: Defending against prompt injection with preference optimization.arXiv (Cornell University), 2024. doi: 10.48550/arxiv.241 0.05451

work page doi:10.48550/arxiv.241 2024
[10]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases

Zhaorun Chen, Zhuokai Xiang, Chaowei Xiao, Dawn Song, and Bo Li. AgentPoi- son: Red-teaming LLM agents via poisoning memory or knowledge bases, 2024. URL https://doi.org/10.48550/arxiv.2407.12784

work page doi:10.48550/arxiv.2407.12784 2024
[11]

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Anshuman Chhabra, Shrestha Datta, Shahriar Kabir Nahin, and Prasant Moha- patra. Agentic ai security: Threats, defenses, evaluation, and open challenges. arXiv preprint arXiv:2510.23883, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[12]

Mem0: Building production-ready ai agents with scalable long-term memory

Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory. In ECAI, pages 2993–3000, 2025

2025
[13]

Securing AI agents with information-flow control,

Manuel Costa, Boris Köpf, Arun Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Securing AI agents with information-flow control, 2025. URL https://doi.org/10 .48550/arxiv.2505.23643

work page arXiv 2025
[14]

Com- mandsans: Securing ai agents with surgical precision prompt sanitization.arXiv preprint arXiv:2510.08829, 2025

Debeshee Das, Luca Beurer-Kellner, Marc Fischer, and Maximilian Baader. Com- mandsans: Securing ai agents with surgical precision prompt sanitization.arXiv preprint arXiv:2510.08829, 2025

work page arXiv 2025
[15]

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

Edoardo Debenedetti, Jie Zhang, Mislav Balanović, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. AgentDojo: A dynamic environment to evaluate attacks and defenses for LLM agents. InAdvances in Neural Information Processing Systems (NeurIPS), 2024. URL https://doi.org/10.48550/arxiv.2406.13352

work page internal anchor Pith review doi:10.48550/arxiv.2406.13352 2024
[16]

Defeating Prompt Injections by Design

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025

work page internal anchor Pith review arXiv 2025
[17]

Ai agents under threat: A survey of key security challenges and future pathways,

Zhiying Deng, Yue Guo, Cong Han, Weili Wang, Jing Xiong, Sheng Wen, and Yang Xiang. AI agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57:1–36, 2024. doi: 10.1145/3716628. URL https://doi.org/10.1145/3716628

work page doi:10.1145/3716628 2024
[18]

Memory injection attacks on llm agents via query-only interaction.arXiv preprint arXiv:2503.03704, 2025

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. Memory injection attacks on llm agents via query-only interaction.arXiv preprint arXiv:2503.03704, 2025

work page arXiv 2025
[19]

A practical memory injection attack against llm agents.(2025)

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. A practical memory injection attack against llm agents.(2025). Accessed from arXiv preprint arXiv, 2503, 2025

2025
[20]

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph rag approach to query-focused summa- rization.arXiv preprint arXiv:2404.16130, 2024

work page internal anchor Pith review arXiv 2024
[21]

From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express, 2025

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, and Mérouane Debbah. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express, 2025. doi: 10.1016/j.icte.2025. 12.001. URL https://doi.org/10.1016/j.icte.2025.12.001

work page doi:10.1016/j.icte.2025 2025
[22]

Retrieval-Augmented Generation for Large Language Models: A Survey

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Ji- awei Sun, and Haofen Wang. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[23]

Security policies and security models

Joseph A Goguen and José Meseguer. Security policies and security models. In 1982 IEEE symposium on security and privacy, pages 11–11. IEEE, 1982

1982
[24]

New hack uses prompt injection to corrupt Gemini’s long-term memory

Dan Goodin. New hack uses prompt injection to corrupt Gemini’s long-term memory. Ars Technica, February 2025. URL https://arstechnica.com/security Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration /2025/02/new-hack-uses-prompt-injection-to-corrupt-geminis-long-term- memory/

2025
[25]

Introducing gemini advanced with long-term memory

Google. Introducing gemini advanced with long-term memory. Google Blog, 2024. URL https://blog.google/products/gemini/google-gemini-update-august-2024/

2024
[26]

Notebooklm

Google. Notebooklm. https://notebooklm.google/, n.d
[27]

Gemini 3.1 pro model card

Google DeepMind. Gemini 3.1 pro model card. https://deepmind.google/mode ls/model-cards/gemini-3-1-pro/, 2026. Published February 19, 2026. Accessed: 2026-04-26

2026
[28]

Project astra

Google DeepMind. Project astra. https://deepmind.google/models/project-astra/, n.d
[29]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection.Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023. doi: 10.1145/3605764.3623985

work page doi:10.1145/3605764.3623985 2023
[30]

InAdvances in Neural Information Processing Systems, A

Yu Gu, Bernal Gutiérrez, Yiheng Shu, Yu Su, and Michihiro Yasunaga. Hipporag: Neurobiologically inspired long-term memory for large language models.Ad- vances in Neural Information Processing Systems 37, 2024. doi: 10.52202/079017- 1902

work page doi:10.52202/079017- 2024
[31]

Lightrag: Simple and fast retrieval-augmented generation

Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. Lightrag: Simple and fast retrieval-augmented generation. InFindings of EMNLP, pages 10746– 10761, 2025

2025
[32]

The confused deputy: (or how a compiler and its acting as a deputy can be used to misuse privileges).ACM SIGOPS Operating Systems Review, 22(4): 36–38, 1988

Norm Hardy. The confused deputy: (or how a compiler and its acting as a deputy can be used to misuse privileges).ACM SIGOPS Operating Systems Review, 22(4): 36–38, 1988. doi: 10.1145/381792.381809

work page doi:10.1145/381792.381809 1988
[33]

Gaole He, Gianluca Demartini, and Ujwal Gadiraju. Plan-then-execute: An empirical study of user trust and team performance when using llm agents as a daily assistant.Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 2025. doi: 10.1145/3706598.3713218

work page doi:10.1145/3706598.3713218 2025
[34]

SpAIware: Uncovering a novel artificial intelligence attack vector through persistent memory in LLM applications and agents.Future Generation Computer Systems, 2025

Manuel Herrador and Johann Rehberger. SpAIware: Uncovering a novel artificial intelligence attack vector through persistent memory in LLM applications and agents.Future Generation Computer Systems, 2025. doi: 10.1016/j.future.2025.10

work page doi:10.1016/j.future.2025.10 2025
[35]

URL https://www.sciencedirect.com/science/article/abs/pii/S0167739X250 02894
[36]

Spaiware: Uncovering a novel artificial intelligence attack vector through persistent memory in llm applications and agents.Future Generation Computer Systems, 174:107994, 2026

Manuel Herrador and Johann Rehberger. Spaiware: Uncovering a novel artificial intelligence attack vector through persistent memory in llm applications and agents.Future Generation Computer Systems, 174:107994, 2026. ISSN 0167-739X. doi: https://doi.org/10.1016/j.future.2025.107994. URL https://www.sciencedirect. com/science/article/pii/S0167739X25002894

work page doi:10.1016/j.future.2025.107994 2026
[37]

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Keegan Hines, Gary Lopez, M. Hall, Federico Zarfati, Yonatan Zunger, and Emre Kiciman. Defending against indirect prompt injection attacks with spotlighting. CAMLIS, 2024. doi: 10.48550/arXiv.2403.14720

work page internal anchor Pith review doi:10.48550/arxiv.2403.14720 2024
[38]

arXiv preprint arXiv:2507.05257 , year=

Yuanzhe Hu, Yu Wang, and Julian McAuley. Evaluating memory in LLM agents via incremental multi-turn interactions.arXiv preprint arXiv:2507.05257, 2025

work page arXiv 2025
[39]

Memory in the Age of AI Agents

Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, et al. Memory in the age of ai agents.arXiv preprint arXiv:2512.13564, 2025

work page internal anchor Pith review arXiv 2025
[40]

A critical evaluation of defenses against prompt injection attacks,

Yuqi Jia, Zhujun Shao, Yupei Liu, Jinyuan Jia, Dawn Song, and Neil Zhen- qiang Gong. A critical evaluation of defenses against prompt injection attacks. ArXiv.org, 2025. doi: 10.48550/arxiv.2505.18333

work page doi:10.48550/arxiv.2505.18333 2025
[41]

MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

Dongming Jiang, Yi Li, Guanpeng Li, and Bingzhe Li. Magma: A multi-graph based agentic memory architecture for ai agents.arXiv preprint arXiv:2601.03236, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[42]

Conversationbufferwindowmemory

LangChain. Conversationbufferwindowmemory. https://reference.langchain.co m/python/langchain-classic/memory/buffer_window/ConversationBufferWin dowMemory, 2024. Accessed: 2026-04-26

2024
[43]

Pearson Education, 2002

David LeBlanc and Michael Howard.Writing secure code. Pearson Education, 2002

2002
[44]

Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

2020
[45]

Hello again! llm-powered personalized agent for long-term dialogue

Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, and Tat-Seng Chua. Hello again! llm-powered personalized agent for long-term dialogue. InNAACL, pages 5259–5276, 2025

2025
[46]

Memos: An operating system for memory-augmented generation (mag) in large language models.arXiv preprint arXiv:2505.22101, 2025

Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, et al. Memos: An operating system for memory-augmented generation (mag) in large language models.arXiv preprint arXiv:2505.22101, 2025

work page arXiv 2025
[47]

Think-in- memory: Recalling and post-thinking enable llms with long-term memory

Lei Liu, Xiaoyan Yang, Yue Shen, Binbin Hu, Zhiqiang Zhang, Jinjie Gu, and Guannan Zhang. Think-in-memory: Recalling and post-thinking enable llms with long-term memory.arXiv preprint arXiv:2311.08719, 2023

work page arXiv 2023
[48]

Evaluating very long-term conversational memory of llm agents.arxiv, 2024

Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of llm agents.arxiv, 2024

2024
[49]

Tree of attacks: Jailbreaking black- box llms automatically.Advances in Neural Information Processing Systems, 37: 61065–61105, 2024

Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, and Amin Karbasi. Tree of attacks: Jailbreaking black- box llms automatically.Advances in Neural Information Processing Systems, 37: 61065–61105, 2024

2024
[50]

Mem.ai. Mem.ai. https://get.mem.ai/, n.d
[51]

What Deserves Memory: Adaptive Memory Distillation for LLM Agents

Jiayan Nan, Wenquan Ma, Wenlong Wu, and Yize Chen. Nemori: Self-organizing agent memory inspired by cognitive science.arXiv preprint arXiv:2508.03341, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[52]

S., & Narayan, O

Vineeth Narajala and Omer Narayan. Securing agentic AI: A comprehensive threat model and mitigation framework for generative AI agents, 2025. URL https://doi.org/10.48550/arxiv.2504.19956

work page doi:10.48550/arxiv.2504.19956 2025
[53]

The attacker moves second: Stronger adaptive at- tacks bypass defenses against LLM jailbreaks and prompt injections,

Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, et al. The attacker moves second: Stronger adaptive attacks bypass defenses against llm jailbreaks and prompt injections.arXiv preprint arXiv:2510.09023, 2025

work page arXiv 2025
[54]

Avocado research email collection.Philadelphia: Linguistic Data Consortium, 2015

Douglas Oard, William Webber, David Kirsch, and Sergey Golitsynskiy. Avocado research email collection.Philadelphia: Linguistic Data Consortium, 2015

2015
[55]

Memory and new controls for ChatGPT

OpenAI. Memory and new controls for ChatGPT. OpenAI Blog, February 2024. URL https://openai.com/blog/memory-and-new-controls-for-chatgpt

2024
[56]

What is memory? https://help.openai.com/en/articles/8983136-what- is-memory, 2025

OpenAI. What is memory? https://help.openai.com/en/articles/8983136-what- is-memory, 2025. Accessed: 2026-04-26

work page arXiv 2025
[57]

Gpt-5 model

OpenAI. Gpt-5 model. https://developers.openai.com/api/docs/models/gpt-5,
[58]

Accessed: 2026-04-26

OpenAI API documentation. Accessed: 2026-04-26

2026
[59]

Gpt-5 mini model

OpenAI. Gpt-5 mini model. https://developers.openai.com/api/docs/models/gpt- 5-mini, 2025. OpenAI API documentation. Accessed: 2026-04-26

2025
[60]

OpenAI. Chatgpt. https://chatgpt.com/, n.d
[61]

Early methods for studying affective use and emotional well-being on ChatGPT

OpenAI and MIT Media Lab. Early methods for studying affective use and emotional well-being on ChatGPT. OpenAI Research, 2025. URL https://open ai.com/index/affective-use-study. Also: arXiv:2504.03888; 700M weekly users, 40M interactions, emotional attachment findings

work page arXiv 2025
[62]

MemGPT: Towards LLMs as Operating Systems

Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. Memgpt: Towards llms as operating systems. arXiv preprint arXiv:2310.08560, 2024

work page internal anchor Pith review arXiv 2024
[63]

When AI remembers too much: Persistent behaviors in agents’ memory

Palo Alto Networks Unit 42. When AI remembers too much: Persistent behaviors in agents’ memory. Unit 42 Threat Research, 2025. URL https://unit42.paloalton etworks.com/indirect-prompt-injection-poisons-ai-longterm-memory/

2025
[64]

Beyond benchmarks: Dynamic, automatic and system- atic red-teaming agents for trustworthy medical language models

Jiazhen Pan, Bailiang Jian, Paul Hager, Yundi Zhang, Che Liu, Friederike Jung- mann, Hongwei Li, Chenyu You, Junde Wu, Jiayuan Zhu, Fenglin Liu, Yuyuan Liu, Niklas Bubeck, Christian Wachinger, Chen Chen, Zhenyu Gong, Cheng Ouyang, Georgios Kaissis, Benedikt Wiestler, Daniel Rückert, Julian Canisius, and Moritz Knolle. Beyond benchmarks: Dynamic, automatic...

work page doi:10.21203/rs.3.rs-7237079/v1 2026
[65]

Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tür, Gokhan Tur, and Heng Ji

Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Xufang Luo, Hao Cheng, Dongsheng Li, Yuqing Yang, Chin-Yew Lin, H Vicky Zhao, Lili Qiu, et al. On memory construction and retrieval for personalized conversational agents.arXiv preprint arXiv:2502.05589, 2025

work page arXiv 2025
[66]

O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S

Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. InUIST, pages 2:1–2:22, 2023

2023
[67]

Context manipulation attacks: Web agents are susceptible to corrupted memory

Atharv Singh Patlan, Ashwin Hebbar, Pramod Viswanath, and Prateek Mittal. Context manipulation attacks: Web agents are susceptible to corrupted memory. arXiv preprint arXiv:2506.17318, 2025

work page arXiv 2025
[68]

Real ai agents with fake memories: Fatal context manipulation attacks on web3 agents.arXiv preprint arXiv:2503.16248, 2025

Atharv Singh Patlan, Peiyao Sheng, S Ashwin Hebbar, Prateek Mittal, and Pramod Viswanath. Real ai agents with fake memories: Fatal context manipulation attacks on web3 agents.arXiv preprint arXiv:2503.16248, 2025

work page arXiv 2025
[69]

Personal.ai

Personal AI. Personal.ai. https://www.personal.ai/, n.d
[70]

Multi-meta-rag: Improving rag for multi- hop queries using database filtering with llm-extracted metadata.arXiv preprint arXiv:2406.13213, 2024

Mykhailo Poliakov and Nadiya Shvai. Multi-meta-rag: Improving rag for multi- hop queries using database filtering with llm-extracted metadata.arXiv preprint arXiv:2406.13213, 2024

work page arXiv 2024
[71]

ZombieAgent: New ChatGPT vulnerabilities let data theft continue (and spread)

Radware Threat Intelligence. ZombieAgent: New ChatGPT vulnerabilities let data theft continue (and spread). Radware Blog, January 2026. URL https: //www.radware.com/blog/threat-intelligence/zombieagent/

2026
[72]

Exploiting Web Search Tools of AI Agents for Data Exfiltration

Dominik Rall, Benedikt Bauer, Mridul Mittal, and Thomas Fraunholz. Exploiting web search tools of AI agents for data exfiltration, 2025. URL https://doi.org/10.4 8550/arxiv.2510.09093

work page internal anchor Pith review Pith/arXiv arXiv 2025
[73]

Securing AI agents against prompt injection attacks, 2025

Badrinath Ramakrishnan and Akshaya Balaji. Securing AI agents against prompt injection attacks, 2025. URL https://doi.org/10.48550/arxiv.2511.15759

work page doi:10.48550/arxiv.2511.15759 2025
[74]

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. Zep: A temporal knowledge graph architecture for agent memory.arXiv preprint arXiv:2501.13956, 2025

work page internal anchor Pith review arXiv 2025
[75]

Spyware injection into your ChatGPT’s long-term memory (SpAIware)

Johann Rehberger. Spyware injection into your ChatGPT’s long-term memory (SpAIware). Embrace The Red Blog, September 2024. URL https://embracethere d.com/blog/posts/2024/chatgpt-macos-app-persistent-data-exfiltration/

2024
[76]

Exfiltrating your ChatGPT chat history and memories with prompt injection

Johann Rehberger. Exfiltrating your ChatGPT chat history and memories with prompt injection. Embrace The Red Blog, August 2025. URL https://embracethe red.com/blog/posts/2025/chatgpt-chat-history-data-exfiltration/

2025
[77]

Hacking Gemini’s memory with prompt injection and delayed tool invocation

Johann Rehberger. Hacking Gemini’s memory with prompt injection and delayed tool invocation. Embrace The Red Blog, February 2025. URL https://embracethe red.com/blog/posts/2025/gemini-memory-persistence-prompt-injection/. Debeshee Das, Julien Piet, Darya Kaviani, Luca Beurer-Kellner, Florian Tramèr, and David Wagner

2025
[78]

Large language models as mental health resources: Patterns of use in the united states.Practice Innovations, 2025

Tony Rousmaniere et al. Large language models as mental health resources: Patterns of use in the united states.Practice Innovations, 2025. doi: 10.1037/pri0 000292. URL https://www.ovid.com/journals/prin/fulltext/10.1037/pri0000292

work page doi:10.1037/pri0 2025
[79]

Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow security.IEEE Journal on Selected Areas in Communications, 21(1):5–19, 2003

2003
[80]

The protection of information in computer systems.Proceedings of the IEEE, 63(9):1278–1308, 1975

Jerome H Saltzer and Michael D Schroeder. The protection of information in computer systems.Proceedings of the IEEE, 63(9):1278–1308, 1975

1975

Showing first 80 references.