pith. sign in

arxiv: 2605.27825 · v1 · pith:CTWKKUVTnew · submitted 2026-05-27 · 💻 cs.CR · cs.LG

MRMMIA: Membership Inference Attacks on Memory in Chat Agents

Pith reviewed 2026-06-29 12:02 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords membership inference attackschat agentsagent memoryprivacy leakagerecall probesMRMMIAblack-box attacks
0
0 comments X

The pith

MRMMIA extracts a membership signal from chat agent memory by sending multiple recall probes, outperforming baselines across black-box, gray-box, and white-box access.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on whether an adversary can determine if a candidate memory unit belongs to a chat agent's private store of interactions, facts, and preferences. It introduces MRMMIA as a single attack framework that issues several recall probes to surface this signal regardless of how much internal access the attacker has. A sympathetic reader would care because agent memory routinely stores sensitive user data that current systems treat as protected. If the method works, it demonstrates that memory stores leak membership information even under limited observation. The work supplies the first systematic way to measure and compare such leakage in agent systems.

Core claim

MRMMIA is a unified attack that utilizes multiple recall probes to the agent to extract the membership signal across black-box, gray-box, and white-box settings. Experiments show that MRMMIA consistently outperforms baselines and thereby exposes privacy risk in agents while providing an initial evaluation framework for membership leakage in chat-agent memory systems.

What carries the argument

Multi-Recall Memory MIA (MRMMIA), which issues repeated recall probes to surface a detectable membership signal from the agent's memory store.

If this is right

  • Chat agent memory is vulnerable to membership inference even when the attacker has only black-box API access.
  • The same multi-probe technique strengthens inference when gray-box or white-box access is available.
  • Without defenses, agents can reveal whether particular user interactions or retrieved facts reside in memory.
  • The results establish a baseline framework for evaluating membership leakage in future agent memory designs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If agents add rate limits or randomized responses to recall-style queries, the attack's success rate would likely drop and require new probe strategies.
  • The approach could extend to testing membership in other agent components such as long-term user profiles or external retrieval indexes.
  • Designers may need to treat memory contents as public by default and apply sanitization before storage to reduce exposure.

Load-bearing premise

Multiple recall probes can reliably surface a detectable membership signal from the agent's memory store without the agent implementing countermeasures or the signal being too noisy to use.

What would settle it

An experiment in which responses to the recall probes show no measurable difference in accuracy or between member and non-member memory units across repeated trials would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.27825 by Kai Chen, Tianhao Wang, Yan Pang.

Figure 1
Figure 1. Figure 1: The attack pipeline of MRMMIA. The attacker probes the chat agent with multiple queries. [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Log-AUC curve of different attacks in the gray-box setting on the [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The influence of the number of recall probes [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Example responses to different recall query probes for the same candidate statement. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The comparison of MRMMIA’s performance in the black-box setting with and without [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Prompt template for generating recall queries with the auxiliary query generator [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Prompt template for scoring agent responses with the response scorer [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Prompt template for scoring retrieved memory units with the memory scorer [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: These additional metrics provide a more comprehensive view of the attack performance at [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: The influence of the number of recall probes [PITH_FULL_IMAGE:figures/full_fig_p017_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: The comparison of MRMMIA’s performance in the gray-box setting with and without [PITH_FULL_IMAGE:figures/full_fig_p018_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: The comparison of MRMMIA’s performance in the white-box setting with and without [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: The comparison of MRMMIA’s performance in the gray-box setting with different [PITH_FULL_IMAGE:figures/full_fig_p019_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: The comparison of MRMMIA’s performance in the white-box setting with different [PITH_FULL_IMAGE:figures/full_fig_p019_14.png] view at source ↗
read the original abstract

Membership inference attacks (MIAs) test whether a target data record belongs to a system's private data, and have become a standard tool to measure privacy leakage in machine learning systems. Prior work has primarily focused on training corpora or retrieval databases. However, MIAs against agent memory have received less attention, even though such memory can contain sensitive user-agent interactions, retrieved facts, and user preferences. Therefore, in this work, we focus on chat agent memory MIAs, where an adversary infers whether a candidate memory unit belongs to the chat agent's memory store. We propose Multi-Recall Memory MIA (MRMMIA), a unified attack that utilizes multiple recall probes to the agent to extract the membership signal across black-box, gray-box, and white-box settings. Our experiments demonstrate that MRMMIA consistently outperforms baselines. Our results expose the privacy risk in agents and provide an initial evaluation framework for membership leakage in chat-agent memory systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes Multi-Recall Memory MIA (MRMMIA), a unified membership inference attack targeting chat agent memory stores. It employs multiple recall probes to extract membership signals in black-box, gray-box, and white-box settings, claims consistent outperformance over baselines, and positions the work as an initial evaluation framework exposing privacy risks from sensitive user-agent interactions stored in agent memory.

Significance. If the experimental claims hold, the work would be significant for extending membership inference attacks to the under-studied domain of agent memory (distinct from training corpora or retrieval databases), offering a practical framework for assessing leakage in interactive systems that retain user preferences and interactions.

major comments (1)
  1. [Abstract] Abstract: The central claim that 'MRMMIA consistently outperforms baselines' across black/gray/white-box settings is load-bearing for the contribution, yet the manuscript supplies no experimental details, probe construction, datasets, baseline definitions, metrics, statistical tests, or results to support or evaluate it.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review. The single major comment concerns the abstract's central claim lacking supporting details in the manuscript. We address this below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that 'MRMMIA consistently outperforms baselines' across black/gray/white-box settings is load-bearing for the contribution, yet the manuscript supplies no experimental details, probe construction, datasets, baseline definitions, metrics, statistical tests, or results to support or evaluate it.

    Authors: The abstract is a concise summary by design. The full manuscript details the MRMMIA attack (multiple recall probes for membership signal extraction) in Section 3, datasets (synthetic chat logs and real user-agent interaction traces) in Section 4.1, baselines (loss-based, shadow-model, and query-based MIAs adapted to memory) in Section 4.2, metrics (AUC, TPR@low FPR) and statistical tests (paired t-tests with p<0.05 reporting) in Section 4.3, and results (consistent outperformance across black/gray/white-box settings) in Section 5 with Tables 1-3 and Figures 2-4. These sections directly support the abstract claim. revision: no

Circularity Check

0 steps flagged

No circularity: empirical attack proposal with no derivations or self-referential reductions

full rationale

The paper presents MRMMIA as an empirical membership inference method using multiple recall probes, with the central claim being experimental outperformance over baselines. The abstract and description contain no equations, fitted parameters, uniqueness theorems, ansatzes, or derivation chains. No load-bearing steps reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. The work is self-contained as an experimental framework without mathematical derivations that could exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no equations, parameters, or background assumptions; the method is described only at the level of 'multiple recall probes' and 'membership signal'.

pith-pipeline@v0.9.1-grok · 5685 in / 1030 out tokens · 48562 ms · 2026-06-29T12:02:18.261928+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 27 canonical work pages · 11 internal anchors

  1. [1]

    Is my data in your retrieval database? member- ship inference attacks against retrieval augmented generation

    Maya Anderson, Guy Amit, and Abigail Goldsteen. Is my data in your retrieval database? member- ship inference attacks against retrieval augmented generation. InProceedings of the 11th International Conference on Information Systems Security and Privacy, page 474–485. SCITEPRESS - Science and Technology Publications, 2025. doi: 10.5220/0013108300003899. UR...

  2. [2]

    Extracting training data from large language models, 2021

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-V oss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models, 2021. URLhttps://arxiv.org/abs/2012.07805

  3. [3]

    Membership inference attacks from first principles, 2022

    Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles, 2022. URLhttps://arxiv.org/abs/2112.03570. 10

  4. [4]

    Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases, 2024

    Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases, 2024. URLhttps://arxiv.org/abs/2407.12784

  5. [5]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory, 2025. URL https://arxiv.org/abs/ 2504.19413

  6. [6]

    PerLTQA: A personal long-term memory dataset for memory classification, retrieval, and synthesis in question answering

    Yiming Du, Hongru Wang, Zhengyi Zhao, Bin Liang, Baojun Wang, Wanjun Zhong, Zezhong Wang, and Kam-Fai Wong. Perltqa: A personal long-term memory dataset for memory classification, retrieval, and synthesis in question answering, 2024. URLhttps://arxiv.org/abs/2402.16288

  7. [7]

    Do membership inference attacks work on large language models?, 2024

    Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, and Hannaneh Hajishirzi. Do membership inference attacks work on large language models?, 2024. URLhttps://arxiv.org/abs/2402.07841

  8. [8]

    Membership inference attacks against fine-tuned large language models via self-prompt calibration

    Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, and Tao Jiang. Membership inference attacks against fine-tuned large language models via self-prompt calibration. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum? id=PAWQvrForJ

  9. [9]

    Membership inference attacks on machine learning: A survey.ACM Computing Surveys (CSUR), 54(11s):1–37, 2022

    Hongsheng Hu, Zoran Salcic, Lichao Sun, Gillian Dobbie, Philip S Yu, and Xuyun Zhang. Membership inference attacks on machine learning: A survey.ACM Computing Surveys (CSUR), 54(11s):1–37, 2022

  10. [10]

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention, 2023. URLhttps://arxiv.org/abs/2309.06180

  11. [11]

    Generating is believing: Membership inference attacks against retrieval-augmented generation, 2024

    Yuying Li, Gaoyang Liu, Chen Wang, and Yang Yang. Generating is believing: Membership inference attacks against retrieval-augmented generation, 2024. URLhttps://arxiv.org/abs/2406.19234

  12. [12]

    Mask-based membership inference attacks for retrieval- augmented generation, 2025

    Mingrui Liu, Sixiao Zhang, and Cheng Long. Mask-based membership inference attacks for retrieval- augmented generation, 2025. URLhttps://arxiv.org/abs/2410.20142

  13. [13]

    Evaluating Very Long-Term Conversational Memory of LLM Agents

    Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of llm agents, 2024. URL https://arxiv.org/abs/ 2402.17753

  14. [14]

    Membership inference attacks against language models via neighbourhood comparison, 2023

    Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan, and Taylor Berg-Kirkpatrick. Membership inference attacks against language models via neighbourhood comparison, 2023. URLhttps://arxiv.org/abs/2305.18462

  15. [15]

    Riddle me this! stealthy membership inference for retrieval-augmented generation, 2025

    Ali Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, and Amir Houmansadr. Riddle me this! stealthy membership inference for retrieval-augmented generation, 2025. URL https: //arxiv.org/abs/2502.00306

  16. [16]

    Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning

    Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In2019 IEEE Symposium on Security and Privacy (SP), page 739–753. IEEE, May 2019. doi: 10.1109/sp.2019.00065. URLhttp://dx.doi.org/10.1109/SP.2019.00065

  17. [17]

    Five Queries Are Enough: Query-Efficient and Surrogate-Free Membership Inference Attacks on RAG via Entailment

    Nguyen Linh Bao Nguyen, Wanlun Ma, Viet V o, Alsharif Abuadbba, Minghong Fang, Jun Zhang, and Yang Xiang. Five queries are enough: Query-efficient and surrogate-free membership inference attacks on rag via entailment, 2026. URLhttps://arxiv.org/abs/2605.24312

  18. [18]

    Patil, Ion Stoica, and Joseph E

    Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. Memgpt: Towards llms as operating systems, 2024. URL https://arxiv.org/abs/2310. 08560

  19. [19]

    Generative Agents: Interactive Simulacra of Human Behavior

    Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior, 2023. URL https://arxiv. org/abs/2304.03442

  20. [20]

    Follow my instruction and spill the beans: Scalable data extraction from retrieval-augmented generation systems.arXiv preprint arXiv:2402.17840, 2024

    Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, and Himabindu Lakkaraju. Follow my instruction and spill the beans: Scalable data extraction from retrieval-augmented generation systems.arXiv preprint arXiv:2402.17840, 2024. 11

  21. [21]

    Qwen2.5 Technical Report

    Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

  22. [22]

    Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models,

    Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models,

  23. [23]

    URLhttps://arxiv.org/abs/1806.01246

  24. [24]

    Detecting Pretraining Data from Large Language Models

    Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. Detecting pretraining data from large language models, 2024. URL https: //arxiv.org/abs/2310.16789

  25. [25]

    Membership Inference Attacks against Machine Learning Models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models, 2017. URLhttps://arxiv.org/abs/1610.05820

  26. [26]

    Unveiling privacy risks in llm agent memory, 2025

    Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, and Pengfei He. Unveiling privacy risks in llm agent memory, 2025. URLhttps://arxiv.org/abs/2502.13172

  27. [27]

    A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

    Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

  28. [28]

    Beyond goldfish memory: Long-term open-domain conversation,

    Jing Xu, Arthur Szlam, and Jason Weston. Beyond goldfish memory: Long-term open-domain conversation,

  29. [29]

    URLhttps://arxiv.org/abs/2107.07567

  30. [30]

    Privacy risk in machine learning: Analyzing the connection to overfitting

    Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282, 2018. doi: 10.1109/CSF.2018.00027

  31. [31]

    The good and the bad: Exploring privacy issues in retrieval- augmented generation (rag), 2024

    Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang, and Jiliang Tang. The good and the bad: Exploring privacy issues in retrieval- augmented generation (rag), 2024. URLhttps://arxiv.org/abs/2402.16893

  32. [32]

    Min-k%++: Improved baseline for pre-training data detection from large language models

    Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Frank Yang, and Hai Li. Min-k%++: Improved baseline for pre-training data detection from large language models. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview. net/forum?id=ZGkfoufDaU

  33. [33]

    A Survey on the Memory Mechanism of Large Language Model based Agents

    Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. A survey on the memory mechanism of large language model based agents, 2024. URL https://arxiv.org/abs/2404.13501

  34. [34]

    MemoryBank: Enhancing Large Language Models with Long-Term Memory

    Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory, 2023. URLhttps://arxiv.org/abs/2305.10250. A Dataset Details We provide the detailed information and processing steps for the three datasets used in our experi- ments: PerLTQA, LOCOMO, and MSC. These datasets consist of thousan...

  35. [35]

    How do you know that?

    Each probe must be a natural and specific direct recall question whose answer is a concise value entailed by the memory. The same question must also ask for the source/reason for the answer, using a short follow-up such as "How do you know that?", "Where did you get this information?", or "What memory tells you this?"

  36. [36]

    Do NOT ask yes/no questions

  37. [37]

    Do NOT put the answer/key_value directly in the question

  38. [38]

    Prefer probes that target different atomic topics or slots in the memory, such as person, location, date, event, relationship, organization, action, object, preference, or outcome

  39. [39]

    Do not include the key_value itself as context

    Include non-answer context from the memory when it helps disambiguate the probe. Do not include the key_value itself as context

  40. [40]

    If the memory contains fewer than {k} distinct atomic topics, first cover as many distinct topics as possible, then fill the remaining probes with natural paraphrases or different contextual framings of those available direct-recall question(s)

  41. [41]

    The source/reason follow-up should make it hard to answer from generic world knowledge alone; prefer asking what remembered fact, prior conversation, or stored information supports the answer

  42. [42]

    Alice bought a blue backpack at Target

    Return exactly {k} probe objects. For each probe: - topic: the atomic slot being queried - key_value: the concise expected answer if this memory is present - question: the direct recall question plus a short source/reason follow-up Example when k=5 and three distinct topics are available: Memory: "Alice bought a blue backpack at Target." Output: {{"probes...

  43. [43]

    Does it support the specified KEY_V ALUE?

  44. [44]

    How do you know about {topic}?

    Does its stated reason/source fully align with the CANDIDATE STATEMENT? Probe type: {probe_type} Topic: {topic} Key value: {key_value} CANDIDATE STATEMENT: {candidate_statement} QUESTION: {question} AGENT RESPONSE: {response} Important rules: - **Question leakage**: The QUESTION is context only; values only in the QUESTION are not evidence. - **Key gate**...