MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
Pith reviewed 2026-05-15 05:40 UTC · model grok-4.3
The pith
MemPrivacy replaces sensitive spans in user memory with type-aware placeholders for cloud processing while restoring originals locally to preserve utility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MemPrivacy identifies privacy-sensitive spans on edge devices, replaces them with semantically structured type-aware placeholders for cloud-side memory processing, and restores the original values locally when needed, thereby decoupling privacy protection from semantic destruction and retaining the information required for effective memory formation and retrieval.
What carries the argument
Type-aware placeholders that encode the category and relational structure of removed sensitive spans so cloud memory systems can still form, index, and retrieve memories without receiving raw private values.
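The edge-side round trip described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the regex patterns stand in for MemPrivacy's edge-side extractor model, and the placeholder naming scheme is an assumption.

```python
import re

# Hypothetical detectors standing in for the paper's edge-side extractor.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}

def redact(text):
    """Replace each detected span with an indexed, type-aware placeholder.

    The mapping from placeholder back to original value never leaves the
    edge device; only the redacted text goes to the cloud memory system.
    """
    mapping, counters = {}, {}
    for ptype, pattern in PATTERNS.items():
        def sub(m, ptype=ptype):
            counters[ptype] = counters.get(ptype, 0) + 1
            token = f"<{ptype}_{counters[ptype]}>"
            mapping[token] = m.group(0)  # original stays local
            return token
        text = pattern.sub(sub, text)
    return text, mapping

def restore(text, mapping):
    """Locally re-insert original values after cloud-side processing."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

masked, table = redact("Mail alice@example.com or call +1 555 0100 123")
# masked == "Mail <EMAIL_1> or call <PHONE_1>"
restored = restore(masked, table)
```

The cloud sees typed tokens such as `<EMAIL_1>` rather than raw values, so it can still index "the user has an email contact" without ever receiving the address itself.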
If this is right
- Memory utility stays within 1.6 percent of unprotected baselines on multiple widely used memory systems.
- Privacy information extraction accuracy exceeds that of GPT-5.2 and Gemini-3.1-Pro on the new benchmark.
- Inference latency decreases relative to full-masking baselines.
- Protection strength is adjustable through the four-level privacy taxonomy.
- The approach scales to 200 users and more than 155k privacy instances without requiring changes to existing memory architectures.
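A configurable four-level policy of the kind the taxonomy bullet describes might look like the following. The level names PL1–PL4 and the per-level actions here are illustrative assumptions, not the paper's exact definitions.

```python
# Illustrative policy table: each privacy level maps to a protection action.
# (Level semantics are assumed for this sketch, not taken from the paper.)
POLICY = {
    "PL1": "keep",   # non-sensitive: forwarded to the cloud as-is
    "PL2": "typed",  # mildly sensitive: type-aware placeholder
    "PL3": "typed",  # sensitive: type-aware placeholder
    "PL4": "drop",   # credentials/keys: no type information exposed
}

def apply_policy(span_text, span_type, level):
    """Return the cloud-visible form of a sensitive span under a policy."""
    action = POLICY[level]
    if action == "keep":
        return span_text
    if action == "typed":
        return f"<{span_type}>"
    return "<REDACTED>"

assert apply_policy("alice@example.com", "EMAIL", "PL3") == "<EMAIL>"
```

Tightening protection is then a one-line policy change (e.g., mapping PL2 to `"drop"`), which is the "adjustable strength" property claimed above.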
Where Pith is reading between the lines
- The same placeholder pattern could be applied to other edge-cloud workloads such as personalized recommendation or conversation history where only certain fields must stay private.
- Automatically generated placeholder schemas tuned to each privacy level might further reduce the small remaining utility gap.
- Long-running user studies would reveal whether restored memories maintain personalization quality over weeks or months of interaction.
- Adoption in regulated domains such as health or finance would become simpler once the method is integrated into common agent runtimes.
Load-bearing premise
Type-aware placeholders still carry enough semantic context for the memory system to form, retrieve, and personalize memories after the original sensitive content is removed.
What would settle it
A head-to-head test on standard memory benchmarks in which utility loss exceeds 1.6 percent or privacy-span extraction accuracy falls below that of GPT-5.2 when the placeholder method is used.
Original abstract
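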
As LLM-powered agents are increasingly deployed in edge-cloud environments, personalized memory has become a key enabler of long-term adaptation and user-centric interaction. However, cloud-assisted memory management exposes sensitive user information, while existing privacy protection methods typically rely on aggressive masking that removes task-relevant semantics and consequently degrades memory utility and personalization quality. To address this challenge, we propose MemPrivacy, which identifies privacy-sensitive spans on edge devices, replaces them with semantically structured type-aware placeholders for cloud-side memory processing, and restores the original values locally when needed. By decoupling privacy protection from semantic destruction, MemPrivacy minimizes sensitive data exposure while retaining the information required for effective memory formation and retrieval. We also construct MemPrivacy-Bench, a dataset covering 200 users and over 155k privacy instances, for systematic evaluation, and introduce a four-level privacy taxonomy for configurable protection policies. Experiments show that MemPrivacy achieves strong performance in privacy information extraction, substantially surpassing strong general-purpose models such as GPT-5.2 and Gemini-3.1-Pro, while also reducing inference latency. Across multiple widely used memory systems, MemPrivacy limits utility loss to within 1.6%, outperforming baseline masking strategies. Overall, MemPrivacy offers an effective balance between privacy protection and personalized memory utility for edge-cloud agents, enabling secure, practical, and user-transparent deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MemPrivacy, a system for privacy-preserving personalized memory management in edge-cloud LLM agents. Sensitive spans are identified on edge devices and replaced with type-aware placeholders (e.g., <PERSON>, <LOCATION>) for cloud-side memory encoding, indexing, and retrieval; originals are restored locally on demand. It introduces MemPrivacy-Bench (200 users, >155k privacy instances) and a four-level privacy taxonomy for configurable policies. Experiments claim strong privacy extraction performance that substantially exceeds GPT-5.2 and Gemini-3.1-Pro, reduced inference latency, and utility loss limited to 1.6% across multiple memory systems while outperforming aggressive masking baselines.
Significance. If the empirical results hold under rigorous verification, the work is significant for enabling practical deployment of personalized agents in edge-cloud settings by addressing the privacy-utility tradeoff without aggressive semantic destruction. The construction of MemPrivacy-Bench and the four-level taxonomy provides reusable evaluation infrastructure and policy knobs that future work can build upon. The reported outperformance in both privacy extraction and low utility degradation offers concrete evidence that type-aware replacement can be viable when the placeholders preserve task-relevant structure.
Major comments (3)
- [§5] §5 (Experiments) and Table 3: The headline claim that utility loss is limited to 1.6% across memory systems is load-bearing for the central contribution, yet the section provides no error bars, statistical significance tests, or explicit description of how the 200-user MemPrivacy-Bench was partitioned into train/test splits or how the four memory systems were configured. Without these, it is impossible to determine whether the reported margin over masking baselines is robust or sensitive to dataset construction choices.
- [§3.2] §3.2 (Placeholder Design) and §4.1 (Memory Formation): The assumption that type-aware placeholders retain sufficient relational and attribute semantics for effective cloud-side memory retrieval and personalization is not supported by targeted ablations. No results are shown for high-ambiguity cases (e.g., repeated <PERSON> tokens without distinguishing attributes) or for retrieval metrics when placeholders replace context-critical spans; if such cases cause retrieval failures, the 1.6% utility bound would not generalize.
- [§5.3] §5.3 (Privacy Extraction): The claim of substantial outperformance over GPT-5.2 and Gemini-3.1-Pro on privacy information extraction is presented without baseline implementation details, prompt templates, or fine-tuning status. Because the evaluation uses the newly introduced MemPrivacy-Bench, it is unclear whether gains arise from specialization on the dataset rather than from the MemPrivacy architecture itself.
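The repeated-`<PERSON>` ambiguity raised in the second major comment can be probed concretely: a consistent aliasing scheme, where the same sensitive value always maps to the same indexed placeholder, would preserve coreference across a memory after redaction. Whether MemPrivacy does this is not stated in the material reviewed; the sketch below is a hypothetical baseline for the requested ablation.

```python
# Hypothetical consistent-aliasing scheme: identical values get identical
# indexed placeholders, so "who did what" survives redaction even when
# several people of the same type appear in one memory.
class Aliaser:
    def __init__(self):
        self.by_value = {}  # original value -> placeholder token
        self.counts = {}    # placeholder type -> running index

    def token(self, value, ptype):
        if value not in self.by_value:
            self.counts[ptype] = self.counts.get(ptype, 0) + 1
            self.by_value[value] = f"<{ptype}_{self.counts[ptype]}>"
        return self.by_value[value]

a = Aliaser()
assert a.token("Alice", "PERSON") == "<PERSON_1>"
assert a.token("Bob", "PERSON") == "<PERSON_2>"
assert a.token("Alice", "PERSON") == "<PERSON_1>"  # coreference preserved
```

If retrieval quality differs between this indexed scheme and a flat `<PERSON>`-for-everyone scheme, that difference would directly quantify how much of the 1.6% utility bound depends on placeholder disambiguation.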
Minor comments (3)
- [Abstract] The abstract and §1 refer to 'GPT-5.2' and 'Gemini-3.1-Pro'; these model names should be clarified (exact versions, access dates, or whether they are hypothetical stand-ins) to allow reproducibility.
- [Figure 2] Figure 2 (system overview) would benefit from explicit arrows showing the round-trip restoration of original values on the edge device after cloud retrieval.
- [§4.2] The four-level taxonomy in §4.2 is introduced without a table summarizing the exact privacy categories and their mapping to placeholder types; adding such a table would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript accordingly to improve clarity, rigor, and reproducibility.
Point-by-point responses
- Referee: [§5] §5 (Experiments) and Table 3: The headline claim that utility loss is limited to 1.6% across memory systems is load-bearing for the central contribution, yet the section provides no error bars, statistical significance tests, or explicit description of how the 200-user MemPrivacy-Bench was partitioned into train/test splits or how the four memory systems were configured. Without these, it is impossible to determine whether the reported margin over masking baselines is robust or sensitive to dataset construction choices.
  Authors: We agree that the current presentation lacks sufficient statistical detail and methodological transparency. In the revised manuscript we will add error bars (standard deviation over multiple runs) to all metrics in Table 3, report results of paired statistical significance tests against the masking baselines, and provide an explicit description of the MemPrivacy-Bench partitioning strategy together with the precise configuration parameters used for each of the four memory systems. These additions will appear in §5 and the appendix. Revision: yes.
- Referee: [§3.2] §3.2 (Placeholder Design) and §4.1 (Memory Formation): The assumption that type-aware placeholders retain sufficient relational and attribute semantics for effective cloud-side memory retrieval and personalization is not supported by targeted ablations. No results are shown for high-ambiguity cases (e.g., repeated <PERSON> tokens without distinguishing attributes) or for retrieval metrics when placeholders replace context-critical spans; if such cases cause retrieval failures, the 1.6% utility bound would not generalize.
  Authors: We acknowledge that dedicated ablations on high-ambiguity and context-critical cases are missing. We will add a new set of experiments in the revision that isolate retrieval accuracy and end-to-end utility when multiple identical placeholders appear and when privacy spans are central to the downstream task. These results will be reported alongside the existing MemPrivacy-Bench numbers to confirm whether the 1.6% utility bound holds under these conditions. Revision: yes.
- Referee: [§5.3] §5.3 (Privacy Extraction): The claim of substantial outperformance over GPT-5.2 and Gemini-3.1-Pro on privacy information extraction is presented without baseline implementation details, prompt templates, or fine-tuning status. Because the evaluation uses the newly introduced MemPrivacy-Bench, it is unclear whether gains arise from specialization on the dataset rather than from the MemPrivacy architecture itself.
  Authors: We will include the exact prompt templates, API call parameters, and zero-shot evaluation protocol used for GPT-5.2 and Gemini-3.1-Pro in the revised §5.3 and Appendix B. The baselines were not fine-tuned on MemPrivacy-Bench; the reported gains therefore reflect the specialization of our edge-side extractor rather than data leakage. These details will make the comparison fully reproducible. Revision: yes.
Circularity Check
No significant circularity; claims rest on system design and experimental evaluation
Full rationale
The paper introduces MemPrivacy as a system that identifies sensitive spans on edge devices, substitutes type-aware placeholders, and restores originals locally, then evaluates the approach on the newly constructed MemPrivacy-Bench, a dataset covering 200 users and over 155k privacy instances, together with a four-level taxonomy. All reported outcomes — privacy extraction performance exceeding GPT-5.2 and Gemini-3.1-Pro, utility loss capped at 1.6% across memory systems, and latency reductions — are presented as direct experimental measurements rather than predictions derived from fitted parameters or self-referential definitions. No equations, uniqueness theorems, or ansatzes appear; the central claims therefore remain independent of the inputs they evaluate and do not reduce to them by construction.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: Privacy-sensitive spans can be reliably identified on edge devices using local models without access to full cloud context.
- Domain assumption: Type-aware placeholders retain sufficient semantic type information for downstream memory formation and retrieval.
Invented entities (1)
- Type-aware placeholders (no independent evidence)