MemLeak: Diagnosing Information Leaks in Multimodal Agent Memory
Pith reviewed 2026-06-30 07:23 UTC · model grok-4.3
The pith
Retained images let multimodal AI agents recover 12% of deleted facts even after the text entries are removed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When a multimodal AI agent is asked to forget a fact, current memory systems typically delete the text entry and report success. Yet the fact remains recoverable from retained user images, including images tagged to entirely different facts, because VLMs use implicit visual cues at inference time. The Information Provenance Graph (IPG) taxonomy shows that deletion fails through multiple channels, and the MemLeak benchmark quantifies this across a deletion cascade. Direct probing of deletion-capable systems yields less than 1% recovery. Retained correlated text, however, enables 18.3% recovery, and retained images enable 12.0% recovery (0.0% blind baseline, 0.3% FPR), with 47% of image leaks not text-recoverable. Content-aware semantic deletion reduces the image residual to 2.0%.
What carries the argument
The Information Provenance Graph (IPG), a taxonomy that classifies memory representations by deletion affordance and identifies the multiple channels through which deletion fails.
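The IPG is described here only at the level of a taxonomy. What follows is therefore a minimal sketch under stated assumptions, not the paper's implementation. It makes three assumptions:

- There are three hypothetical affordance labels: explicit text entries, correlated text, and retained images.
- Each node carries a tag naming the fact the memory system filed it under.
- The benchmark has a ground-truth set of facts that each node actually implies.

A text-only deletion drops just the explicit entries tagged to a fact. It therefore leaves every other node that still implies that fact, including an image filed under an unrelated fact. This is the failure mode the abstract describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class Affordance(Enum):
    # Hypothetical labels; the paper's actual IPG categories may differ.
    EXPLICIT = "explicit"      # text entry stored for the fact; deletable by key
    CORRELATED = "correlated"  # other retained text that implies the fact
    VISUAL = "visual"          # retained image whose content may imply the fact


@dataclass
class MemoryNode:
    node_id: str
    affordance: Affordance
    tagged_fact: Optional[str]                            # fact the memory system filed it under
    implied_facts: Set[str] = field(default_factory=set)  # benchmark ground truth: facts it reveals


def text_only_delete(nodes: List[MemoryNode], fact: str) -> List[MemoryNode]:
    """Baseline deletion: drop only explicit text entries tagged to `fact`."""
    return [n for n in nodes
            if not (n.affordance is Affordance.EXPLICIT and n.tagged_fact == fact)]


def residual_channels(nodes: List[MemoryNode], fact: str) -> Dict[Affordance, List[str]]:
    """Group the surviving nodes that still imply `fact` by deletion affordance."""
    leaks: Dict[Affordance, List[str]] = {}
    for n in nodes:
        if fact in n.implied_facts:
            leaks.setdefault(n.affordance, []).append(n.node_id)
    return leaks


if __name__ == "__main__":
    memory = [
        MemoryNode("t1", Affordance.EXPLICIT, "lives_in_lisbon", {"lives_in_lisbon"}),
        MemoryNode("t2", Affordance.CORRELATED, "favourite_cafe", {"favourite_cafe", "lives_in_lisbon"}),
        # Image filed under an unrelated fact whose scenery still reveals the city.
        MemoryNode("img7", Affordance.VISUAL, "birthday_party", {"birthday_party", "lives_in_lisbon"}),
    ]
    remaining = text_only_delete(memory, "lives_in_lisbon")
    print(residual_channels(remaining, "lives_in_lisbon"))
```

Running the script reports the correlated-text node and the image node as surviving channels. The image survives the text-only deletion because it was tagged to a different fact.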
If this is right
- Direct probing of deletion-capable systems yields less than 1% recovery.
- Retained correlated text enables 18.3% recovery of deleted facts.
- Retained images enable 12.0% recovery, and 47% of those image leaks cannot be recovered from text alone.
- Content-aware semantic deletion lowers the image residual to 2.0%.
- The residual persists across multiple VLMs, a production memory system, and real photographs.
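The headline numbers above are ratios over judge verdicts. The sketch below shows how such ratios could be computed. It assumes two definitions:

- The "blind baseline" asks the same questions with no memory supplied.
- The FPR is the share of never-stored control facts that the judge marks as recovered.

The "47% not text-recoverable" figure corresponds to a set difference between image-leaked and text-leaked facts. All data are toy values, not MemLeak results.

```python
from __future__ import annotations

from typing import List, Set


def recovery_rate(judgements: List[bool]) -> float:
    """Share of probes the judge marked as recovering the target fact."""
    return sum(judgements) / len(judgements) if judgements else 0.0


def image_only_share(image_leaks: Set[str], text_leaks: Set[str]) -> float:
    """Share of image-leaked facts that no retained-text channel also recovers."""
    if not image_leaks:
        return 0.0
    return len(image_leaks - text_leaks) / len(image_leaks)


if __name__ == "__main__":
    # Toy verdicts, not MemLeak data.
    with_images = [True, False, False, True, False, False, False, False]
    blind = [False] * 8    # assumed: same questions, no memory supplied
    control = [False] * 8  # assumed: never-stored facts; a True here is a judge false positive
    print("image recovery:", recovery_rate(with_images))
    print("blind baseline:", recovery_rate(blind))
    print("FPR:", recovery_rate(control))
    print("image-only share:", image_only_share({"a", "b", "c", "d"}, {"a", "b", "x"}))
```

In this toy run, image recovery is 0.25 against a 0.0 baseline and a 0.0 FPR. Half of the image-leaked facts (c and d) have no text route.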
Where Pith is reading between the lines
- Memory systems for multimodal agents will need deletion routines that operate on both text and image representations simultaneously.
- Privacy guarantees in deployed agents that store user photos may be weaker than text-only deletion policies suggest.
- Benchmarking deletion should include cross-modal leakage tests rather than text-only audits.
- Future agent designs could tag images with explicit provenance links so that semantic deletion can target visual content directly.
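One reading of "content-aware semantic deletion" is deletion keyed on what an item says or depicts, rather than on the fact it is tagged to. The sketch below uses embedding similarity as that content check. The shared text-image embedding, the Item type, and the 0.8 threshold are all hypothetical; none comes from the paper, and the paper's method may differ. Because it compares content, this check also removes an image that carries the fact while being filed under an unrelated one, which a provenance-tag lookup alone would miss.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Item:
    item_id: str
    modality: str           # "text" or "image"
    embedding: List[float]  # hypothetical shared text-image embedding


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_delete(items: List[Item], fact_embedding: Sequence[float],
                    threshold: float = 0.8) -> Tuple[List[Item], List[Item]]:
    """Remove every item, text or image, whose content is close to the deleted fact.

    Matching is on content, not on tags. The 0.8 threshold is an
    illustrative assumption.
    """
    kept: List[Item] = []
    removed: List[Item] = []
    for item in items:
        target = removed if cosine(item.embedding, fact_embedding) >= threshold else kept
        target.append(item)
    return kept, removed


if __name__ == "__main__":
    fact = [1.0, 0.0, 0.0]
    memory = [
        Item("t1", "text", [1.0, 0.0, 0.0]),
        Item("img7", "image", [0.9, 0.1, 0.0]),  # content matches the fact regardless of its tag
        Item("img9", "image", [0.0, 1.0, 0.0]),  # unrelated photo
    ]
    kept, removed = semantic_delete(memory, fact)
    print([i.item_id for i in kept], [i.item_id for i in removed])
```

With these toy vectors, the text entry and the matching image are removed and the unrelated image is kept. In practice, the threshold trades residual leakage against over-deletion of unrelated memories.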
Load-bearing premise
The measured recovery rates come from implicit visual cues that VLMs actually use at inference time rather than from artifacts of the memory system setup or the image tagging process.
What would settle it
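The test is to re-run the MemLeak benchmark with tagging and retrieval held fixed, but with the retained images replaced by visually neutral ones. If recovery stays above the 0.3% FPR baseline, the leak points to pipeline artifacts rather than implicit visual cues, which would falsify the central claim. If recovery collapses to the FPR level, that supports it.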
Original abstract
When a multimodal AI agent is asked to forget a fact, current memory systems usually delete the text entry and report success. We find that the fact can remain recoverable from retained user images, including images tagged to entirely different facts, because VLMs use implicit visual cues at inference time. We introduce the Information Provenance Graph (IPG), a taxonomy that classifies memory representations by deletion affordance. The IPG reveals that deletion fails through multiple channels. Our benchmark, MemLeak, measures this across a deletion cascade: direct probing of deletion-capable systems yields <1%, but retained correlated text enables 18.3% recovery, and retained images enable 12.0% recovery (0.0% blind baseline, 0.3% FPR), with 47% of image leaks not text-recoverable. Content-aware semantic deletion reduces the image residual to 2.0%. The residual appears across multiple VLMs, a production memory system, and real Unsplash-licensed photographs. Dual-annotator human validation (kappa = 0.88) confirms judge reliability.
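The judge-reliability figure is presumably Cohen's kappa, the usual statistic for two annotators; the page does not name the variant. Cohen's kappa is defined as kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement, and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below computes it on toy leak/no-leak verdicts, not on the paper's annotations.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)  # chance agreement
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch simply returns 1.0.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy verdicts, not the paper's annotations.
    ann1 = ["leak", "leak", "no", "no", "leak", "no", "no", "no", "leak", "no"]
    ann2 = ["leak", "leak", "no", "no", "leak", "no", "no", "no", "leak", "leak"]
    print(round(cohens_kappa(ann1, ann2), 3))
```

The toy annotators agree on 9 of 10 items (p_o = 0.9), and their label frequencies give p_e = 0.5. The result is kappa = 0.8.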
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Information Provenance Graph (IPG) taxonomy classifying memory representations by deletion affordance and the MemLeak benchmark to measure information leaks in multimodal agent memory. It reports that retained images enable 12.0% recovery of deleted facts (0.0% blind baseline, 0.3% FPR), with 47% of image leaks not text-recoverable, across a deletion cascade; content-aware semantic deletion reduces the residual to 2.0%. The residual is observed across multiple VLMs, a production memory system, and real Unsplash images, with dual-annotator validation (kappa=0.88).
Significance. If the results hold once the concerns about isolating the visual effect are addressed, the work identifies a concrete privacy vulnerability in multimodal agents that current text-only deletion mechanisms do not address. This has direct implications for agent design. Strengths include explicit baselines and FPR reporting, cross-VLM and real-image evaluation, and a reproducible human-validation protocol.
major comments (2)
- [Abstract] The central claim that 'VLMs use implicit visual cues at inference time' to recover facts from retained images is not isolated from potential artifacts of the IPG taxonomy implementation, the image-tagging pipeline, or the deletion-cascade retrieval logic. No ablation is described that holds tagging and retrieval fixed while varying only visual content. Such an ablation is required to support the causal attribution of the 12.0% recovery rate.
- [Abstract / benchmark description] The reported recovery rates (12.0% image, 18.3% correlated text, 2.0% post-semantic deletion) and the human validation (kappa = 0.88) are presented without sufficient detail on dataset construction, image selection criteria, or how facts are mapped to images in the IPG. This prevents assessing whether post-hoc choices affect the measurements.
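The ablation requested in the first major comment can be sketched as a paired comparison. Each fact keeps the same memory slot, tags, and retrieval path; only the pixels behind that slot change. Three elements are assumptions for illustration, not the authors' protocol: the probe callable, the neutral image set, and the pooled two-proportion z statistic. If recovery on neutral images stays well above the FPR, the pipeline rather than the visual content is implicated.

```python
from __future__ import annotations

import math
from typing import Callable, Dict, List, Tuple

# Hypothetical probe: (fact_id, image_id) -> True if the judge says the fact was recovered.
Probe = Callable[[str, str], bool]


def recovery(facts: List[str], slot_images: Dict[str, str], probe: Probe) -> float:
    """Recovery rate when each fact's (fixed) memory slot holds the given image."""
    hits = sum(probe(fact, slot_images[fact]) for fact in facts)
    return hits / len(facts)


def two_proportion_z(p1: float, p2: float, n: int) -> float:
    """Pooled two-proportion z statistic for two samples of equal size n."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return (p1 - p2) / se if se else 0.0


def visual_ablation(facts: List[str], original: Dict[str, str], neutral: Dict[str, str],
                    probe: Probe) -> Tuple[float, float, float]:
    """Hold tagging and retrieval fixed (same slot per fact); vary only image content."""
    p_original = recovery(facts, original, probe)
    p_neutral = recovery(facts, neutral, probe)
    return p_original, p_neutral, two_proportion_z(p_original, p_neutral, len(facts))


if __name__ == "__main__":
    facts = [f"f{i}" for i in range(1, 11)]
    original = {f: f"photo_{f}" for f in facts}  # retained user images (toy ids)
    neutral = {f: f"gray_{f}" for f in facts}    # hypothetical visually neutral replacements
    leaky = {"f1", "f2", "f3"}

    def toy_probe(fact: str, image_id: str) -> bool:
        # Leaks only when real photo content is present, mimicking a visual-cue effect.
        return image_id.startswith("photo_") and fact in leaky

    print(visual_ablation(facts, original, neutral, toy_probe))
```

With the toy probe, recovery drops from 0.3 on the original photos to 0.0 on neutral images. That pattern would support a visual-cue explanation. A real run would need far more facts than ten for the z statistic to be meaningful.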
minor comments (1)
- [Abstract] The abstract introduces 'content-aware semantic deletion' without a concise definition or reference to its implementation details in the main text.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on the isolation of visual effects and the need for additional methodological detail. We address both points below and will revise the manuscript accordingly to strengthen the causal claims and reproducibility.
Point-by-point responses
- Referee: [Abstract] The central claim that 'VLMs use implicit visual cues at inference time' to recover facts from retained images is not isolated from potential artifacts of the IPG taxonomy implementation, the image-tagging pipeline, or the deletion-cascade retrieval logic. No ablation is described that holds tagging and retrieval fixed while varying only visual content. Such an ablation is required to support the causal attribution of the 12.0% recovery rate.
Authors: We agree that an explicit ablation holding the IPG taxonomy, tagging pipeline, and retrieval logic fixed while varying only visual content would provide stronger causal evidence for attributing the 12.0% recovery to implicit visual cues. The current design compares against a blind baseline (0.0%) and reports that 47% of image leaks are not text-recoverable, but these do not fully isolate visual content from pipeline artifacts. We will add this ablation in the revised manuscript (Section 4) by substituting neutral or text-only images under fixed tagging/retrieval conditions. revision: yes
- Referee: [Abstract / benchmark description] The reported recovery rates (12.0% image, 18.3% correlated text, 2.0% post-semantic deletion) and the human validation (kappa = 0.88) are presented without sufficient detail on dataset construction, image selection criteria, or how facts are mapped to images in the IPG. This prevents assessing whether post-hoc choices affect the measurements.
Authors: We will expand the methods and benchmark sections with explicit details on dataset construction, including Unsplash image selection criteria (e.g., licensing, diversity filters), the fact-to-image mapping process in the IPG (one-to-many relations, tagging rules), and summary statistics (e.g., images per fact, total facts). These will appear in the main text and a new appendix to allow assessment of post-hoc choices. revision: yes
Circularity Check
No circularity; empirical benchmark with explicit baselines and no derivations
The paper reports empirical measurements of information recovery rates in a multimodal memory benchmark (MemLeak) against stated baselines (0.0% blind, 0.3% FPR) and controls (semantic deletion, dual-annotator validation with kappa=0.88). No equations, first-principles derivations, or predictions are present that could reduce to fitted inputs or self-definitions. The IPG is introduced as a descriptive taxonomy classifying deletion affordances, not derived from prior results. Central claims rest on direct experimental outcomes across VLMs and real images rather than any self-referential chain or renamed known result.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: VLMs use implicit visual cues at inference time to recover facts from images even when the corresponding text is deleted.
Reference graph
Works this paper leans on
[1] Saad Alqithami. Forgetful but faithful: A cognitive memory architecture and benchmark for privacy-aware generative agents. arXiv:2512.12856, 2025.
[2] Yuanchen Bei et al. Mem-Gallery: Benchmarking multimodal long-term conversational memory for MLLM agents. arXiv, 2026.
[3] Andreea-Elena Bodea, Stephen Meisenbacher, Alexandra Klymenko, and Florian Matthes. SoK: Privacy risks and mitigations in retrieval-augmented generation systems. In IEEE SaTML, 2026.
[4] Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In IEEE Symposium on Security and Privacy (SP), 2021.
[5] Yuandao Cai, Wensheng Tang, Cheng Wen, and Shengchao Qin. Ghost in the agent: Redefining information flow tracking for LLM agents. arXiv:2604.23374, 2026.
[6] Jingpu Cheng, Ping Liu, Qianxiao Li, and Chi Zhang. Machine unlearning under retain-forget entanglement. In ICLR, 2026.
[7] Manuel Costa, Boris Kopf, et al. Securing AI agents with information-flow control. arXiv:2505.23643, 2025.
[8] Alexey Dontsov et al. CLEAR: Character unlearning in textual and visual modalities. In ACL Findings, 2025.
[9] Yossi Gandelsman, Alexei A Efros, and Jacob Steinhardt. Interpreting CLIP's image representation via text-based decomposition. In ICLR, 2024.
[10] Yuanzhe Hu, Yu Wang, and Julian McAuley. MemoryAgentBench: Evaluating memory in LLM agents via incremental multi-turn interactions. In ICLR, 2026.
[11] Sangwon Jeon, Dongkeun Jeung, Wonjun Kim, Albert No, and Jungseul Choi. An information theoretic evaluation metric for strong unlearning. In AAAI, 2026.
[12] Haoran Ju, Wenjie Hua, Hao Fei, Zecheng Shao, Yuren Zheng, Zhuotong Zhao, Roy Ka-Wei Lee, Bo-Hsun Hsu, Ruiqi Zhang, and Bryan Hooi Liu. Watch out your album! On the inadvertent privacy memorization in multi-modal large language models. In ICML, 2025.
[13] Kunhao Li, Wenhao Li, et al. Cross-modal unlearning via influential neuron path editing in multimodal large language models. In AAAI, 2026.
[14] Zhangheng Li, Junyuan Hong, Jianing Zhu, Sungmin Eum, Shuowen Hu, Suya You, and Zhangyang Wang. POPS: Recovering unlearned multi-modality knowledge in MLLMs with fine-tuning and prompt-based attacks. In ICLR, 2026.
[15] Jian Liu, Weidi Deng, et al. Mission: Impossible - image-based geolocation with large vision language models. In PoPETS, 2025.
[16] Zheyuan Liu et al. MLLMU-Bench: Protecting privacy in multimodal large language models. In NAACL, 2025.
[17] Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C Lipton, and J Zico Kolter. TOFU: A task of fictitious unlearning for LLMs. In COLM, 2024.
[18] Mem0 AI. Mem0: Memory layer for AI agents, 2025. URL https://github.com/mem0ai/mem0
[19] MemTensor. MemOS: An operating system for memory-augmented generation (MAG) in large language models. arXiv:2505.22101, 2025.
[20] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. In NeurIPS, 2022.
[21] Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. In ICLR, 2023.
[22] Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G Patil, Ion Stoica, and Joseph E Gonzalez. MemGPT: Towards LLMs as operating systems. arXiv:2310.08560, 2024.
[23] Young Bin Park. Graph-native cognitive memory for AI agents: Formal belief revision semantics for versioned memory architectures. arXiv:2603.17244, 2026.
[24] Vaidehi Patil, Yi-Lin Sung, et al. Unlearning sensitive information in multimodal LLMs: Benchmark and attack-defense evaluation. TMLR, 2024.
[25] Ali Rezazadeh et al. Collaborative memory: Multi-user memory sharing in LLM agents with dynamic access control. In ICML, 2025.
[26] Cai Selvas-Sala, Lei Kang, and Lluis Gomez. SALMUBench: A benchmark for sensitive association-level multimodal unlearning. In CVPR, 2026.
[27] Weijia Shi, Jaechan Lee, Yangsibo Huang, et al. MUSE: Machine unlearning six-way evaluation for language models. In ICLR, 2025.
[28] Theodore R Sumers, Shunyu Yao, Karthik Narasimhan, and Thomas L Griffiths. Cognitive architectures for language agents. TMLR, 2024.
[29] Pratiksha Thaker, Shengyuan Hu, et al. Position: LLM unlearning benchmarks are weak measures of progress. In SaTML, 2025.
[30] Arda Toemekce et al. Private attribute inference from images with vision-language models. In NeurIPS, 2024.
[31] Efthymios Tsaprazlis, Tiantian Feng, Anil Ramakrishna, Sai Praneeth Karimireddy, Rahul Gupta, and Shrikanth Narayanan. Rethinking visual privacy: A compositional privacy risk framework for severity assessment with VLMs. arXiv:2603.21573, 2026.
[32] Bin Wang, Fan Wang, Pingping Wang, Jinyu Cong, Yang Yu, Yilong Yin, Zhongyi Han, and Benzheng Wei. Agentic unlearning: When LLM agent meets machine unlearning. arXiv:2602.17692, 2026.
[33] Chengye Wang, Yuyuan Li, et al. UMU-Bench: Closing the modality gap in multimodal unlearning evaluation. In NeurIPS Datasets & Benchmarks, 2025.
[34] Zep AI. Zep: A temporal knowledge graph architecture for agent memory. arXiv:2501.13956, 2025.
[35] Ziyang Zhang et al. MMDU-Bench: Multi-modal deep unlearning benchmark. Under review, 2025.