
arxiv: 2606.29788 · v1 · pith:SLIDP7KMnew · submitted 2026-06-29 · 💻 cs.LG

MemLeak: Diagnosing Information Leaks in Multimodal Agent Memory

Pith reviewed 2026-06-30 07:23 UTC · model grok-4.3

classification 💻 cs.LG
keywords multimodal agents · memory deletion · information leakage · vision-language models · MemLeak benchmark · Information Provenance Graph · semantic deletion

The pith

Retained images let multimodal AI agents recover 12% of deleted facts after text-only deletion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that telling a multimodal agent to forget a fact by deleting its text entry leaves the fact recoverable from user-supplied images that remain in memory. These images leak information because vision-language models (VLMs) extract implicit visual cues at inference time, even when the images are tagged to unrelated facts. The authors build a taxonomy called the Information Provenance Graph to classify which memory elements can actually be deleted, then release the MemLeak benchmark, which measures leakage through direct probing, correlated text, and retained images. Content-aware semantic deletion cuts the image-based residual from 12.0% to 2.0%. The effect appears across multiple VLMs, a production memory system, and real photographs, and dual-annotator human validation (kappa = 0.88) confirms the reliability of the automated judge.

Core claim

When a multimodal AI agent is asked to forget a fact, current memory systems delete the text entry and report success, yet the fact remains recoverable from retained user images because VLMs use implicit visual cues at inference time. The Information Provenance Graph taxonomy reveals that deletion fails through multiple channels; the MemLeak benchmark quantifies this across a deletion cascade, finding that retained correlated text enables 18.3% recovery and retained images enable 12.0% recovery (0.0% blind baseline, 0.3% FPR), with 47% of image leaks not text-recoverable. Content-aware semantic deletion reduces the image residual to 2.0%.

What carries the argument

The Information Provenance Graph (IPG), a taxonomy that classifies memory representations by deletion affordance and identifies the multiple channels through which deletion fails.
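
To make the idea of a deletion affordance concrete, here is a minimal sketch of a provenance graph in which deletion only reaches nodes filed under the target fact. The three affordance labels (addressable, linked, persistent) follow the Figure 2 caption; the class names, fields, and example nodes are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Affordance(Enum):
    ADDRESSABLE = "addressable"  # stored under the fact's own key
    LINKED = "linked"            # reachable from the fact via a provenance link
    PERSISTENT = "persistent"    # carries information but has no link to the fact


@dataclass
class Node:
    name: str
    affordance: Affordance
    tagged_to: str                              # fact id the node is filed under (assumed field)
    encodes: set = field(default_factory=set)   # fact ids the node actually carries information about


def storage_level_delete(nodes, fact_id):
    """Drop addressable and linked nodes filed under fact_id; everything else survives."""
    reachable = {Affordance.ADDRESSABLE, Affordance.LINKED}
    return [n for n in nodes if not (n.tagged_to == fact_id and n.affordance in reachable)]


def residual_carriers(nodes, fact_id):
    """Names of surviving nodes that still encode the deleted fact."""
    return [n.name for n in nodes if fact_id in n.encodes]


# Toy memory: fact A is deleted, but an image tagged to fact B also shows cues for A.
memory = [
    Node("text entry for fact A", Affordance.ADDRESSABLE, "A", {"A"}),
    Node("image attached to fact A", Affordance.LINKED, "A", {"A"}),
    Node("implicit visual features of an image tagged to fact B", Affordance.PERSISTENT, "B", {"A", "B"}),
]

after = storage_level_delete(memory, "A")
print(residual_carriers(after, "A"))
```

In this toy setup the only surviving carrier of fact A is the persistent node, which is the kind of cross-fact cue the Figure 2 caption says provenance-based deletion cannot reach.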

If this is right

  • Direct probing of deletion-capable systems yields less than 1% recovery.
  • Retained correlated text enables 18.3% recovery of deleted facts.
  • Retained images enable 12.0% recovery, and 47% of those image leaks cannot be recovered from text alone.
  • Content-aware semantic deletion lowers the image residual to 2.0%.
  • The residual persists across multiple VLMs, a production memory system, and real photographs.
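
Two derived quantities follow from the reported figures by simple arithmetic. The sketch below assumes that the 12.0% and 2.0% rates share the same denominator (the set of probed deleted facts) and that the 47% is a share of image leaks; both are assumptions, since this page does not state the sample sizes.

```python
# Reported figures (percentages taken from the abstract).
image_recovery = 12.0     # deleted facts recovered via retained images
image_only_share = 47.0   # share of image leaks that are not text-recoverable
semantic_residual = 2.0   # image residual after content-aware semantic deletion

# Facts recoverable only through images, as a share of all probed facts
# (assumes a common denominator, see the lead-in).
image_only_recovery = image_recovery * image_only_share / 100

# Relative reduction in image leakage achieved by semantic deletion.
relative_reduction = (image_recovery - semantic_residual) / image_recovery * 100

print(f"image-only recovery: {image_only_recovery:.2f}% of probed facts")
print(f"relative reduction from semantic deletion: {relative_reduction:.1f}%")
```

Under these assumptions, roughly 5.6% of probed facts would be recoverable through images alone, and semantic deletion removes about 83% of the image leakage.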

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Memory systems for multimodal agents will need deletion routines that operate on both text and image representations simultaneously.
  • Privacy guarantees in deployed agents that store user photos may be weaker than text-only deletion policies suggest.
  • Benchmarking deletion should include cross-modal leakage tests rather than text-only audits.
  • Future agent designs could tag images with explicit provenance links so that semantic deletion can target visual content directly.
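
The contrast between tag-based deletion and deletion that targets visual content can be sketched as follows. This is a toy illustration, not the paper's method: `visible_cues` stands in for the output of a hypothetical captioning or VLM pass over each image, and the cue-overlap rule is an assumed matching criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredImage:
    image_id: str
    tagged_fact: str
    visible_cues: frozenset  # hypothetical descriptors extracted from the image content


def provenance_delete(images, fact_id):
    """Delete only images whose tag points at the forgotten fact."""
    return [im for im in images if im.tagged_fact != fact_id]


def semantic_delete(images, fact_id, fact_cues):
    """Also delete images whose content overlaps the forgotten fact's cues, whatever their tag."""
    cues = frozenset(fact_cues)
    return [
        im for im in images
        if im.tagged_fact != fact_id and not (im.visible_cues & cues)
    ]


# Toy data: img2 is tagged to an unrelated fact but still shows a leash.
images = [
    StoredImage("img1", "owns_dog", frozenset({"dog"})),
    StoredImage("img2", "lives_in_city", frozenset({"skyline", "leash"})),
    StoredImage("img3", "likes_coffee", frozenset({"mug"})),
]

kept_by_tag = [im.image_id for im in provenance_delete(images, "owns_dog")]
kept_by_content = [im.image_id for im in semantic_delete(images, "owns_dog", {"dog", "leash"})]
print(kept_by_tag, kept_by_content)
```

The trade-off in any such design is recall against collateral deletion: a broader cue set removes more leaking images but also more unrelated ones.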

Load-bearing premise

The measured recovery rates come from implicit visual cues that VLMs actually use at inference time rather than from artifacts of the memory system setup or the image tagging process.

What would settle it

Re-running the MemLeak benchmark on a VLM that has been explicitly trained or prompted to ignore implicit visual cues in retained images and finding zero recovery above the 0.3% FPR baseline would falsify the central claim.
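
Whether an observed recovery count sits "above the 0.3% FPR baseline" is a statistical question. One way to frame it, offered as a sketch rather than the paper's procedure, is a one-sided exact binomial test against the judge's false-positive rate. The sample size and hit counts below are hypothetical, because this page does not report them.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


FPR = 0.003      # judge false-positive rate reported in the abstract
n_probes = 300   # hypothetical number of probes
observed = 36    # hypothetical hit count, i.e. 12% of 300

# Probability of seeing at least this many hits if every hit were a judge false positive.
p_leak = binomial_tail(observed, n_probes, FPR)

# A single hit out of the same number of probes, for comparison.
p_single = binomial_tail(1, n_probes, FPR)

print(f"p-value for {observed}/{n_probes} hits: {p_leak:.3g}")
print(f"p-value for 1/{n_probes} hits: {p_single:.3g}")
```

With these hypothetical counts, 36 hits would be wildly inconsistent with the false-positive rate alone, while a single hit would not be distinguishable from judge noise.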

Figures

Figures reproduced from arXiv: 2606.29788 by Chao Zhang, Kuan Wang.

Figure 1: Multi-channel leakage after fact-level deletion.
Figure 2: The IPG taxonomy applied to a measured example. Left: storage-level deletion removes the target fact’s addressable and linked nodes (dashed boundary marks deletion scope); persistent nodes lie outside this scope. Right: a retained fact preserves all nodes; the persistent “implicit visual features” node (red, bold) encodes cross-fact cues that provenance-based deletion cannot reach. Bottom: a VLM reconstruc…
Figure 3: Multi-channel leakage decomposition. Retained text (18.3%) and images (12.0%) both …

Original abstract

When a multimodal AI agent is asked to forget a fact, current memory systems usually delete the text entry and report success. We find that the fact can remain recoverable from retained user images, including images tagged to entirely different facts, because VLMs use implicit visual cues at inference time. We introduce the Information Provenance Graph (IPG), a taxonomy that classifies memory representations by deletion affordance. The IPG reveals that deletion fails through multiple channels. Our benchmark, MemLeak, measures this across a deletion cascade: direct probing of deletion-capable systems yields <1%, but retained correlated text enables 18.3% recovery, and retained images enable 12.0% recovery (0.0% blind baseline, 0.3% FPR) -- with 47% of image leaks not text-recoverable. Content-aware semantic deletion reduces the image residual to 2.0%. The residual appears across multiple VLMs, a production memory system, and real Unsplash-licensed photographs. Dual-annotator human validation (kappa = 0.88) confirms judge reliability.
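
For readers unfamiliar with the agreement statistic quoted above, the following sketch computes Cohen's kappa for two annotators. The labels are toy data invented for illustration and are not the paper's annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently at their own base rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels: 1 = "fact recovered", 0 = "not recovered".
annotator_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
annotator_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

Kappa discounts the agreement two annotators would reach by chance, which matters here because most probes are expected to be labeled "not recovered".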

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Information Provenance Graph (IPG) taxonomy classifying memory representations by deletion affordance and the MemLeak benchmark to measure information leaks in multimodal agent memory. It reports that retained images enable 12.0% recovery of deleted facts (0.0% blind baseline, 0.3% FPR), with 47% of image leaks not text-recoverable, across a deletion cascade; content-aware semantic deletion reduces the residual to 2.0%. The residual is observed across multiple VLMs, a production memory system, and real Unsplash images, with dual-annotator validation (kappa=0.88).

Significance. If the results hold after addressing isolation concerns, the work identifies a concrete privacy vulnerability in multimodal agents that current text-only deletion mechanisms do not address, with direct implications for agent design. Strengths include explicit baselines and FPR reporting, cross-VLM and real-image evaluation, and reproducible human validation protocol.

major comments (2)
  1. [Abstract] Abstract: The central claim that 'VLMs use implicit visual cues at inference time' to recover facts from retained images is not isolated from potential artifacts of the IPG taxonomy implementation, image-tagging pipeline, or deletion-cascade retrieval logic. No ablation is described that holds tagging and retrieval fixed while varying only visual content, which is required to support the causal attribution for the 12.0% recovery rate.
  2. [Abstract] Abstract / benchmark description: The reported recovery rates (12.0% image, 18.3% correlated text, 2.0% post-semantic deletion) and human validation (kappa=0.88) are presented without sufficient detail on dataset construction, image selection criteria, or how facts are mapped to images in the IPG, preventing assessment of whether post-hoc choices affect the measurements.
minor comments (1)
  1. [Abstract] The abstract introduces 'content-aware semantic deletion' without a concise definition or reference to its implementation details in the main text.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the isolation of visual effects and the need for additional methodological detail. We address both points below and will revise the manuscript accordingly to strengthen the causal claims and reproducibility.

  1. Referee: [Abstract] Abstract: The central claim that 'VLMs use implicit visual cues at inference time' to recover facts from retained images is not isolated from potential artifacts of the IPG taxonomy implementation, image-tagging pipeline, or deletion-cascade retrieval logic. No ablation is described that holds tagging and retrieval fixed while varying only visual content, which is required to support the causal attribution for the 12.0% recovery rate.

    Authors: We agree that an explicit ablation holding the IPG taxonomy, tagging pipeline, and retrieval logic fixed while varying only visual content would provide stronger causal evidence for attributing the 12.0% recovery to implicit visual cues. The current design compares against a blind baseline (0.0%) and reports that 47% of image leaks are not text-recoverable, but these do not fully isolate visual content from pipeline artifacts. We will add this ablation in the revised manuscript (Section 4) by substituting neutral or text-only images under fixed tagging/retrieval conditions.

    Revision: yes

  2. Referee: [Abstract] Abstract / benchmark description: The reported recovery rates (12.0% image, 18.3% correlated text, 2.0% post-semantic deletion) and human validation (kappa=0.88) are presented without sufficient detail on dataset construction, image selection criteria, or how facts are mapped to images in the IPG, preventing assessment of whether post-hoc choices affect the measurements.

    Authors: We will expand the methods and benchmark sections with explicit details on dataset construction, including Unsplash image selection criteria (e.g., licensing, diversity filters), the fact-to-image mapping process in the IPG (one-to-many relations, tagging rules), and summary statistics (e.g., images per fact, total facts). These will appear in the main text and a new appendix to allow assessment of post-hoc choices.

    Revision: yes
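
The ablation discussed in the first response, holding tagging and retrieval fixed while varying only visual content, can be sketched as a small harness. Everything here is a stand-in: `MemoryImage.pixels` is a string that plays the role of image content, and `recovers` is a stub for a VLM plus judge, not a real model call.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemoryImage:
    tag: str
    pixels: str  # stand-in for image content


def retrieve(memory, tag):
    """Retrieval logic, identical in both arms of the ablation."""
    return [im for im in memory if im.tag == tag]


def neutralize(memory, neutral_pixels):
    """Keep every tag and the ordering; replace only the visual content."""
    return [replace(im, pixels=neutral_pixels) for im in memory]


def recovery_rate(probes, memory, recovers):
    """Fraction of (fact, tag) probes for which the stubbed model recovers the fact."""
    hits = sum(recovers(fact, retrieve(memory, tag)) for fact, tag in probes)
    return hits / len(probes)


def recovers(fact, images):
    """Stub for VLM plus judge: the fact counts as recovered if the content mentions it."""
    return any(fact in im.pixels for im in images)


# Toy memory and probes, invented for illustration.
memory = [
    MemoryImage("trip", "beach photo; dog leash visible"),
    MemoryImage("work", "office desk"),
]
probes = [("dog", "trip"), ("car", "work")]

original_rate = recovery_rate(probes, memory, recovers)
ablated_rate = recovery_rate(probes, neutralize(memory, "blank gray image"), recovers)
print(original_rate, ablated_rate)
```

If recovery drops to the baseline in the neutralized arm, the leak is attributable to visual content; if it persists, tags or retrieval logic are implicated instead.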

Circularity Check

0 steps flagged

No circularity; empirical benchmark with explicit baselines and no derivations


The paper reports empirical measurements of information recovery rates in a multimodal memory benchmark (MemLeak) against stated baselines (0.0% blind, 0.3% FPR) and controls (semantic deletion, dual-annotator validation with kappa=0.88). No equations, first-principles derivations, or predictions are present that could reduce to fitted inputs or self-definitions. The IPG is introduced as a descriptive taxonomy classifying deletion affordances, not derived from prior results. Central claims rest on direct experimental outcomes across VLMs and real images rather than any self-referential chain or renamed known result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Central claim rests on the domain assumption that VLMs exploit implicit visual cues from retained images; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: VLMs use implicit visual cues at inference time to recover facts from images even when the text is deleted.
    Directly invoked in the abstract as the mechanism enabling image-based leaks.


