MemLeak: Diagnosing Information Leaks in Multimodal Agent Memory

Chao Zhang; Kuan Wang

arxiv: 2606.29788 · v1 · pith:SLIDP7KMnew · submitted 2026-06-29 · 💻 cs.LG

Pith reviewed 2026-06-30 07:23 UTC · model grok-4.3

classification 💻 cs.LG

keywords multimodal agents · memory deletion · information leakage · vision-language models · MemLeak benchmark · Information Provenance Graph · semantic deletion

The pith

Retained images let multimodal AI agents recover 12% of facts after text deletion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that telling a multimodal agent to forget a fact by deleting its text entry leaves the fact recoverable from user-supplied images that remain in memory. These images leak information because vision-language models extract implicit visual cues at inference time, even when the images are tagged to unrelated facts. The authors build a taxonomy called the Information Provenance Graph to classify which memory elements can actually be deleted, then release the MemLeak benchmark that measures leakage through direct probing, correlated text, and retained images. Content-aware semantic deletion cuts the image-based residual from 12.0% to 2.0%. The effect appears across multiple VLMs, a production memory system, and real photographs, with human validation confirming the judge.

Core claim

When a multimodal AI agent is asked to forget a fact, current memory systems delete the text entry and report success, yet the fact remains recoverable from retained user images because VLMs use implicit visual cues at inference time. The Information Provenance Graph taxonomy reveals that deletion fails through multiple channels; the MemLeak benchmark quantifies this across a deletion cascade, finding that retained correlated text enables 18.3% recovery and retained images enable 12.0% recovery (0.0% blind baseline, 0.3% FPR), with 47% of image leaks not text-recoverable. Content-aware semantic deletion reduces the image residual to 2.0%.

What carries the argument

The Information Provenance Graph (IPG), a taxonomy that classifies memory representations by deletion affordance and identifies the multiple channels through which deletion fails.
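
A minimal sketch of how such a taxonomy could be represented, assuming the three node categories named in the Figure 2 caption (addressable, linked, persistent). The class names, fields, example records, and the `storage_level_delete` helper are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Affordance(Enum):
    # Category meanings below are assumptions based on the Figure 2 caption.
    ADDRESSABLE = "addressable"  # stored under the fact's own key, e.g. a text entry
    LINKED = "linked"            # reachable from the fact through a provenance link
    PERSISTENT = "persistent"    # outside deletion scope, e.g. implicit visual features


@dataclass
class Node:
    node_id: str
    affordance: Affordance
    facts: set = field(default_factory=set)  # facts this node is tagged to


def storage_level_delete(nodes, target_fact):
    """Drop addressable and linked nodes tagged to target_fact; keep everything else."""
    kept = []
    for node in nodes:
        in_scope = (
            target_fact in node.facts
            and node.affordance is not Affordance.PERSISTENT
        )
        if not in_scope:
            kept.append(node)
    return kept


# Hypothetical memory contents, for illustration only.
memory = [
    Node("text:lives_in_lisbon", Affordance.ADDRESSABLE, {"home_city"}),
    Node("img:balcony.jpg", Affordance.LINKED, {"home_city"}),
    Node("img:dog_walk.jpg/visual_features", Affordance.PERSISTENT, {"pet_name"}),
]
remaining = storage_level_delete(memory, "home_city")
print([n.node_id for n in remaining])
```

In this toy setup the only surviving node is the persistent one tagged to a different fact, which is the kind of node the paper identifies as a leakage channel.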

If this is right

Direct probing of deletion-capable systems yields less than 1% recovery.
Retained correlated text enables 18.3% recovery of deleted facts.
Retained images enable 12.0% recovery, and 47% of those image leaks cannot be recovered from text alone.
Content-aware semantic deletion lowers the image residual to 2.0%.
The residual persists across multiple VLMs, a production memory system, and real photographs.
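
Two derived quantities follow from the reported figures by simple arithmetic. The sketch below assumes that the 47% share applies to the set of image leaks and that all rates use the same denominator (deleted facts); neither derived number is reported in the abstract.

```python
# Figures reported in the abstract, as fractions of deleted facts.
image_recovery = 0.120
not_text_recoverable_share = 0.47   # share of image leaks that text cannot recover
residual_after_semantic = 0.020

# Derived, not reported: facts recoverable only through retained images.
image_only = image_recovery * not_text_recoverable_share

# Derived, not reported: relative reduction of the image residual.
relative_reduction = (image_recovery - residual_after_semantic) / image_recovery

print(f"image-only leakage: {image_only:.1%}")
print(f"relative reduction: {relative_reduction:.1%}")
```

Under these assumptions, roughly 5.6% of deleted facts would be recoverable through images alone, and semantic deletion would remove about 83% of the image residual.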

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Memory systems for multimodal agents will need deletion routines that operate on both text and image representations simultaneously.
Privacy guarantees in deployed agents that store user photos may be weaker than text-only deletion policies suggest.
Benchmarking deletion should include cross-modal leakage tests rather than text-only audits.
Future agent designs could tag images with explicit provenance links so that semantic deletion can target visual content directly.
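
The abstract does not define content-aware semantic deletion, so the following is only one plausible reading of a deletion routine that inspects image content rather than tags. The function `semantic_delete`, the `reveals` callback, and the toy byte strings standing in for images are all hypothetical.

```python
from typing import Callable, Dict, List


def semantic_delete(
    images: Dict[str, bytes],
    deleted_fact: str,
    reveals: Callable[[bytes, str], bool],
) -> List[str]:
    """Remove every retained image whose content reveals deleted_fact.

    `reveals` is a hypothetical content check (in practice it might query a VLM).
    Tags are deliberately ignored so images tagged to other facts are covered too.
    """
    removed = [
        image_id for image_id, data in images.items() if reveals(data, deleted_fact)
    ]
    for image_id in removed:
        del images[image_id]
    return removed


def toy_reveals(data: bytes, fact: str) -> bool:
    # Stand-in for a real content check: substring match on fake image bytes.
    return fact.encode() in data


# Hypothetical store: byte strings stand in for image content.
store = {
    "balcony.jpg": b"view of Lisbon rooftops",
    "dog.jpg": b"dog on grass",
}
removed_ids = semantic_delete(store, "Lisbon", toy_reveals)
print(removed_ids, sorted(store))
```

A production routine might redact or re-crop an image instead of removing it; this sketch takes the simplest option.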

Load-bearing premise

The measured recovery rates come from implicit visual cues that VLMs actually use at inference time rather than from artifacts of the memory system setup or the image tagging process.

What would settle it

Re-running the MemLeak benchmark on a VLM that has been explicitly trained or prompted to ignore implicit visual cues in retained images and finding zero recovery above the 0.3% FPR baseline would falsify the central claim.
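
Whether an observed recovery count sits above the false-positive baseline can be checked with a one-sided binomial tail. The sketch below is an assumption about how such a check might be run, not a procedure from the paper; the probe count and recovery count are hypothetical, and only the 0.3% FPR comes from the abstract.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


FPR = 0.003          # false-positive rate reported in the abstract
n_probes = 300       # hypothetical number of probes
n_recovered = 36     # hypothetical count, 12% of n_probes

# Probability of seeing at least n_recovered hits from false positives alone.
p_value = binomial_tail(n_recovered, n_probes, FPR)
print(f"P(X >= {n_recovered}) under FPR alone: {p_value:.3g}")
```

A count indistinguishable from the baseline would give a large tail probability, which is the outcome the falsification test above looks for.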

Figures

Figures reproduced from arXiv: 2606.29788 by Chao Zhang, Kuan Wang.

**Figure 2.** The IPG taxonomy applied to a measured example. Left: storage-level deletion removes the target fact’s addressable and linked nodes (dashed boundary marks deletion scope); persistent nodes lie outside this scope. Right: a retained fact preserves all nodes; the persistent “implicit visual features” node (red, bold) encodes cross-fact cues that provenance-based deletion cannot reach. Bottom: a VLM reconstruc…

**Figure 3.** Multi-channel leakage decomposition. Retained text (18.3%) and images (12.0%) both …

Original abstract

When a multimodal AI agent is asked to forget a fact, current memory systems usually delete the text entry and report success. We find that the fact can remain recoverable from retained user images, including images tagged to entirely different facts, because VLMs use implicit visual cues at inference time. We introduce the Information Provenance Graph (IPG), a taxonomy that classifies memory representations by deletion affordance. The IPG reveals that deletion fails through multiple channels. Our benchmark, MemLeak, measures this across a deletion cascade: direct probing of deletion-capable systems yields <1%, but retained correlated text enables 18.3% recovery, and retained images enable 12.0% recovery (0.0% blind baseline, 0.3% FPR) -- with 47% of image leaks not text-recoverable. Content-aware semantic deletion reduces the image residual to 2.0%. The residual appears across multiple VLMs, a production memory system, and real Unsplash-licensed photographs. Dual-annotator human validation (kappa = 0.88) confirms judge reliability.
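
For readers unfamiliar with the agreement statistic, here is a minimal sketch of Cohen's kappa for two annotators. It assumes the reported kappa is Cohen's (the abstract says only "kappa"); the labels are toy data, not the paper's annotations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Toy annotations: the annotators disagree on one item out of ten.
annotator_a = ["leak"] * 4 + ["no_leak"] * 6
annotator_b = ["leak"] * 3 + ["no_leak"] * 7
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

Here observed agreement is 0.90 and chance agreement is 0.54, so kappa is 0.36 / 0.46, about 0.78.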

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

MemLeak flags that text deletion in multimodal agents leaves facts recoverable from retained images at 12%, with a new IPG taxonomy and benchmark to measure it.

The core finding is that deleting text entries does not fully erase facts when images stay in the memory store, because VLMs recover information from visual cues at inference. The paper introduces the Information Provenance Graph taxonomy to classify representations by how easily they can be deleted and builds the MemLeak benchmark to quantify leaks across a deletion cascade.

It does solid empirical work. Retained correlated text yields 18.3% recovery and retained images 12% (0% blind baseline, 0.3% FPR), with 47% of the image-based leaks not recoverable from text. Content-aware semantic deletion brings the image residual down to 2%. The results hold across several VLMs, a production memory system, and real Unsplash photos, and dual-annotator validation reaches kappa 0.88.

The soft spot is the causal step. The abstract states that recovery comes from implicit visual cues, yet the benchmark description does not show an ablation that keeps the tagging and retrieval pipeline fixed while varying only image content. If that control is missing, the 12% figure could partly reflect indexing artifacts rather than cue exploitation alone. The numbers remain interesting either way.

This is for groups working on agent memory, deletion mechanisms, or privacy in multimodal systems. A reader who needs a concrete way to test and reduce these leaks will find the taxonomy and benchmark useful. It deserves peer review because the measurements are specific enough to be checked and the practical implication is clear.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Information Provenance Graph (IPG) taxonomy classifying memory representations by deletion affordance and the MemLeak benchmark to measure information leaks in multimodal agent memory. It reports that retained images enable 12.0% recovery of deleted facts (0.0% blind baseline, 0.3% FPR), with 47% of image leaks not text-recoverable, across a deletion cascade; content-aware semantic deletion reduces the residual to 2.0%. The residual is observed across multiple VLMs, a production memory system, and real Unsplash images, with dual-annotator validation (kappa=0.88).

Significance. If the results hold after addressing isolation concerns, the work identifies a concrete privacy vulnerability in multimodal agents that current text-only deletion mechanisms do not address, with direct implications for agent design. Strengths include explicit baselines and FPR reporting, cross-VLM and real-image evaluation, and reproducible human validation protocol.

major comments (2)

[Abstract] Abstract: The central claim that 'VLMs use implicit visual cues at inference time' to recover facts from retained images is not isolated from potential artifacts of the IPG taxonomy implementation, image-tagging pipeline, or deletion-cascade retrieval logic. No ablation is described that holds tagging and retrieval fixed while varying only visual content, which is required to support the causal attribution for the 12.0% recovery rate.
[Abstract] Abstract / benchmark description: The reported recovery rates (12.0% image, 18.3% correlated text, 2.0% post-semantic deletion) and human validation (kappa=0.88) are presented without sufficient detail on dataset construction, image selection criteria, or how facts are mapped to images in the IPG, preventing assessment of whether post-hoc choices affect the measurements.

minor comments (1)

[Abstract] The abstract introduces 'content-aware semantic deletion' without a concise definition or reference to its implementation details in the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the isolation of visual effects and the need for additional methodological detail. We address both points below and will revise the manuscript accordingly to strengthen the causal claims and reproducibility.

Referee: [Abstract] Abstract: The central claim that 'VLMs use implicit visual cues at inference time' to recover facts from retained images is not isolated from potential artifacts of the IPG taxonomy implementation, image-tagging pipeline, or deletion-cascade retrieval logic. No ablation is described that holds tagging and retrieval fixed while varying only visual content, which is required to support the causal attribution for the 12.0% recovery rate.

Authors: We agree that an explicit ablation holding the IPG taxonomy, tagging pipeline, and retrieval logic fixed while varying only visual content would provide stronger causal evidence for attributing the 12.0% recovery to implicit visual cues. The current design compares against a blind baseline (0.0%) and reports that 47% of image leaks are not text-recoverable, but these do not fully isolate visual content from pipeline artifacts. We will add this ablation in the revised manuscript (Section 4) by substituting neutral or text-only images under fixed tagging/retrieval conditions. revision: yes
Referee: [Abstract] Abstract / benchmark description: The reported recovery rates (12.0% image, 18.3% correlated text, 2.0% post-semantic deletion) and human validation (kappa=0.88) are presented without sufficient detail on dataset construction, image selection criteria, or how facts are mapped to images in the IPG, preventing assessment of whether post-hoc choices affect the measurements.

Authors: We will expand the methods and benchmark sections with explicit details on dataset construction, including Unsplash image selection criteria (e.g., licensing, diversity filters), the fact-to-image mapping process in the IPG (one-to-many relations, tagging rules), and summary statistics (e.g., images per fact, total facts). These will appear in the main text and a new appendix to allow assessment of post-hoc choices. revision: yes
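
The ablation described in the first response (substituting neutral images while tagging and retrieval stay fixed) could be organized as in the sketch below. `MemoryRecord`, `recovery_rate`, `ablate_visual_content`, and the toy `recovers` check are hypothetical names and stand-ins; a real run would call the agent and the judge instead.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class MemoryRecord:
    image: bytes       # visual payload, the only field the ablation changes
    tags: tuple        # tags and retrieval keys stay fixed across conditions
    deleted_fact: str  # fact that was deleted from text memory


def recovery_rate(
    records: List[MemoryRecord], recovers: Callable[[MemoryRecord], bool]
) -> float:
    """Fraction of records for which the deleted fact is recovered."""
    if not records:
        raise ValueError("records must be non-empty")
    return sum(recovers(r) for r in records) / len(records)


def ablate_visual_content(records: List[MemoryRecord], neutral_image: bytes):
    """Swap only the image payload; tags and deleted facts are untouched."""
    return [replace(r, image=neutral_image) for r in records]


def toy_recovers(record: MemoryRecord) -> bool:
    # Stand-in for "agent answers, judge scores": substring match on fake bytes.
    return record.deleted_fact.encode() in record.image


# Hypothetical records: byte strings stand in for image content.
records = [
    MemoryRecord(b"photo showing a Lisbon tram", ("pet_name",), "Lisbon"),
    MemoryRecord(b"photo of a dog on grass", ("pet_name",), "Lisbon"),
]
original = recovery_rate(records, toy_recovers)
ablated = recovery_rate(ablate_visual_content(records, b"blank gray image"), toy_recovers)
print(original, ablated)
```

If recovery under the ablated condition stayed near the original rate, the leak would point to the pipeline rather than to visual content; a drop to the baseline would support the visual-cue attribution.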

Circularity Check

0 steps flagged

No circularity; empirical benchmark with explicit baselines and no derivations

full rationale

The paper reports empirical measurements of information recovery rates in a multimodal memory benchmark (MemLeak) against stated baselines (0.0% blind, 0.3% FPR) and controls (semantic deletion, dual-annotator validation with kappa=0.88). No equations, first-principles derivations, or predictions are present that could reduce to fitted inputs or self-definitions. The IPG is introduced as a descriptive taxonomy classifying deletion affordances, not derived from prior results. Central claims rest on direct experimental outcomes across VLMs and real images rather than any self-referential chain or renamed known result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Central claim rests on the domain assumption that VLMs exploit implicit visual cues from retained images; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: VLMs use implicit visual cues at inference time to recover facts from images even when text is deleted.
Directly invoked in the abstract as the mechanism enabling image-based leaks.


[1] [1]

Locating and Editing Factual Associations in

Meng, Kevin and Bau, David and Andonian, Alex and Belinkov, Yonatan , booktitle=. Locating and Editing Factual Associations in

[2] [2]

ICLR , year=

Mass-Editing Memory in a Transformer , author=. ICLR , year=

[3] [3]

2025 , url=

Mem0: Memory Layer for. 2025 , url=

2025

[4] [4]

2025 , journal=

Zep: A Temporal Knowledge Graph Architecture for Agent Memory , author=. 2025 , journal=

2025

[5] [5]

Packer, Charles and Wooders, Sarah and Lin, Kevin and Fang, Vivian and Patil, Shishir G and Stoica, Ion and Gonzalez, Joseph E , year=

[6] [6]

TMLR , year=

Cognitive Architectures for Language Agents , author=. TMLR , year=

[7] [7]

Interpreting

Gandelsman, Yossi and Efros, Alexei A and Steinhardt, Jacob , booktitle=. Interpreting

[8] [8]

ICML , year=

Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models , author=. ICML , year=

[9] [9]

PoPETS , year=

Mission: Impossible---Image-Based Geolocation with Large Vision Language Models , author=. PoPETS , year=

[10] [10]

Securing

Costa, Manuel and Kopf, Boris and others , year=. Securing

[11] [11]

Hu, Yuanzhe and Wang, Yu and McAuley, Julian , booktitle=

[12] [12]

AAAI , year=

An Information Theoretic Evaluation Metric for Strong Unlearning , author=. AAAI , year=

[13] [13]

Maini, Pratyush and Feng, Zhili and Schwarzschild, Avi and Lipton, Zachary C and Kolter, J Zico , booktitle=

[14] [14]

Shi, Weijia and Lee, Jaechan and Huang, Yangsibo and others , booktitle=

[15] [15]

Wang, Chengye and Li, Yuyuan and others , booktitle=

[16] [16]

Liu, Zheyuan and others , booktitle=

[17] [17]

Dontsov, Alexey and others , booktitle=

[18] [18]

Unlearning Sensitive Information in Multimodal

Patil, Vaidehi and Sung, Yi-Lin and others , year=. Unlearning Sensitive Information in Multimodal

[19] [19]

NeurIPS , year=

Private Attribute Inference from Images with Vision-Language Models , author=. NeurIPS , year=

[20] [20]

Collaborative Memory: Multi-User Memory Sharing in

Rezazadeh, Ali and others , booktitle=. Collaborative Memory: Multi-User Memory Sharing in

[21] [21]

Zhang, Ziyang and others , year=

[22] [22]

AAAI , year=

Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models , author=. AAAI , year=

[23] [23]

Bei, Yuanchen and others , year=

[24] [24]

2025 , journal=

Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents , author=. 2025 , journal=

2025

[25] [25]

Position:

Thaker, Pratiksha and Hu, Shengyuan and others , booktitle=. Position:

[26] [26]

Agentic Unlearning: When

Wang, Bin and Wang, Fan and Wang, Pingping and Cong, Jinyu and Yu, Yang and Yin, Yilong and Han, Zhongyi and Wei, Benzheng , year=. Agentic Unlearning: When

[27] [27]

IEEE Symposium on Security and Privacy (SP) , year=

Machine unlearning , author=. IEEE Symposium on Security and Privacy (SP) , year=

[28] [28]

Selvas-Sala, Cai and Kang, Lei and Gomez, Lluis , booktitle=

[29] [29]

Li, Zhangheng and Hong, Junyuan and Zhu, Jianing and Eum, Sungmin and Hu, Shuowen and You, Suya and Wang, Zhangyang , booktitle=

[30] [30]

ICLR , year=

Machine Unlearning under Retain-Forget Entanglement , author=. ICLR , year=

[31] [31]

Ghost in the Agent: Redefining Information Flow Tracking for

Cai, Yuandao and Tang, Wensheng and Wen, Cheng and Qin, Shengchao , year=. Ghost in the Agent: Redefining Information Flow Tracking for

[32] [32]

Rethinking Visual Privacy: A Compositional Privacy Risk Framework for Severity Assessment with

Tsaprazlis, Efthymios and Feng, Tiantian and Ramakrishna, Anil and Karimireddy, Sai Praneeth and Gupta, Rahul and Narayanan, Shrikanth , year=. Rethinking Visual Privacy: A Compositional Privacy Risk Framework for Severity Assessment with

[33] [33]

Graph-Native Cognitive Memory for

Park, Young Bin , year=. Graph-Native Cognitive Memory for

[34] [34]

Bodea, Andreea-Elena and Meisenbacher, Stephen and Klymenko, Alexandra and Matthes, Florian , booktitle=

[35] [35]

Forgetful but faithful: A cognitive memory architecture and benchmark for privacy-aware generative agents

Saad Alqithami. Forgetful but faithful: A cognitive memory architecture and benchmark for privacy-aware generative agents. arXiv:2512.12856, 2025

work page arXiv 2025

[36] [36]

Mem-Gallery : Benchmarking multimodal long-term conversational memory for MLLM agents

Yuanchen Bei et al. Mem-Gallery : Benchmarking multimodal long-term conversational memory for MLLM agents. arXiv, 2026

2026

[37] [37]

SoK : Privacy risks and mitigations in retrieval-augmented generation systems

Andreea-Elena Bodea, Stephen Meisenbacher, Alexandra Klymenko, and Florian Matthes. SoK : Privacy risks and mitigations in retrieval-augmented generation systems. In IEEE SaTML, 2026

2026

[38] [38]

Machine unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In IEEE Symposium on Security and Privacy (SP), 2021

2021

[39] [39]

Ghost in the Agent: Redefining Information Flow Tracking for LLM Agents

Yuandao Cai, Wensheng Tang, Cheng Wen, and Shengchao Qin. Ghost in the agent: Redefining information flow tracking for LLM agents. arXiv:2604.23374, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[40] [40]

Machine unlearning under retain-forget entanglement

Jingpu Cheng, Ping Liu, Qianxiao Li, and Chi Zhang. Machine unlearning under retain-forget entanglement. In ICLR, 2026

2026

[41] [41]

Securing AI Agents with Information-Flow Control

Manuel Costa, Boris Kopf, et al. Securing AI agents with information-flow control. arXiv:2505.23643, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[42] [42]

CLEAR : Character unlearning in textual and visual modalities

Alexey Dontsov et al. CLEAR : Character unlearning in textual and visual modalities. In ACL Findings, 2025

2025

[43] [43]

Interpreting CLIP 's image representation via text-based decomposition

Yossi Gandelsman, Alexei A Efros, and Jacob Steinhardt. Interpreting CLIP 's image representation via text-based decomposition. In ICLR, 2024

2024

[44] Yuanzhe Hu, Yu Wang, and Julian McAuley. MemoryAgentBench: Evaluating memory in LLM agents via incremental multi-turn interactions. In ICLR, 2026

[45] Sangwon Jeon, Dongkeun Jeung, Wonjun Kim, Albert No, and Jungseul Choi. An information theoretic evaluation metric for strong unlearning. In AAAI, 2026

[46] Haoran Ju, Wenjie Hua, Hao Fei, Zecheng Shao, Yuren Zheng, Zhuotong Zhao, Roy Ka-Wei Lee, Bo-Hsun Hsu, Ruiqi Zhang, and Bryan Hooi Liu. Watch out your album! On the inadvertent privacy memorization in multi-modal large language models. In ICML, 2025

[47] Kunhao Li, Wenhao Li, et al. Cross-modal unlearning via influential neuron path editing in multimodal large language models. In AAAI, 2026a

[48] Zhangheng Li, Junyuan Hong, Jianing Zhu, Sungmin Eum, Shuowen Hu, Suya You, and Zhangyang Wang. POPS: Recovering unlearned multi-modality knowledge in MLLMs with fine-tuning and prompt-based attacks. In ICLR, 2026b

[49] Jian Liu, Weidi Deng, et al. Mission: Impossible---image-based geolocation with large vision language models. In PoPETS, 2025a

[50] Zheyuan Liu et al. MLLMU-Bench: Protecting privacy in multimodal large language models. In NAACL, 2025b

[51] Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C Lipton, and J Zico Kolter. TOFU: A task of fictitious unlearning for LLMs. In COLM, 2024

[52] Mem0 AI. Mem0: Memory layer for AI agents, 2025. URL https://github.com/mem0ai/mem0

[53] MemTensor. MemOS: An operating system for memory-augmented generation. arXiv:2505.22101, 2025

[54] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. In NeurIPS, 2022

[55] Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. In ICLR, 2023

[56] Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G Patil, Ion Stoica, and Joseph E Gonzalez. MemGPT: Towards LLMs as operating systems. arXiv:2310.08560, 2024

[57] Young Bin Park. Graph-native cognitive memory for AI agents: Formal belief revision semantics for versioned memory architectures. arXiv:2603.17244, 2026

[58] Vaidehi Patil, Yi-Lin Sung, et al. Unlearning sensitive information in multimodal LLMs: Benchmark and attack-defense evaluation. TMLR, 2024

[59] Ali Rezazadeh et al. Collaborative memory: Multi-user memory sharing in LLM agents with dynamic access control. In ICML, 2025

[60] Cai Selvas-Sala, Lei Kang, and Lluis Gomez. SALMUBench: A benchmark for sensitive association-level multimodal unlearning. In CVPR, 2026

[61] Weijia Shi, Jaechan Lee, Yangsibo Huang, et al. MUSE: Machine unlearning six-way evaluation for language models. In ICLR, 2025

[62] Theodore R Sumers, Shunyu Yao, Karthik Narasimhan, and Thomas L Griffiths. Cognitive architectures for language agents. TMLR, 2024

[63] Pratiksha Thaker, Shengyuan Hu, et al. Position: LLM unlearning benchmarks are weak measures of progress. In SaTML, 2025

[64] Arda Toemekce et al. Private attribute inference from images with vision-language models. In NeurIPS, 2024

[65] Efthymios Tsaprazlis, Tiantian Feng, Anil Ramakrishna, Sai Praneeth Karimireddy, Rahul Gupta, and Shrikanth Narayanan. Rethinking visual privacy: A compositional privacy risk framework for severity assessment with VLMs. arXiv:2603.21573, 2026

[66] Bin Wang, Fan Wang, Pingping Wang, Jinyu Cong, Yang Yu, Yilong Yin, Zhongyi Han, and Benzheng Wei. Agentic unlearning: When LLM agent meets machine unlearning. arXiv:2602.17692, 2026

[67] Chengye Wang, Yuyuan Li, et al. UMU-Bench: Closing the modality gap in multimodal unlearning evaluation. In NeurIPS Datasets & Benchmarks, 2025

[68] Zep AI. Zep: A temporal knowledge graph architecture for agent memory. arXiv:2501.13956, 2025

[69] Ziyang Zhang et al. MMDU-Bench: Multi-modal deep unlearning benchmark. Under review, 2025