pith. machine review for the scientific record.

arxiv: 2605.09033 · v2 · submitted 2026-05-09 · 💻 cs.CR · cs.AI

Recognition: unknown

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

Lingyun Peng, Shuyu Li, Tiantian Ji, Xinran Liu, Yang Luo, Yong Liu, Zifeng Kang

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:24 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords poisoning attack · graph-based agent memory · relation-channel conflict · LLM agent · memory poisoning · adversarial attack · agent memory

The pith

ShadowMerge poisons graph-based agent memory by creating conflicting values in the same relation channel as legitimate data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that graph-based memory in LLM agents creates a new attack surface because structured relations can be injected to influence later retrieval and decision-making. It demonstrates that a poisoned relation succeeds when it matches the query-activated anchor and the canonicalized relation channel of benign evidence yet carries an opposing value. The AIR pipeline turns this conflict into a standard interaction the memory system extracts, merges, and retrieves without raising flags. If correct, the results show that existing poisoning techniques fail against graph-based memory while this channel-based method reliably manipulates agent behavior across real datasets and has little effect on unrelated tasks.

Core claim

ShadowMerge is a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. A poisoned relation is crafted to share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. The AIR pipeline converts this conflict into an ordinary interaction that the graph-memory system extracts, merges into the target neighborhood, and retrieves for the victim query, causing the agent to act on the malicious information.

What carries the argument

Relation-channel conflict, in which a poisoned relation shares the query-activated anchor and canonicalized relation channel with benign evidence but supplies a conflicting value, realized through the AIR pipeline that makes the conflict appear as a normal extractable interaction.
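As a rough illustration of the conflict condition (a hypothetical schema, not the paper's implementation; the `Relation` type and `canonical_channel` folding below are assumptions for the sketch), a relation-channel conflict can be modeled as two triples that share an anchor and a canonicalized channel but disagree on value:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    anchor: str    # query-activated entity the relation attaches to
    channel: str   # relation type, e.g. "recommended dose"
    value: str     # the value asserted in that channel


def canonical_channel(raw: str) -> str:
    # Stand-in for the memory system's canonicalization step
    # (case/whitespace folding); real systems fold synonyms too.
    return raw.strip().lower().replace(" ", "_")


def is_channel_conflict(benign: Relation, poisoned: Relation) -> bool:
    """A poisoned relation conflicts when it shares the anchor and
    canonical channel of benign evidence but asserts a different value."""
    return (
        benign.anchor == poisoned.anchor
        and canonical_channel(benign.channel) == canonical_channel(poisoned.channel)
        and benign.value != poisoned.value
    )


benign = Relation("DrugX", "Recommended Dose", "10mg")
poisoned = Relation("DrugX", "recommended dose", "100mg")
print(is_channel_conflict(benign, poisoned))  # True
```

The point of the sketch is that canonicalization is what lets two superficially different phrasings land in the same channel, so the poisoned value competes directly with the benign one at retrieval time.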

Load-bearing premise

The graph-memory system will reliably extract, merge, and retrieve the poisoned relation whenever it shares the same anchor and canonicalized relation channel as the benign evidence.

What would settle it

Modify the graph-memory system to refuse or flag any merge that places two relations with conflicting values into the same canonical channel for the same anchor, then measure whether attack success rate falls to near zero on the same queries and datasets.
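A minimal sketch of such a merge guard, assuming a hypothetical toy store keyed by (anchor, canonical channel); real graph-memory systems such as Mem0 have richer merge logic, so this only illustrates the proposed check:

```python
class GuardedGraphMemory:
    """Toy graph memory that refuses merges which would place two
    conflicting values into the same canonical channel for one anchor."""

    def __init__(self):
        # (anchor, canonical channel) -> set of asserted values
        self.store: dict[tuple[str, str], set[str]] = {}
        self.flagged: list[tuple[str, str, str]] = []

    @staticmethod
    def _canon(channel: str) -> str:
        return channel.strip().lower().replace(" ", "_")

    def merge(self, anchor: str, channel: str, value: str) -> bool:
        """Return True if merged, False if flagged as a conflict."""
        key = (anchor, self._canon(channel))
        existing = self.store.setdefault(key, set())
        if existing and value not in existing:
            # Conflicting value in an occupied channel: flag, don't merge.
            self.flagged.append((anchor, channel, value))
            return False
        existing.add(value)
        return True


mem = GuardedGraphMemory()
mem.merge("DrugX", "recommended dose", "10mg")        # benign write merges
ok = mem.merge("DrugX", "Recommended Dose", "100mg")  # poisoned write
print(ok, mem.flagged)  # False [('DrugX', 'Recommended Dose', '100mg')]
```

Measuring ASR with and without this guard, as the proposal above suggests, would directly test whether channel conflict is the load-bearing mechanism.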

Figures

Figures reproduced from arXiv: 2605.09033 by Lingyun Peng, Shuyu Li, Tiantian Ji, Xinran Liu, Yang Luo, Yong Liu, Zifeng Kang.

Figure 1. Conventional flat memory versus graph-based agent memory. Flat memory appends independent chunks and retrieves them by similarity. Graph-based …
Figure 2. A motivating example: why text-only poisoning is unreliable in graph …
Figure 3. SHADOWMERGE workflow. The attacker first fixes (q∗, y+, y−) under the threat model, using public knowledge for y+ when needed. Anchor selects a high-reach entity from q∗, Inscribe creates a channel-aligned conflicting relation π−, and Render produces a natural-language payload P∗. After an ordinary interaction writes P∗ into the shared memory graph, later victim queries can retrieve both benign eviden…
Figure 4. [RQ2] Graph-evidence construction across task suites. Segment width …
Figure 5. [RQ2] CDF of the best poisoned-evidence rank in the target-query …
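The Anchor → Inscribe → Render flow described in the Figure 3 caption could be sketched roughly as follows (function name, channel, and payload template are all hypothetical; the open-sourced SHADOWMERGE code is the authoritative implementation):

```python
def shadowmerge_payload(query: str, y_malicious: str,
                        entity_graph: dict[str, list[str]]):
    """Hypothetical sketch of the AIR pipeline from Figure 3."""
    # Anchor: pick a high-reach entity that appears in the target query,
    # using node degree as a crude proxy for reach.
    anchor = max(
        (e for e in entity_graph if e in query),
        key=lambda e: len(entity_graph[e]),
    )
    # Inscribe: build a relation in the same canonical channel as the
    # benign evidence, but carrying the conflicting value y_malicious.
    channel = "recommended_dose"  # assumed channel for illustration
    poisoned_relation = (anchor, channel, y_malicious)
    # Render: wrap the relation as a natural-language payload that an
    # ordinary interaction would write into shared memory.
    payload = (f"By the way, the {channel.replace('_', ' ')} "
               f"of {anchor} is {y_malicious}.")
    return poisoned_relation, payload
```

This is only meant to make the three-stage division concrete; the paper's pipeline presumably handles anchor selection, channel alignment, and rendering with far more care than this toy.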
read the original abstract

Graph-based agent memory is increasingly used in LLM agents to support structured long-term recall and multi-hop reasoning, but it also creates a new poisoning surface: an attacker can inject a crafted relation into graph memory so that it is later retrieved and influences agent behavior. Existing agent-memory poisoning attacks mainly target flat textual records and are ineffective in graph-based memory because malicious relations often fail to be extracted, merged into the target anchor neighborhood, or retrieved for the victim query. We present SHADOWMERGE, a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, we design AIR, a pipeline that converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. We evaluate SHADOWMERGE on Mem0 and three public real-world datasets: PubMedQA, WebShop, and ToolEmu. SHADOWMERGE achieves 93.8% average attack success rate, improving the best baseline by 50.3 absolute points, while having negligible impact on unrelated benign tasks. Mechanism studies show that SHADOWMERGE overcomes the three key limitations of existing agent-memory poisoning attacks, and defense analysis shows that representative input-side defenses are insufficient to mitigate it. We have responsibly disclosed our findings to affected graph-memory vendors and open sourced SHADOWMERGE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SHADOWMERGE, a poisoning attack on graph-based agent memory (exemplified by Mem0) that exploits relation-channel conflicts. The core idea is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. This is realized via the AIR pipeline, which converts the conflict into an ordinary interaction that is extractable, mergeable, and retrievable by the memory system. Evaluations on PubMedQA, WebShop, and ToolEmu report 93.8% average attack success rate (50.3 points above the best baseline) with negligible effect on unrelated benign tasks; mechanism studies and defense analysis are also included.

Significance. If the empirical results hold, the work is significant because it demonstrates a new, high-success attack surface in structured graph memory for LLM agents—an increasingly deployed component for long-term recall and multi-hop reasoning. The large margin over prior attacks, the concrete ASR numbers on public datasets, the open-sourcing of the attack, and the responsible disclosure to vendors are all strengths. The finding that representative input-side defenses fail to mitigate the attack also has clear practical implications for securing agent memory systems.

major comments (2)
  1. [Evaluation] Evaluation section: the central claim of 93.8% average ASR and a 50.3-point improvement rests on the reported numbers; however, the manuscript does not provide the number of independent trials, standard deviations, or statistical significance tests for the ASR figures across the three datasets, making it impossible to judge whether the improvement is robust or could be explained by experimental variance.
  2. [Mechanism studies] AIR pipeline description and mechanism studies: the weakest assumption—that the target graph-memory system will reliably extract, merge, and retrieve the poisoned relation when it shares the query-activated anchor and canonicalized relation channel—is load-bearing, yet the paper provides only qualitative mechanism studies rather than quantitative ablation showing success/failure rates when this sharing condition is deliberately violated.
minor comments (2)
  1. [Abstract] The abstract states that SHADOWMERGE has 'negligible impact on unrelated benign tasks' but does not name the specific benign tasks or report the quantitative metrics (e.g., accuracy or success rate before vs. after attack) used to support this claim.
  2. [Defense analysis] Defense analysis: the claim that 'representative input-side defenses are insufficient' would be clearer if the manuscript listed the exact defenses tested (with citations) and described the evasion mechanism for each in a dedicated table or subsection.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We are encouraged by the recognition of the work's significance and the recommendation for minor revision. We address the major comments below and will incorporate the suggested improvements in the revised manuscript.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the central claim of 93.8% average ASR and a 50.3-point improvement rests on the reported numbers; however, the manuscript does not provide the number of independent trials, standard deviations, or statistical significance tests for the ASR figures across the three datasets, making it impossible to judge whether the improvement is robust or could be explained by experimental variance.

    Authors: We appreciate this observation. The experiments were conducted with 5 independent trials per setting to account for variability in LLM outputs. In the revised version, we will explicitly report the number of trials, include standard deviations in the ASR tables, and add statistical significance tests (such as t-tests comparing against baselines) to demonstrate that the improvements are robust and not due to variance. These additions will be placed in the Evaluation section. revision: yes

  2. Referee: [Mechanism studies] AIR pipeline description and mechanism studies: the weakest assumption—that the target graph-memory system will reliably extract, merge, and retrieve the poisoned relation when it shares the query-activated anchor and canonicalized relation channel—is load-bearing, yet the paper provides only qualitative mechanism studies rather than quantitative ablation showing success/failure rates when this sharing condition is deliberately violated.

    Authors: We agree that a quantitative analysis would better validate the core assumption. We will add an ablation study in the revised manuscript that measures the extraction, merge, and retrieval success rates under controlled violations of the sharing condition (e.g., mismatched anchors or non-canonicalized relations). This will quantify the importance of the relation-channel conflict and provide failure rates when the condition is not met, strengthening the mechanism studies section. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is an empirical attack paper whose central claims consist of measured attack success rates (93.8% ASR on PubMedQA, WebShop, ToolEmu) obtained by running the described AIR pipeline against the Mem0 graph-memory system. No equations, fitted parameters presented as predictions, or derivation chains appear in the provided material. The AIR pipeline is introduced as an engineering design that is implemented and evaluated on external public datasets and baselines; success is reported as an experimental outcome rather than a logical consequence of any self-referential definition or prior self-citation. The three stated limitations of prior attacks are addressed by construction of the attack, not by any circular reduction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

No mathematical derivations, free parameters, or axioms; the work is an empirical security attack design whose central components are the AIR pipeline and the relation-channel conflict idea, both introduced without independent external evidence beyond the reported experiments.

invented entities (1)
  • AIR pipeline (no independent evidence)
    purpose: Converts relation-channel conflict into an ordinary interaction that the graph-memory system extracts, merges, and retrieves
    Core component of the attack design introduced to overcome extraction and merging barriers.

pith-pipeline@v0.9.0 · 5585 in / 1104 out tokens · 57189 ms · 2026-05-12T02:24:02.581874+00:00 · methodology

discussion (0)
