ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
Pith reviewed 2026-05-12 02:24 UTC · model grok-4.3
The pith
ShadowMerge poisons graph-based agent memory by creating conflicting values in the same relation channel as legitimate data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ShadowMerge is a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. A poisoned relation is crafted to share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. The AIR pipeline converts this conflict into an ordinary interaction that the graph-memory system extracts, merges into the target neighborhood, and retrieves for the victim query, causing the agent to act on the malicious information.
What carries the argument
Relation-channel conflict, in which a poisoned relation shares the query-activated anchor and canonicalized relation channel with benign evidence but supplies a conflicting value, realized through the AIR pipeline that makes the conflict appear as a normal extractable interaction.
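The conflict mechanism can be sketched in a few lines. This is a hypothetical toy model, not Mem0's actual implementation: the triple store keyed by (anchor, canonical channel), the `canonicalize()` rule, and the overwrite-on-merge behavior are all illustrative assumptions.

```python
# Toy graph memory: relations keyed by (anchor, canonical relation channel).
# A later merge into an occupied slot silently replaces the stored value,
# which is the behavior the attack exploits.

def canonicalize(relation: str) -> str:
    """Collapse surface variants of a relation into one channel (assumed rule)."""
    return relation.lower().replace(" ", "_").rstrip("s")

memory: dict[tuple[str, str], str] = {}

def merge(anchor: str, relation: str, value: str) -> None:
    memory[(anchor, canonicalize(relation))] = value

# Benign evidence extracted from a legitimate interaction.
merge("drug_X", "recommended dose", "10 mg")

# Poisoned relation: same anchor, same canonical channel, conflicting value.
merge("drug_X", "Recommended Doses", "100 mg")

# A victim query against drug_X's dose channel now retrieves the poison.
print(memory[("drug_X", "recommended_dose")])  # -> 100 mg
```

Both surface forms collapse to the same channel key, so the poisoned value lands exactly where the benign one lived and is returned for the same query.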
Load-bearing premise
The graph-memory system will reliably extract, merge, and retrieve the poisoned relation whenever it shares the same anchor and canonicalized relation channel as the benign evidence.
What would settle it
Modify the graph-memory system to refuse or flag any merge that places two relations with conflicting values into the same canonical channel for the same anchor, then measure whether attack success rate falls to near zero on the same queries and datasets.
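A minimal sketch of that check, under the same toy-model assumptions as above (the `ConflictError` type and quarantine behavior are illustrative, not part of any real graph-memory API):

```python
# Guarded merge: refuse any write that would put a conflicting value into
# an (anchor, canonical channel) slot that already holds evidence.

def canonicalize(relation: str) -> str:
    return relation.lower().replace(" ", "_").rstrip("s")

class ConflictError(Exception):
    """Raised when a merge would overwrite a slot with a conflicting value."""

memory: dict[tuple[str, str], str] = {}

def guarded_merge(anchor: str, relation: str, value: str) -> None:
    key = (anchor, canonicalize(relation))
    existing = memory.get(key)
    if existing is not None and existing != value:
        # Flag instead of silently overwriting; a deployed system might
        # quarantine the incoming relation for review.
        raise ConflictError(f"{key}: {existing!r} vs {value!r}")
    memory[key] = value

guarded_merge("drug_X", "recommended dose", "10 mg")
try:
    guarded_merge("drug_X", "Recommended Doses", "100 mg")  # poisoned write
except ConflictError as e:
    print("blocked:", e)
```

If the attack depends entirely on channel collision, this guard should drive its success rate toward zero; measuring ASR with and without the guard is the proposed test.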
Original abstract
Graph-based agent memory is increasingly used in LLM agents to support structured long-term recall and multi-hop reasoning, but it also creates a new poisoning surface: an attacker can inject a crafted relation into graph memory so that it is later retrieved and influences agent behavior. Existing agent-memory poisoning attacks mainly target flat textual records and are ineffective in graph-based memory because malicious relations often fail to be extracted, merged into the target anchor neighborhood, or retrieved for the victim query. We present SHADOWMERGE, a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, we design AIR, a pipeline that converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. We evaluate SHADOWMERGE on Mem0 and three public real-world datasets: PubMedQA, WebShop, and ToolEmu. SHADOWMERGE achieves 93.8% average attack success rate, improving the best baseline by 50.3 absolute points, while having negligible impact on unrelated benign tasks. Mechanism studies show that SHADOWMERGE overcomes the three key limitations of existing agent-memory poisoning attacks, and defense analysis shows that representative input-side defenses are insufficient to mitigate it. We have responsibly disclosed our findings to affected graph-memory vendors and open sourced SHADOWMERGE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SHADOWMERGE, a poisoning attack on graph-based agent memory (exemplified by Mem0) that exploits relation-channel conflicts. The core idea is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. This is realized via the AIR pipeline, which converts the conflict into an ordinary interaction that the memory system can extract, merge, and retrieve. Evaluations on PubMedQA, WebShop, and ToolEmu report a 93.8% average attack success rate (50.3 points above the best baseline) with negligible effect on unrelated benign tasks; mechanism studies and a defense analysis are also included.
Significance. If the empirical results hold, the work is significant because it demonstrates a new, high-success attack surface in structured graph memory for LLM agents—an increasingly deployed component for long-term recall and multi-hop reasoning. The large margin over prior attacks, the concrete ASR numbers on public datasets, the open-sourcing of the attack, and the responsible disclosure to vendors are all strengths. The finding that representative input-side defenses fail to mitigate the attack also has clear practical implications for securing agent memory systems.
major comments (2)
- [Evaluation] Evaluation section: the central claim of 93.8% average ASR and a 50.3-point improvement rests on the reported numbers; however, the manuscript does not provide the number of independent trials, standard deviations, or statistical significance tests for the ASR figures across the three datasets, making it impossible to judge whether the improvement is robust or could be explained by experimental variance.
- [Mechanism studies] AIR pipeline description and mechanism studies: the weakest assumption—that the target graph-memory system will reliably extract, merge, and retrieve the poisoned relation when it shares the query-activated anchor and canonicalized relation channel—is load-bearing, yet the paper provides only qualitative mechanism studies rather than quantitative ablation showing success/failure rates when this sharing condition is deliberately violated.
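The reporting the first comment asks for is straightforward to produce. A sketch, with per-trial ASR values that are illustrative placeholders rather than numbers from the paper:

```python
# Report mean, standard deviation, and a Welch t-statistic over independent
# trials. The trial values below are hypothetical; only the format is the point.
from math import sqrt
from statistics import mean, stdev

shadowmerge = [0.95, 0.92, 0.94, 0.93, 0.95]  # 5 trials (hypothetical)
baseline    = [0.45, 0.41, 0.47, 0.43, 0.42]

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic for two samples with unequal variances."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(va + vb)

print(f"SHADOWMERGE ASR: {mean(shadowmerge):.3f} +/- {stdev(shadowmerge):.3f}")
print(f"baseline ASR:    {mean(baseline):.3f} +/- {stdev(baseline):.3f}")
print(f"Welch t = {welch_t(shadowmerge, baseline):.1f}")
```

Reporting each dataset's ASR in this mean-plus-deviation form, with the t-statistic against the strongest baseline, would settle the variance question directly.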
minor comments (2)
- [Abstract] The abstract states that SHADOWMERGE has 'negligible impact on unrelated benign tasks' but does not name the specific benign tasks or report the quantitative metrics (e.g., accuracy or success rate before vs. after attack) used to support this claim.
- [Defense analysis] Defense analysis: the claim that 'representative input-side defenses are insufficient' would be clearer if the manuscript listed the exact defenses tested (with citations) and described the evasion mechanism for each in a dedicated table or subsection.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We are encouraged by the recognition of the work's significance and the recommendation for minor revision. We address the major comments below and will incorporate the suggested improvements in the revised manuscript.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: the central claim of 93.8% average ASR and a 50.3-point improvement rests on the reported numbers; however, the manuscript does not provide the number of independent trials, standard deviations, or statistical significance tests for the ASR figures across the three datasets, making it impossible to judge whether the improvement is robust or could be explained by experimental variance.
  Authors: We appreciate this observation. The experiments were conducted with 5 independent trials per setting to account for variability in LLM outputs. In the revised version, we will explicitly report the number of trials, include standard deviations in the ASR tables, and add statistical significance tests (such as t-tests comparing against baselines) to demonstrate that the improvements are robust and not due to variance. These additions will be placed in the Evaluation section. Revision: yes.
- Referee: [Mechanism studies] AIR pipeline description and mechanism studies: the weakest assumption—that the target graph-memory system will reliably extract, merge, and retrieve the poisoned relation when it shares the query-activated anchor and canonicalized relation channel—is load-bearing, yet the paper provides only qualitative mechanism studies rather than quantitative ablation showing success/failure rates when this sharing condition is deliberately violated.
  Authors: We agree that a quantitative analysis would better validate the core assumption. We will add an ablation study in the revised manuscript that measures the extraction, merge, and retrieval success rates under controlled violations of the sharing condition (e.g., mismatched anchors or non-canonicalized relations). This will quantify the importance of the relation-channel conflict and provide failure rates when the condition is not met, strengthening the mechanism studies section. Revision: yes.
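The promised ablation can be sketched as a small harness. Everything here is assumed: the variant names, the stage breakdown, and the `run_attack()` stub, which a real study would replace with calls that drive Mem0 and the paper's datasets.

```python
# Ablation skeleton: measure per-stage success while deliberately violating
# the sharing condition. run_attack() is a stub standing in for the real
# injection-and-probe procedure against a live graph-memory system.

VARIANTS = {
    "full_attack":            dict(match_anchor=True,  canonical_channel=True),
    "mismatched_anchor":      dict(match_anchor=False, canonical_channel=True),
    "non_canonical_relation": dict(match_anchor=True,  canonical_channel=False),
}

def run_attack(match_anchor: bool, canonical_channel: bool) -> dict[str, bool]:
    """Stub: returns which pipeline stages the poisoned relation survives.
    The stage logic below encodes the paper's hypothesis, not measured data."""
    extracted = True                       # extraction is assumed channel-independent
    merged = match_anchor and canonical_channel
    retrieved = merged                     # only merged relations can be retrieved
    return {"extract": extracted, "merge": merged, "retrieve": retrieved}

for name, cfg in VARIANTS.items():
    stages = run_attack(**cfg)
    print(name, {stage: int(ok) for stage, ok in stages.items()})
```

If the hypothesis holds, the two violated variants should show merge and retrieval rates collapsing while extraction stays high, isolating the sharing condition as the load-bearing step.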
Circularity Check
No significant circularity
Rationale
This is an empirical attack paper whose central claims consist of measured attack success rates (93.8% ASR on PubMedQA, WebShop, ToolEmu) obtained by running the described AIR pipeline against the Mem0 graph-memory system. No equations, fitted parameters presented as predictions, or derivation chains appear in the provided material. The AIR pipeline is introduced as an engineering design that is implemented and evaluated on external public datasets and baselines; success is reported as an experimental outcome rather than a logical consequence of any self-referential definition or prior self-citation. The three stated limitations of prior attacks are addressed by construction of the attack, not by any circular reduction to the paper's own inputs.
Axiom & Free-Parameter Ledger
invented entities (1)
- AIR pipeline: no independent evidence