arXiv preprint arXiv:2510.02373
20 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (20)
roles: background (4)
polarities: background (4)
representative citing papers
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
Relabeling an identical erroneous claim from the model's own thought role to an external chat role increases explicit correction rates by 23-93 percentage points across 13 model-domain cells, indicating a chat-template artifact rather than a cognitive deficit.
MemAudit combines counterfactual causal influence scores with memory consistency graphs to identify poisoned records in LLM agent memory, reducing MINJA attack success from 70% to 0% in QA and 83.3% to 0% in reasoning tasks.
Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
ElephantAgent maintains a linearizable ledger of contextual state transitions via replicated trusted hardware and adds historical traceability for post-hoc recovery from semantic abuse in agentic systems.
This survey categorizes threats to OpenClaw agents, including skill poisoning and cognitive manipulation, and reviews defense mechanisms.
citing papers explorer
- SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems
SMSR is the first defense with a certified robustness bound against multi-session memory poisoning in persistent LLM agents, combining HMAC provenance signing with randomized ablation and verdict-based voting.
- MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems
MemMark enables snapshot-only attribution for agent long-term memory by embedding signals via keyed distribution-preserving sampling at memory-write decisions, recovering 40-bit payloads with near-baseline utility.
- MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents
MEMSAD links anomaly detection gradients to retrieval objectives under encoder regularity to certify detection of continuous memory poisons, achieving perfect TPR/FPR in experiments while exposing a synonym-invariance gap.
- Selection Integrity for LLM Graph Memory: An Accumulability Criterion for Information-Flow-Blind Retrieval
Provenance checks in graph memory are blind to structural attacks that reallocate top-k membership; authselect prevents this by enforcing selection on the authenticated subgraph only.
- Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense
A contrastive memory system evolves without retraining to defend LLM agents against jailbreaks, achieving top F1 scores and low benign refusal on HarmBench and AgentHarm benchmarks.
- OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences
OEP poisons self-evolving LLM agents by constructing clean edge-case experiences that appear locally valid yet cause harmful over-generalization during reflection, achieving over 50% attack success rate on GPT-4o agents across three domains.
- MemLineage: Lineage-Guided Enforcement for LLM Agent Memory
MemLineage enforces untrusted-path persistence in LLM agent memory through Merkle logs, per-principal signatures, and max-of-strong-edges lineage propagation, achieving zero ASR on three poisoning workloads with sub-millisecond overhead.
- Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.
- AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents
AgentWard organizes stage-specific security controls with cross-layer coordination to intercept threats across the full lifecycle of autonomous AI agents.
- From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents
This survey defines execution provenance as a typed graph of agent execution and evidence tracing as its projection onto evidence-support relations, then reviews methods, taxonomy, benchmarks, and challenges for auditable LLM agents.
- The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems
Memory poisoning via lost-provenance documents in agent memory stores creates agent misconduct that safety systems misattribute to model failure; the paper defines Semantic Norm Drift, releases a benchmark, and proposes a new testing method plus a defense.
- Ghost in the Context: Policy-Carriage Integrity in LLM Agents
Protected policy placements in LLM agents maintain integrity under replay pressure on AutoGen and OpenHands traces, unlike task-local placements, which show eviction or weakening.
- Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.