arXiv preprint arXiv:2510.02373

2025 · arXiv 2510.02373

21 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems

cs.CR · 2026-06-10 · unverdicted · novelty 7.0

SMSR is the first defense with a certified robustness bound against multi-session memory poisoning in persistent LLM agents, combining HMAC provenance signing with randomized ablation and verdict-based voting.

MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems

cs.CR · 2026-05-24 · unverdicted · novelty 7.0

MemMark enables snapshot-only attribution for agent long-term memory by embedding signals via keyed distribution-preserving sampling at memory-write decisions, recovering 40-bit payloads with near-baseline utility.

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.

MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents

cs.CR · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

MEMSAD links anomaly detection gradients to retrieval objectives under encoder regularity to certify detection of continuous memory poisons, achieving perfect TPR/FPR in experiments while exposing a synonym-invariance gap.

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

Xcientist externalizes research synthesis and validation in AI scientists via contract-governed artifacts to maintain traceable trajectories and avoid claim drift across three domains.

Selection Integrity for LLM Graph Memory: An Accumulability Criterion for Information-Flow-Blind Retrieval

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

Provenance checks in graph memory are blind to structural attacks that reallocate top-k membership; authselect prevents this by enforcing selection on the authenticated subgraph only.

The Self-Correction Illusion: LLMs Correct Others but Not Themselves

cs.AI · 2026-06-04 · conditional · novelty 6.0

Relabeling an identical erroneous claim from the model's own thought role to an external chat role increases explicit correction rates by 23-93 percentage points across 13 model-domain cells, indicating a chat-template artifact rather than a cognitive deficit.

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense

cs.CR · 2026-06-04 · unverdicted · novelty 6.0

A contrastive memory system evolves without retraining to defend LLM agents against jailbreaks, achieving top F1 scores and low benign refusal on HarmBench and AgentHarm benchmarks.

MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

MemAudit combines counterfactual causal influence scores with memory consistency graphs to identify poisoned records in LLM agent memory, reducing MINJA attack success from 70% to 0% in QA and 83.3% to 0% in reasoning tasks.

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

OEP poisons self-evolving LLM agents by constructing clean edge-case experiences that appear locally valid yet cause harmful over-generalization during reflection, achieving over 50% attack success rate on GPT-4o agents across three domains.

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.

MemLineage: Lineage-Guided Enforcement for LLM Agent Memory

cs.CR · 2026-05-14 · conditional · novelty 6.0

MemLineage enforces untrusted-path persistence in LLM agent memory through Merkle logs, per-principal signatures, and max-of-strong-edges lineage propagation, achieving zero ASR on three poisoning workloads with sub-millisecond overhead.

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

cs.CR · 2026-05-03 · unverdicted · novelty 6.0 · 2 refs

The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.

AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents

cs.CR · 2026-04-27 · conditional · novelty 6.0

AgentWard organizes stage-specific security controls with cross-layer coordination to intercept threats across the full lifecycle of autonomous AI agents.

ElephantAgent: Contextual State Continuity in Agentic Systems

cs.AI · 2026-07-02 · unverdicted · novelty 5.0

ElephantAgent maintains a linearizable ledger of contextual state transitions via replicated trusted hardware and adds historical traceability for post-hoc recovery from semantic abuse in agentic systems.

From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents

cs.CR · 2026-06-03 · unverdicted · novelty 5.0 · 2 refs

This survey defines execution provenance as a typed graph of agent execution and evidence tracing as its projection onto evidence-support relations, then reviews methods, taxonomy, benchmarks, and challenges for auditable LLM agents.

The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems

cs.CR · 2026-05-12 · unverdicted · novelty 5.0

Memory poisoning via lost-provenance documents in agent memory stores creates agent misconduct that safety systems misattribute to model failure; the paper defines Semantic Norm Drift, releases a benchmark, and proposes a new testing method plus a defense.

Ghost in the Context: Policy-Carriage Integrity in LLM Agents

cs.CR · 2026-05-02 · unverdicted · novelty 5.0 · 3 refs

Protected policy placements in LLM agents maintain integrity under replay pressure on AutoGen and OpenHands traces, unlike task-local placements which show eviction or weakening.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures

cs.AI · 2026-05-25 · unverdicted · novelty 2.0

A survey that categorizes threats to OpenClaw agents including skill poisoning and cognitive manipulation and reviews defense mechanisms.

citing papers explorer

Showing 13 of 13 citing papers after filters.

SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems
cs.CR · 2026-06-10 · unverdicted · none · ref 8

MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems
cs.CR · 2026-05-24 · unverdicted · none · ref 17

MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents
cs.CR · 2026-05-05 · unverdicted · none · ref 10 · 2 links

Selection Integrity for LLM Graph Memory: An Accumulability Criterion for Information-Flow-Blind Retrieval
cs.CR · 2026-06-10 · unverdicted · none · ref 6

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense
cs.CR · 2026-06-04 · unverdicted · none · ref 4

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences
cs.CR · 2026-05-18 · unverdicted · none · ref 34

MemLineage: Lineage-Guided Enforcement for LLM Agent Memory
cs.CR · 2026-05-14 · conditional · none · ref 21

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
cs.CR · 2026-05-03 · unverdicted · none · ref 86 · 2 links

AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents
cs.CR · 2026-04-27 · conditional · none · ref 12

From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents
cs.CR · 2026-06-03 · unverdicted · none · ref 76 · 2 links

The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems
cs.CR · 2026-05-12 · unverdicted · none · ref 33

Ghost in the Context: Policy-Carriage Integrity in LLM Agents
cs.CR · 2026-05-02 · unverdicted · none · ref 31 · 3 links

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation
cs.CR · 2026-06-09 · unverdicted · none · ref 197
