pith. machine review for the scientific record.

arxiv: 2605.14421 · v1 · submitted 2026-05-14 · 💻 cs.CR · cs.AI

MemLineage: Lineage-Guided Enforcement for LLM Agent Memory

Pith reviewed 2026-05-15 02:15 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords LLM agents · memory security · derivation lineage · provenance · chain of custody · poisoning attacks · Merkle log

The pith

By tracking derivation lineage in agent memory, MemLineage prevents poisoned entries from justifying sensitive actions while preserving useful recall.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MemLineage, which secures LLM agent memory by recording both cryptographic provenance and how each new entry derives from prior ones. This lineage lets the system refuse a sensitive action when its justification chain includes untrusted content, without blocking normal memory recall. The design relies on a Merkle log of signed entries and a max-of-strong-edges rule that carries the untrusted label along derivation edges whose attribution weight stays above a threshold. On three memory-poisoning workloads in a deterministic harness, MemLineage is the only configuration that reaches zero attack success rate (ASR); on six AgentDojo banking pairs it also reduces strict ASR to zero where the no-defense and signature-only baselines fail. Per-operation overhead is sub-millisecond.

Core claim

MemLineage attaches cryptographic provenance and LLM-mediated derivation lineage to every memory entry, stored in an RFC-6962 Merkle log of per-principal Ed25519-signed entries. A weighted derivation DAG records which retrieved entries influenced each new entry, and a max-of-strong-edges propagation rule makes Untrusted-Path Persistence hold for any chain whose attribution edges remain above threshold, so the sensitive-action gate refuses any dispatch whose active justification descends from an external ancestor.

What carries the argument

The weighted derivation DAG that records which retrieved entries influenced each new memory entry, paired with the max-of-strong-edges propagation rule applied over an RFC-6962 Merkle log of Ed25519-signed entries.
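
The propagation rule and gate described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Label`, `Entry`, `propagate`, and `gate_allows` are hypothetical, the two-level label set is a simplification, and the reading of max-of-strong-edges (an entry takes the least-trusted label among its own source and every parent reached by an edge with weight above τ) is inferred from the abstract and figure captions.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Label(IntEnum):
    # Hypothetical two-level lattice: a higher value means less trusted,
    # so max() always selects the worst label.
    TRUSTED = 0
    EXTERNAL = 1


@dataclass
class Entry:
    entry_id: str
    own_label: Label  # label of the source that wrote the entry
    # parent entry id -> attribution weight in [0, 1]
    parents: dict[str, float] = field(default_factory=dict)


def propagate(entries: list[Entry], tau: float) -> dict[str, Label]:
    """Label entries in commit order (parents are assumed to precede children)."""
    labels: dict[str, Label] = {}
    for entry in entries:
        strong = [labels[p] for p, w in entry.parents.items() if w > tau]
        # Assumed rule: worst label among the entry's own source and its
        # strong parents. With no strong parent, only the own label remains.
        labels[entry.entry_id] = max([entry.own_label, *strong])
    return labels


def gate_allows(justifying_ids: list[str], labels: dict[str, Label]) -> bool:
    """Refuse a sensitive call if any justifying entry carries an untrusted label."""
    return all(labels[i] == Label.TRUSTED for i in justifying_ids)


log = [
    Entry("e0", Label.EXTERNAL),                           # adversary-written
    Entry("t0", Label.TRUSTED),                            # user-written
    Entry("e1", Label.TRUSTED, {"e0": 0.9}),               # summary of e0
    Entry("e2", Label.TRUSTED, {"e1": 0.8, "t0": 0.95}),   # mixes both
]
labels = propagate(log, tau=0.5)
print(gate_allows(["e2"], labels))  # False: e2 descends from e0
print(gate_allows(["t0"], labels))  # True: benign recall still works
```

Treating the labels as an ordered enum keeps the rule a one-line `max`, which is why the sketch processes entries in commit order instead of walking the DAG recursively.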

Load-bearing premise

The LLM accurately identifies the derivation influences between memory entries and the max-of-strong-edges rule captures every path that might justify a sensitive action.

What would settle it

A demonstration that an LLM agent performs a sensitive action justified solely by a poisoned memory entry, even though the tracked lineage shows only trusted paths, or that the system blocks a safe action due to incorrect lineage attribution.
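
The failure case described above can be made concrete with a small sketch. It assumes, as the figure captions suggest, that the untrusted label survives a derivation chain only while every attribution edge has weight above the threshold τ; the function name `tip_keeps_external_label` and the example weights are hypothetical.

```python
def tip_keeps_external_label(edge_weights: list[float], tau: float) -> bool:
    """Return True if the label of an external root reaches the chain tip.

    edge_weights[i] is the attribution weight the LLM assigned to the edge
    from entry i to entry i + 1 in a linear derivation chain.
    """
    return all(w > tau for w in edge_weights)


# Every hop attributed strongly: the gate would still see the external origin.
print(tip_keeps_external_label([0.9, 0.8, 0.7], tau=0.5))  # True

# One under-attributed hop: the tip looks trusted although it descends from
# a poisoned entry, which is the outcome that would falsify the claim.
print(tip_keeps_external_label([0.9, 0.4, 0.7], tau=0.5))  # False
```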

Figures

Figures reproduced from arXiv: 2605.14421 by Ciyan Ouyang, Rui Hou.

Figure 1. MemLineage architecture: six modules around a single memory store M. The write path (top) commits an entry through M1 metadata, M2 per-principal Ed25519 signing, and M3 RFC-6962 Merkle logging; the read + gate path (bottom) retrieves through M5, attaches trust labels via M4 lineage propagation, and refuses sensitive tool calls at M6 whose justifying context carries a Derived-Untrusted ancestor. Adversary-r…
Figure 2. The sleeper-via-derivation attack as a three-stage chain. The adversary commits e0 to an untrusted source (Stage 1); the agent’s normal retrieval-and-summarise loop walks the LLM through e1 → · · · → eK along edges with w > τ (Stage 2); a benign trigger query at k = K + 1 surfaces the chain tip (Stage 3). The two defence verdicts underneath show the leverage point: a signature-only baseline sees a properly…
Figure 3. ASR matrix on the deterministic harness, rendered from paper/data/asr_matrix_v1.csv. Green cells are defended (ASR = 0); red cells are attacker wins (ASR = 1). Cell text gives the value and verdict, the right strip reports blocked attack families per defence, and the blue outline highlights the all-zero MemLineage row. Same numbers as …
Figure 5. AgentDojo gate sweep. The left panel aggregates utility and attack-blocking rates over six banking DirectAttack pairs; the right panel reports average denied and repaired sensitive calls per row. Authority repair gives the best utility/security trade-off, while generic retry preserves security at much higher gate cost. Source: agentdojo_sweep_v1.csv
Figure 6. τ × K ablation across all five attribution configurations (140 cells). Green = propagated = 1 (the External label survives to the chain tip; the gate refuses); red = propagated = 0 (the chain tip lost the External label per the no-strong-parent fallback in §3.5). Gold-outlined cells are the K-discriminating cells: rows where increasing K flips the outcome. Panels (a)–(c) are flat in K; panels (d) and (e) a…
Figure 7. Coarse context taint versus parameter-level authority across four sensitive tool workloads. Bar height is utility recovered; the annotation reports attacker-value blocking. Parameter-level authority preserves mixed-context utility while still blocking all attacker values. Source: coarse_authority_matrix_v1.csv. Scenario Taint Auth. Utility trusted bill + note deny allow yes injected recipient deny repair y…
Figure 8. Four-dimensional capability matrix. Crypto integrity for the trust label, lineage attribution across LLM-mediated derivation, cross-session persistence of the label, and coverage of agent-derived entries. ✓ = full coverage; ○ = partial; × = absent. Among the published and concurrent systems compared in this figure, MemLineage’s highlighted row is the only one that covers all four capabilities in one syste…
Original abstract

We introduce MemLineage, a defense for LLM agent memory that attaches both cryptographic provenance and LLM-mediated derivation lineage to every entry. Recent and concurrent work shows that untrusted content can be written into persistent agent state and re-enter later sessions as an instruction; the remaining systems question is how to preserve useful memory recall while preventing such state from justifying sensitive actions. MemLineage treats this as a chain-of-custody problem rather than a filtering problem. It is a six-module design around an RFC-6962 Merkle log over per-principal Ed25519-signed entries: a weighted derivation DAG records which retrieved entries influenced each new memory, and a max-of-strong-edges propagation rule makes Untrusted-Path Persistence hold for any chain whose attribution edges remain above threshold. The sensitive-action gate then refuses dispatches whose active justification descends from an external ancestor, while still allowing benign recall. We evaluate three defense cells against three memory-poisoning workloads on a deterministic mechanism-isolation harness; MemLineage is the only configuration in that harness that drives all three columns to zero ASR, while sub-millisecond per-operation overhead keeps it well below the noise floor of any LLM call. A Codex-backed AgentDojo bridge further separates strong-model behavior from defense-layer behavior: under an intentionally vulnerable tool-output profile, no-defense and signature-only baselines fail on all six banking pairs, while all MemLineage rows reduce strict AgentDojo ASR to zero. The core deterministic artifacts are byte-equal CI-verified; hosted-model AgentDojo and live-model sweeps are recorded as auditable logs rather than byte-pinned artifacts.
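
The Merkle log that anchors the design can be illustrated with the tree hash from RFC 6962, which separates leaf and interior hashes with a one-byte prefix and splits the leaf list at the largest power of two smaller than its length. This is a minimal sketch using only the Python standard library: the function names are hypothetical, the entry encoding is a placeholder, and the per-principal Ed25519 signing step is left out because the standard library has no Ed25519 implementation.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 leaf hash: SHA-256 over a 0x00 prefix and the leaf bytes.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962 interior hash: SHA-256 over a 0x01 prefix and both children.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle tree hash over an ordered list of log entries."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(leaves[0])
    # k is the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


# Placeholder entries; a real log would hold serialized, signed records.
entries = [b"entry-0", b"entry-1", b"entry-2"]
root = merkle_root(entries)

# Rewriting any committed entry changes the root, so tampering is detectable
# by anyone holding an earlier root.
tampered = merkle_root([b"entry-0", b"poisoned", b"entry-2"])
print(root != tampered)  # True
```

The domain-separation prefixes prevent an interior node from being passed off as a leaf, which is what lets the root commit to the exact sequence of entries.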

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces MemLineage, a defense for LLM agent memory that attaches cryptographic provenance (RFC-6962 Merkle log over per-principal Ed25519-signed entries) and LLM-mediated derivation lineage (weighted DAG with max-of-strong-edges propagation) to every memory entry. It claims this enforces Untrusted-Path Persistence so that sensitive-action gates can refuse dispatches justified by external ancestors while preserving benign recall. On a deterministic harness the system is the only configuration that drives ASR to zero across three memory-poisoning workloads; an AgentDojo bridge similarly reduces strict ASR to zero on six banking pairs, all with sub-millisecond per-operation overhead. Core artifacts are byte-equal CI-verified.

Significance. If the lineage-labeling and propagation claims hold, the work supplies a practical, auditable chain-of-custody mechanism that directly addresses a documented class of persistent-memory poisoning attacks in LLM agents. The separation of strong-model behavior from defense-layer behavior via the Codex-backed AgentDojo bridge and the use of standard cryptographic primitives (Merkle log, Ed25519) are concrete strengths that could be adopted by other agent frameworks.

major comments (2)
  1. [Evaluation] The zero-ASR result on the three workloads and AgentDojo pairs (abstract and evaluation section) is reported without error bars, statistical tests, or complete workload definitions; because the central claim is that MemLineage is the only configuration achieving zero ASR, the absence of these elements makes the empirical support load-bearing and incomplete.
  2. [Design] The design relies on the LLM correctly labeling derivation edges and on the max-of-strong-edges rule blocking all poisoned justification paths (design section), yet the manuscript supplies no independent validation, manual audit, or oracle comparison of the produced lineages against ground truth on the evaluated workloads; an error in edge labeling would allow an external ancestor to justify a sensitive action, directly undermining the Untrusted-Path Persistence guarantee.
minor comments (1)
  1. [Abstract] The term 'Untrusted-Path Persistence' is used without a formal definition or citation in the abstract and design description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback. We address each major comment below and note planned revisions to strengthen the manuscript.

  1. Referee: [Evaluation] The zero-ASR result on the three workloads and AgentDojo pairs (abstract and evaluation section) is reported without error bars, statistical tests, or complete workload definitions; because the central claim is that MemLineage is the only configuration achieving zero ASR, the absence of these elements makes the empirical support load-bearing and incomplete.

    Authors: The harness is deterministic with byte-equal CI-verified artifacts, so variance is absent and error bars or statistical tests are inapplicable. We will add complete workload definitions and AgentDojo prompt templates to an appendix, plus explicit justification of the deterministic setup in the evaluation section. revision: partial

  2. Referee: [Design] The design relies on the LLM correctly labeling derivation edges and on the max-of-strong-edges rule blocking all poisoned justification paths (design section), yet the manuscript supplies no independent validation, manual audit, or oracle comparison of the produced lineages against ground truth on the evaluated workloads; an error in edge labeling would allow an external ancestor to justify a sensitive action, directly undermining the Untrusted-Path Persistence guarantee.

    Authors: The zero-ASR results across synthetic workloads with known external origins provide indirect empirical support for the propagation rule. We will add a manual audit of sampled lineages from the workloads plus discussion of labeling error bounds and the conservative max-of-strong-edges threshold in the revision. revision: partial
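
The lineage audit the referee asks for could take a form like the following sketch, which compares the edges the LLM marked as strong against a hand-labelled ground truth. It is an illustration only: the `audit_edges` function, the edge representation, and the example weights are hypothetical and do not come from the paper.

```python
Edge = tuple[str, str]  # (parent_id, child_id)


def audit_edges(predicted: dict[Edge, float], truth: set[Edge], tau: float) -> dict:
    """Compare LLM-attributed strong edges (weight > tau) with ground truth."""
    strong = {edge for edge, weight in predicted.items() if weight > tau}
    hits = len(truth & strong)
    return {
        "recall": hits / len(truth) if truth else 1.0,
        "precision": hits / len(strong) if strong else 1.0,
        # Missed edges are the security-relevant errors: a true derivation
        # the log does not record lets an untrusted label drop off the chain.
        "missed": sorted(truth - strong),
        # Spurious edges cost utility: they can taint an entry that was
        # not actually derived from the parent.
        "spurious": sorted(strong - truth),
    }


predicted = {("e0", "e1"): 0.9, ("e1", "e2"): 0.3, ("t0", "e2"): 0.8}
truth = {("e0", "e1"), ("e1", "e2")}
report = audit_edges(predicted, truth, tau=0.5)
print(report["missed"])    # [('e1', 'e2')]
print(report["spurious"])  # [('t0', 'e2')]
```

Reporting missed and spurious edges separately matters because the two error types fail in opposite directions: the first weakens the security guarantee, the second only reduces utility.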

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper defines a system architecture around external standards (RFC-6962 Merkle log, Ed25519 signatures) and an explicitly stated max-of-strong-edges propagation rule on a weighted derivation DAG. The central result is an empirical evaluation showing zero ASR on three workloads in a deterministic harness; this is not obtained by fitting parameters to a subset and renaming the fit as a prediction, nor by any self-referential equation or self-citation load-bearing premise. The LLM-mediated lineage labeling is presented as a design component rather than a derived quantity that reduces to its own inputs by construction. No uniqueness theorems, ansatzes, or renamings of known results are invoked in a circular manner.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The design assumes standard cryptographic primitives and the correctness of LLM-mediated lineage labeling; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • [standard math] Security properties of Ed25519 signatures and RFC-6962 Merkle logs hold.
    Invoked for cryptographic provenance of every entry.
  • [domain assumption] LLM-mediated derivation lineage correctly identifies influencing entries.
    Central to the weighted DAG construction and max-of-strong-edges rule.

pith-pipeline@v0.9.0 · 5589 in / 1372 out tokens · 43935 ms · 2026-05-15T02:15:45.125607+00:00 · methodology

