Pith Number

pith:MG6H7RI5

pith:2026:MG6H7RI5SOKDIPKDJLMOZOGFVS
not attested · not anchored · not stored · refs resolved

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Darya Kaviani, David Wagner, Debeshee Das, Florian Tramèr, Julien Piet, Luca Beurer-Kellner

A single untrusted tool call can plant a dormant payload in an agent's memory that later activates to exfiltrate sensitive user data.

arxiv:2605.01970 v3 · 2026-05-03 · cs.CR · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MG6H7RI5SOKDIPKDJLMOZOGFVS}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
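The page does not describe the merge algorithm itself. As a minimal sketch of what "deterministic" requires, the following assumes each event carries a timestamp and a single field update; the names `ts`, `field`, and `value` are hypothetical, and signature checking is left out. The point it illustrates is that sorting events by a total order before folding them makes the result independent of the order in which a mirror received them.

```python
import hashlib
import json


def event_key(event: dict) -> tuple:
    """Total order over events: timestamp first, content hash as tie-breaker.

    The field name "ts" is an assumption made for this sketch.
    """
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return (event["ts"], hashlib.sha256(body).hexdigest())


def merge(record: dict, events: list) -> dict:
    """Fold events into a copy of the record in a fixed order."""
    state = dict(record)
    seen = set()
    for event in sorted(events, key=event_key):
        key = event_key(event)
        if key in seen:  # an event delivered twice is applied once
            continue
        seen.add(key)
        # "field" and "value" are hypothetical event attributes.
        state[event["field"]] = event["value"]
    return state


# Hypothetical events, for illustration only.
record = {"source": "arxiv:2605.01970"}
events = [
    {"ts": "2026-05-21T00:00:00Z", "field": "citations", "value": "open"},
    {"ts": "2026-05-20T00:00:00Z", "field": "author_claim", "value": "open"},
]
print(merge(record, events) == merge(record, list(reversed(events))))
```

Using the content hash as a tie-breaker means two events with the same timestamp are still ordered the same way on every mirror.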

Claims

C1 · strongest claim

Trojan Hippo achieves an attack success rate (ASR) of up to 85-100% against current frontier models from OpenAI and Google, and planted memories still activate even after 100 benign sessions.

C2 · weakest assumption

The evaluation rests on two assumptions: that the four memory backends (explicit tool memory, agentic memory, RAG, and sliding-window context) and the OpenEvolve-based adaptive red-teaming accurately represent real-world deployed agent systems, and that a threat model of a single untrusted tool call is realistic for attackers.

C3 · one-line summary

The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% data-exfiltration success across memory backends and lower success rates under defenses that carry varying utility costs.

References

104 extracted · 104 resolved · 19 Pith anchors


Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:00:40.329986Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

61bc7fc51d9394343d434ad8ecb8c5ac91bd544eacbf854aea228dde90169d30

Aliases

arxiv: 2605.01970 · arxiv_version: 2605.01970v3 · doi: 10.48550/arxiv.2605.01970 · pith_short_12: MG6H7RI5SOKD · pith_short_16: MG6H7RI5SOKDIPKD · pith_short_8: MG6H7RI5
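The identifiers on this page look related. The three `pith_short_*` aliases are prefixes of the full Pith Number, and for this record the full number equals the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. This is an observation about this one record, not a documented rule of the scheme. The helper names below are made up for illustration.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "61bc7fc51d9394343d434ad8ecb8c5ac91bd544eacbf854aea228dde90169d30"
PITH_NUMBER = "MG6H7RI5SOKDIPKDJLMOZOGFVS"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """Derive the short aliases as plain prefixes (assumed from this record)."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


print(base32_prefix(CANONICAL_HASH, len(PITH_NUMBER)) == PITH_NUMBER)
print(short_aliases(PITH_NUMBER))
```

If the relationship holds in general, a client could check a Pith Number against a canonical hash locally. Treat that as unconfirmed until the schema documentation says so.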
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MG6H7RI5SOKDIPKDJLMOZOGFVS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 61bc7fc51d9394343d434ad8ecb8c5ac91bd544eacbf854aea228dde90169d30
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d7652b21ddac548046c4219fa570c258bc5689316147dedc995117c6ef4ea1f3",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-03T17:07:20Z",
    "title_canon_sha256": "3cf963ab7d00868573c79105b59830e14ee4fb915689f5c8bce871cf35b68a40"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.01970",
    "kind": "arxiv",
    "version": 3
  }
}
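The verification one-liner under "Agent API" can be unpacked into a small offline script. This sketch applies the same serialization settings as that one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to the record shown above. It has not been checked here that the record as displayed is byte-for-byte what the API returns, so compare the printed digest with the canonical hash yourself. The function name `canonical_sha256` is illustrative.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "d7652b21ddac548046c4219fa570c258bc5689316147dedc995117c6ef4ea1f3",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2026-05-03T17:07:20Z",
        "title_canon_sha256": "3cf963ab7d00868573c79105b59830e14ee4fb915689f5c8bce871cf35b68a40",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.01970", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed on this page
```

Sorting keys and fixing the separators removes the two main sources of variation in JSON output (key order and whitespace), which is what lets independent parties arrive at the same bytes before hashing.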