Pith Number

pith:YTJJXWC4

pith:2026:YTJJXWC4HXVEG23P7X3BFYPILP
not attested · not anchored · not stored · refs pending

MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents

Fan Zhang, Junchi Yan, Lizhuang Ma, Qibing Ren, Shaoxiong Guo, Tian Xia, Weiwei Xie, Xue Yang

Biased memory updates cause substantial safety degradation in LLM agents.

arxiv:2604.15774 v2 · 2026-04-17 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YTJJXWC4HXVEG23P7X3BFYPILP}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
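
The page does not specify the merge algorithm, so the following is only a generic sketch of what "deterministic" can mean here: if events are deduplicated by a unique id and applied in a fixed sort order, any mirror that holds the same set of events arrives at the same state, regardless of the order in which it received them. The event shape (`id`, `ts`, `set`) and the function name `merge_events` are hypothetical and not taken from the Pith schema.

```python
def merge_events(base: dict, events: list) -> dict:
    """Fold events into a state in an order-independent way.

    Hypothetical event shape: {"id": str, "ts": str, "set": {field: value}}.
    Assumes ids are unique per event and "ts" is an ISO 8601 UTC string,
    so plain string sorting matches chronological order.
    """
    unique = {event["id"]: event for event in events}  # drop exact repeats
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = dict(base)
    for event in ordered:
        state.update(event["set"])
    return state


# Illustrative events only; these are not real records from the bundle.
sample_events = [
    {"id": "e1", "ts": "2026-05-22T01:04:02Z", "set": {"status": "draft"}},
    {"id": "e2", "ts": "2026-05-23T00:00:00Z", "set": {"status": "final"}},
]

forward = merge_events({}, sample_events)
backward = merge_events({}, list(reversed(sample_events)))
print(forward == backward)
```

Sorting on `(ts, id)` rather than `ts` alone gives a stable tie-break when two events carry the same timestamp.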

Claims

C1 · strongest claim

Experiments on representative models reveal substantial safety degradation under biased memory updates. Our analysis suggests that memory evolution is a significant contributor to these failures. Furthermore, static prompt-based defenses prove insufficient.

C2 · weakest assumption

That the constructed mixed benign and misleading memory pools in multi-round interactions accurately simulate real-world memory evolution and its safety impacts in deployed LLM agents.

C3 · one line summary

MemEvoBench is the first benchmark for long-horizon memory safety in LLM agents, using QA tasks across 7 domains and 36 risks plus workflow tasks with noisy tools to measure behavioral drift from biased memory updates.

Receipt and verification
First computed: 2026-05-22T01:04:02.552031Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

c4d29bd85c3dea436b6ffdf612e1e85bf51ab9afe7c39cc7e13ec914e2f4561f

Aliases

arxiv: 2604.15774 · arxiv_version: 2604.15774v2 · doi: 10.48550/arxiv.2604.15774 · pith_short_12: YTJJXWC4HXVE · pith_short_16: YTJJXWC4HXVEG23P · pith_short_8: YTJJXWC4
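
The identifiers on this page appear to be mutually consistent: Base32-encoding (standard RFC 4648 alphabet) the raw bytes of the canonical hash and keeping the first 26 characters reproduces the Pith Number shown, and each `pith_short_*` alias is a prefix of it. The sketch below checks that relationship for this record. The rule is inferred from the values on this page, not from a published specification, and the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "c4d29bd85c3dea436b6ffdf612e1e85bf51ab9afe7c39cc7e13ec914e2f4561f"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this truncation rule is inferred from this one record.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, sizes=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in sizes}


pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)  # YTJJXWC4HXVEG23P7X3BFYPILP
print(short_aliases(pith))
```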
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YTJJXWC4HXVEG23P7X3BFYPILP \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c4d29bd85c3dea436b6ffdf612e1e85bf51ab9afe7c39cc7e13ec914e2f4561f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ae9f0c8aee4c5a3d35f4f5fbc67d57b8c10684f2cc7ca1a8bb9be6c130951780",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-04-17T07:29:52Z",
    "title_canon_sha256": "70f246af957e25df222467d9aa1bfeb9451a4ca0ee4974bcecddfbd4c7072abf"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.15774",
    "kind": "arxiv",
    "version": 2
  }
}
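
The verification one-liner above can also be run offline against the record as printed. This sketch applies the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8) and hashes the result. If the JSON shown here is exactly the `canonical_record` served by the API, the digest should equal the canonical hash above, but that has not been confirmed here. The name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "ae9f0c8aee4c5a3d35f4f5fbc67d57b8c10684f2cc7ca1a8bb9be6c130951780",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-04-17T07:29:52Z",
    "title_canon_sha256": "70f246af957e25df222467d9aa1bfeb9451a4ca0ee4974bcecddfbd4c7072abf"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.15774",
    "kind": "arxiv",
    "version": 2
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with the same options as the one-liner, then hash."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)

# Key order in the input does not matter, because keys are sorted on output.
reordered = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}
print(canonical_sha256(reordered) == digest)
```

Because the serialization sorts keys and fixes the separators, whitespace and key order in the fetched JSON do not affect the digest; only the values do.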