Pith Number

pith:IPIU45KI

pith:2025:IPIU45KI5Y5THLIDAAODW5VMD2

not attested · not anchored · not stored · refs resolved

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

Julian McAuley, Yuanzhe Hu, Yu Wang

A new benchmark shows current LLM memory agents fall short on four core competencies from cognitive science.

arxiv:2507.05257 v3 · 2025-07-07 · cs.CL · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{IPIU45KI5Y5THLIDAAODW5VMD2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
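One way a merge can be made deterministic is to apply events in a fixed total order, so that any mirror holding the same events arrives at the same state. The sketch below illustrates that idea only. It is not the Pith merge algorithm, which this page does not describe. The event fields (`event_id`, `timestamp`, `field`, `value`), the last-writer-wins rule, and the sample events are all hypothetical.

```python
from typing import Iterable


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state dict in a fixed total order (last writer wins).

    Hypothetical event shape: event_id, timestamp (ISO 8601, UTC), field, value.
    """
    state: dict = {}
    # Sorting on (timestamp, event_id) gives every mirror the same order;
    # event_id breaks ties between events that share a timestamp.
    # ISO 8601 strings in one fixed format sort correctly as plain text.
    for event in sorted(events, key=lambda e: (e["timestamp"], e["event_id"])):
        state[event["field"]] = event["value"]
    return state


# Made-up events, deliberately listed out of order.
events = [
    {"event_id": "e2", "timestamp": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"event_id": "e1", "timestamp": "2026-05-17T23:38:46Z", "field": "author_claim", "value": "open"},
    {"event_id": "e3", "timestamp": "2026-05-18T00:00:00Z", "field": "citations", "value": "closed"},
]

merged = merge_events(events)
print(merged)
```

Because the order depends only on the events themselves, feeding the same events in any arrival order yields the same merged state.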

Claims

C1 · strongest claim

Empirical results reveal that current methods fall short of mastering all four competencies, underscoring the need for further research into comprehensive memory mechanisms for LLM agents.

C2 · weakest assumption

The benchmark assumes that the four competencies drawn from memory science form the complete and essential set for memory agents, and that transforming static long-context datasets into incremental multi-turn interactions preserves the properties needed to measure those competencies.

C3 · one-line summary

MemoryAgentBench is a new multi-turn benchmark that assesses four memory competencies in LLM agents (accurate retrieval, test-time learning, long-range understanding, and selective forgetting) and shows that existing methods fall short.

Cited by

28 papers in Pith

MemGym: a Long-Horizon Memory Environment for LLM Agents

MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts

GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory

EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

Receipt and verification

First computed	2026-05-17T23:38:46.539410Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

43d14e7548ee3b33ad03001c3b76ac1e8913fe2aa03e3d2c7b29b25761351ca7

Aliases

arxiv: 2507.05257 · arxiv_version: 2507.05257v3 · doi: 10.48550/arxiv.2507.05257 · pith_short_12: IPIU45KI5Y5T · pith_short_16: IPIU45KI5Y5THLID · pith_short_8: IPIU45KI
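The short aliases above are prefixes of the full 26-character Pith Number. On this record, the Pith Number also matches the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 hash. That relationship is inferred from the values shown on this page, not from a published specification, so treat the sketch below as an illustration under that assumption. The function names `pith_from_hash` and `short_aliases` are hypothetical.

```python
import base64


def pith_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; it is inferred from the displayed values, not from a spec.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """Return the 8-, 12-, and 16-character prefix aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


canonical_hash = "43d14e7548ee3b33ad03001c3b76ac1e8913fe2aa03e3d2c7b29b25761351ca7"
pith = pith_from_hash(canonical_hash)
print(pith)
print(short_aliases(pith))
```

For this record, the printed identifier should match the Pith Number at the top of the page, and the three prefixes should match the `pith_short_*` aliases.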

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/IPIU45KI5Y5THLIDAAODW5VMD2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 43d14e7548ee3b33ad03001c3b76ac1e8913fe2aa03e3d2c7b29b25761351ca7

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d5575774f38f003f816bf127894567467356de2653671037fbbfcbeba78a730e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-07-07T17:59:54Z",
    "title_canon_sha256": "a140b706cb55ff33ce6a93ec468408a531bbaab950f09d3b67bb9b418811dac5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2507.05257",
    "kind": "arxiv",
    "version": 3
  }
}
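
The verification command serialises the record as JSON with sorted keys and compact separators before hashing it. As a minimal offline sketch of that step, the Python below applies the same serialisation to the record shown above. The function name `canonical_sha256` is hypothetical. Whether the digest equals the published canonical hash depends on the server canonicalising the record the same way, which is not confirmed here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record after serialising it with sorted keys and no extra whitespace."""
    blob = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # compact form, no spaces after , or :
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# The canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "d5575774f38f003f816bf127894567467356de2653671037fbbfcbeba78a730e",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-07-07T17:59:54Z",
        "title_canon_sha256": "a140b706cb55ff33ce6a93ec468408a531bbaab950f09d3b67bb9b418811dac5",
    },
    "schema_version": "1.0",
    "source": {"id": "2507.05257", "kind": "arxiv", "version": 3},
}

print(canonical_sha256(record))
```

Compare the printed digest with the canonical hash listed in the receipt. Sorting the keys is what makes the result independent of the order in which fields were stored or transmitted.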