Pith Number

pith:IPIU45KI

pith:2025:IPIU45KI5Y5THLIDAAODW5VMD2
not attested · not anchored · not stored · refs resolved

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

Julian McAuley, Yuanzhe Hu, Yu Wang

A new benchmark shows that current LLM memory agents do not master all four core memory competencies drawn from cognitive science.

arxiv:2507.05257 v3 · 2025-07-07 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IPIU45KI5Y5THLIDAAODW5VMD2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Empirical results reveal that current methods fall short of mastering all four competencies, underscoring the need for further research into comprehensive memory mechanisms for LLM agents.

C2 · weakest assumption

That the four competencies drawn from memory science are the complete and essential set for memory agents, and that transforming static long-context datasets into incremental multi-turn interactions preserves the original properties needed to measure those competencies.

C3 · one-line summary

MemoryAgentBench is a new multi-turn benchmark that assesses four memory competencies in LLM agents (accurate retrieval, test-time learning, long-range understanding, and selective forgetting) and shows that existing methods fall short.

References

63 extracted · 63 resolved · 15 Pith anchors

[1] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding · arXiv:2308.14508
[2] LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks · arXiv:2412.15204
[3] arXiv:2405.00200
[4] Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory 2020 · doi:10.18653/v1/2020.nlp4convai-1.5
[5] 11 Published as a conference paper at ICLR 2026 DeepMind. Gemini pro, 2026

Cited by

28 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.539410Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

43d14e7548ee3b33ad03001c3b76ac1e8913fe2aa03e3d2c7b29b25761351ca7

Aliases

arxiv: 2507.05257 · arxiv_version: 2507.05257v3 · doi: 10.48550/arxiv.2507.05257 · pith_short_12: IPIU45KI5Y5T · pith_short_16: IPIU45KI5Y5THLID · pith_short_8: IPIU45KI
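The three short aliases are plain prefixes (8, 12, and 16 characters) of the full Pith Number. In this record, the full number also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks both relationships. The helper name `base32_prefix` is hypothetical, and treating the hash prefix as the derivation rule is an assumption inferred from this single record, not something the page or schema states.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "43d14e7548ee3b33ad03001c3b76ac1e8913fe2aa03e3d2c7b29b25761351ca7"
PITH_NUMBER = "IPIU45KI5Y5THLIDAAODW5VMD2"
SHORT_ALIASES = {8: "IPIU45KI", 12: "IPIU45KI5Y5T", 16: "IPIU45KI5Y5THLID"}


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: first `length` Base32 characters of a hex digest."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


# Each short alias should be a plain prefix of the full number.
aliases_ok = all(PITH_NUMBER[:n] == alias for n, alias in SHORT_ALIASES.items())

# Assumption: the number is a Base32 prefix of the canonical hash.
derived = base32_prefix(CANONICAL_HASH)

print("aliases are prefixes:", aliases_ok)
print("derived from hash:   ", derived)
print("matches Pith Number: ", derived == PITH_NUMBER)
```

If the assumption holds for other records as well, a mirror could recompute the identifier from the canonical hash alone, without trusting the alias list.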
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IPIU45KI5Y5THLIDAAODW5VMD2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 43d14e7548ee3b33ad03001c3b76ac1e8913fe2aa03e3d2c7b29b25761351ca7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d5575774f38f003f816bf127894567467356de2653671037fbbfcbeba78a730e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-07-07T17:59:54Z",
    "title_canon_sha256": "a140b706cb55ff33ce6a93ec468408a531bbaab950f09d3b67bb9b418811dac5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2507.05257",
    "kind": "arxiv",
    "version": 3
  }
}
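The same check as the shell pipeline under "Verify this Pith Number yourself" can be run offline against the record printed here. This is a minimal sketch, assuming the JSON shown is exactly what the API returns under `canonical_record`. The function name `canonical_sha256` is hypothetical; the serialization settings are copied from the one-liner.

```python
import hashlib
import json

# Expected digest, copied from the "Canonical hash" field of this record.
EXPECTED = "43d14e7548ee3b33ad03001c3b76ac1e8913fe2aa03e3d2c7b29b25761351ca7"

# Canonical record as printed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "d5575774f38f003f816bf127894567467356de2653671037fbbfcbeba78a730e",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-07-07T17:59:54Z",
    "title_canon_sha256": "a140b706cb55ff33ce6a93ec468408a531bbaab950f09d3b67bb9b418811dac5"
  },
  "schema_version": "1.0",
  "source": {"id": "2507.05257", "kind": "arxiv", "version": 3}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then hash as UTF-8."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)

print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted and whitespace is removed before hashing, the digest does not depend on key order or on how the JSON was pretty-printed.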