Pith · machine review for the scientific record

arXiv: 2605.08060 · v1 · submitted 2026-05-08 · cs.CL · cs.AI · cs.GT · cs.MA

Recognition: 2 Lean theorem links

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:19 UTC · model grok-4.3

classification: cs.CL · cs.AI · cs.GT · cs.MA
keywords: LLM agents · multi-agent cooperation · social dilemmas · context window · memory effects · chain-of-thought · fine-tuning probes

The pith

Expanding accessible history degrades cooperation in most LLM agent games by eroding forward-looking intent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that giving LLM agents longer access to past interactions reduces their willingness to cooperate in repeated social dilemma games. The effect appears in 18 of the 28 pairings of the seven models and four games tested, even though longer context is usually viewed as an improvement. The authors link the drop to agents using less language about future outcomes in their internal reasoning. They show that swapping in synthetic cooperative histories or fine-tuning on forward-looking examples can reverse the effect, while full deliberation sometimes worsens it. The result reframes memory as an active influence on multi-agent behavior rather than a neutral upgrade.

Core claim

Across seven LLMs and four games run for 500 rounds, expanding the visible history of prior rounds lowers cooperation rates in eighteen of the twenty-eight model-game pairs. Lexical analysis of 378,000 reasoning traces ties the drop to reduced forward-looking intent rather than increased suspicion. A LoRA adapter trained only on forward-looking traces restores cooperation and transfers to new games. Replacing real history with synthetic cooperative records while holding length fixed also improves outcomes, and removing explicit chain-of-thought reasoning often lessens the collapse.
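The shape of that experiment can be caricatured in a few lines: two agents play a repeated Prisoner's Dilemma while a harness controls how many past rounds each agent can see. The sketch below is deterministic and entirely hypothetical — a rule-based policy stands in for the LLM, and one scripted defection stands in for sampling noise — so it only illustrates how a wider window keeps a single bad memory influencing behavior for longer, not the paper's actual pipeline.

```python
# Toy caricature of the setup: a repeated Prisoner's Dilemma where `window`
# controls how many past rounds the reactive agent can see. Both policies
# and the scripted defection at round 100 are illustrative stand-ins.

def noisy_cooperator(t):
    """Always cooperates, except for one scripted defection at round 100."""
    return "D" if t == 100 else "C"

def reactive(opponent_moves, window):
    """Defects whenever a defection is still visible in its history window."""
    visible = opponent_moves[-window:] if window else []
    return "D" if "D" in visible else "C"

def cooperation_rate(rounds=500, window=10):
    p0_hist, p1_hist = [], []
    for t in range(rounds):
        m1 = reactive(p0_hist, window)  # sees only rounds strictly before t
        p0_hist.append(noisy_cooperator(t))
        p1_hist.append(m1)
    return (p0_hist + p1_hist).count("C") / (2 * rounds)

print(cooperation_rate(window=10))   # brief memory: 0.989
print(cooperation_rate(window=400))  # long recall: 0.6
```

With a one-round window the single defection is punished once and forgotten; with a 400-round window it is punished for as long as it stays visible, dragging the cooperation rate down.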

What carries the argument

The memory curse, in which longer visible interaction history shifts agent reasoning away from forward-looking intent and thereby reduces cooperation.

If this is right

  • Memory content, not length alone, controls whether cooperation rises or falls.
  • Fine-tuning on forward-looking reasoning traces can mitigate the curse and transfer across games.
  • Removing chain-of-thought deliberation can sometimes limit the damage from expanded history.
  • Sanitizing or replacing memory records restores cooperation without shortening the prompt.
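The sanitization idea in the last bullet can be sketched directly: swap the recorded history for synthetic cooperative rounds of identical length, and a history-sensitive agent changes its move even though the record is no shorter. The grim-style rule below is a hypothetical stand-in for an LLM reading its context, not the paper's procedure.

```python
def sanitize(history):
    """Length-preserving swap: replace every recorded round with a synthetic
    mutually cooperative one, so only the content of memory changes."""
    return [("C", "C")] * len(history)

def grim(visible):
    """Toy history-sensitive agent: defects if any defection is on record."""
    return "D" if any("D" in round_ for round_ in visible) else "C"

real = [("C", "C"), ("D", "C"), ("D", "D"), ("C", "D")]
clean = sanitize(real)

assert len(clean) == len(real)  # record length held fixed
print(grim(real), grim(clean))  # D C
```

Same record length, different content, different behavior — the contrast the length-controlled experiment is after.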

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agent builders may need selective memory filters that preserve future-oriented traces while discarding others.
  • The pattern could appear in any long-running multi-agent deployment where agents retain full logs of past turns.
  • Testing whether summarization that emphasizes future payoffs avoids the curse would be a direct follow-up experiment.
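A selective memory filter of the kind the first bullet imagines could start from a simple lexical screen. The cue list below is invented for illustration — the paper's curated indicator set is not reproduced here — so treat both the pattern and the example traces as hypothetical.

```python
import re

# Hypothetical forward-looking cue list; illustrative only, not the
# paper's curated indicators.
FORWARD_CUES = re.compile(
    r"\b(will|future|next round|long[- ]term|eventually|plan\w*|anticipat\w*)\b",
    re.IGNORECASE)

def forward_looking_ratio(traces):
    """Fraction of traces containing at least one forward-looking cue."""
    return sum(1 for t in traces if FORWARD_CUES.search(t)) / len(traces)

def keep_forward_looking(memory):
    """Selective memory filter: retain only future-oriented trace records."""
    return [t for t in memory if FORWARD_CUES.search(t)]

traces = [
    "If I defect now, my partner will retaliate next round.",
    "They defected twice already; I should punish them.",
    "Cooperating keeps the long-term payoff high.",
]
print(round(forward_looking_ratio(traces), 2))  # 0.67
print(len(keep_forward_looking(traces)))        # 2
```

The ratio is the kind of per-setting statistic the paper's lexical analysis tracks; the filter is the deployment-side intervention the bullet speculates about.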

Load-bearing premise

The drop in cooperation stems mainly from loss of forward-looking intent rather than from other changes such as growing distrust or simple length effects.

What would settle it

Run the same games with a model whose reasoning traces have been edited to remove all forward-looking language while keeping history length fixed; if cooperation stays low, the mechanism is supported.
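The trace-editing step of that test is mechanical to sketch: ablate forward-looking language while preserving trace length, so any behavioral change cannot be attributed to a shorter prompt. The cue pattern below is again an invented stand-in for whatever operationalization the experiment would actually use.

```python
import re

# Invented cue pattern for illustration; a real run would use the
# paper's operationalization of forward-looking language.
FORWARD = re.compile(r"\b(will|future|next round|long[- ]term|plan\w*)\b",
                     re.IGNORECASE)

def ablate_forward_looking(trace):
    """Replace each forward-looking cue with same-length filler, keeping the
    edited trace's character length identical (the length control)."""
    return FORWARD.sub(lambda m: "x" * len(m.group()), trace)

trace = "If I defect, my partner will punish me next round."
edited = ablate_forward_looking(trace)

assert len(edited) == len(trace)   # length held fixed
assert not FORWARD.search(edited)  # forward-looking cues removed
print(edited)
```

Feeding such length-matched, intent-ablated traces back into the games and observing persistently low cooperation would be the supporting evidence the review describes.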

Figures

Figures reproduced from arXiv: 2605.08060 by Carl Kingsford, Emanuel Tewolde, Haoxuan Zeng, Jiayuan Liu, Shiyi Du, Tai Sing Lee, Tianqin Li, Tonghan Wang, Vincent Conitzer, Xin Luo.

Figure 1. Schematic of repeated social dilemma interactions between two LLM agents …
Figure 2. Cooperation rate across four social dilemmas as history length (…) …
Figure 3. (a) Memory-immune settings: certain models in certain games always cooperate, with cooperation rates above 95% across game-model settings. Lexical analysis of these models' reasoning traces reveals a clear cognitive basis for the divergence: immune models consistently sustain a significantly higher ratio of forward-looking reasoning …
Figure 4. Asymmetric memory evaluation across the Trust Game and Public Goods Game.
Figure 5. Evidence that memory-content sensitivity is widespread but can be overridden …
read the original abstract

Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model--game settings, a pattern we term the memory curse. We isolate the underlying mechanism through three analyses. First, lexical analysis of 378,000 reasoning traces associates this breakdown with eroding forward-looking intent rather than rising paranoia. We validate this using targeted fine-tuning as a cognitive probe: a LoRA adapter trained exclusively on forward-looking traces mitigates the decay and transfers zero-shot to distinct games. Second, memory sanitization holds prompt length fixed while replacing visible history with synthetic cooperative records, which restores cooperation substantially, proving the trigger is memory content, not length alone. Finally, ablating explicit Chain-of-Thought reasoning often reduces the collapse, showing that deliberation paradoxically amplifies the memory curse. Together, these results recast memory as an active determinant of multi-agent behavior: longer recall can either destabilize or support cooperation depending on the reasoning patterns it elicits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that expanding accessible history in LLM agents systematically degrades cooperation in multi-agent social dilemmas (observed in 18 of 28 model-game settings across 7 LLMs and 4 games), a phenomenon termed the 'memory curse.' It attributes the effect primarily to eroding forward-looking intent (rather than paranoia) via lexical analysis of 378k reasoning traces, validates the mechanism with a LoRA adapter trained on forward-looking traces that mitigates decay and transfers zero-shot, shows via memory sanitization that content (not length) is the trigger, and finds that ablating explicit CoT often reduces the collapse.

Significance. If the central empirical pattern and mechanism hold, the work provides a useful cautionary finding for multi-agent LLM design, showing that longer recall can destabilize cooperation depending on elicited reasoning patterns. Strengths include the convergence of three distinct analyses (lexical, probe, sanitization), the zero-shot cross-game transfer of the fine-tuning intervention, and the content-vs-length control that isolates the memory trigger.

major comments (2)
  1. [Fine-tuning probe and validation] The fine-tuning probe section provides insufficient detail on trace labeling/selection criteria and training data composition for the forward-looking traces used to train the LoRA adapter. Without explicit criteria (e.g., how 'forward-looking intent' was operationalized in the 378k traces or whether curation relied on surface lexical features), it is unclear whether the adapter specifically counters intent erosion or instead captures correlated cooperative tendencies; this directly affects the load-bearing claim that degradation is mechanistically driven by eroding forward-looking intent rather than other content-induced shifts.
  2. [Memory sanitization analysis] The sanitization experiment holds prompt length fixed while replacing history with synthetic cooperative records and reports substantial restoration of cooperation. However, the generation process for these synthetic records (including any model used or filtering steps) is not fully specified, leaving open whether the control fully disentangles memory content effects from other factors such as recency bias or strategy updating.
minor comments (2)
  1. The abstract and results summary would benefit from explicit reporting of per-setting effect sizes, confidence intervals, or statistical tests supporting the '18 of 28' degradation count to allow readers to assess robustness.
  2. Notation for the four games and seven LLMs should be introduced with a table or clear definitions early in the methods to improve readability when discussing the 28 settings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and reproducibility of our work. We have made revisions to address the concerns about methodological details in both the fine-tuning probe and memory sanitization sections.

read point-by-point responses
  1. Referee: [Fine-tuning probe and validation] The fine-tuning probe section provides insufficient detail on trace labeling/selection criteria and training data composition for the forward-looking traces used to train the LoRA adapter. Without explicit criteria (e.g., how 'forward-looking intent' was operationalized in the 378k traces or whether curation relied on surface lexical features), it is unclear whether the adapter specifically counters intent erosion or instead captures correlated cooperative tendencies; this directly affects the load-bearing claim that degradation is mechanistically driven by eroding forward-looking intent rather than other content-induced shifts.

    Authors: We agree that more explicit detail is warranted to support the mechanistic interpretation. We have revised Section 4.2 and added Appendix C to provide the operationalization: forward-looking intent was identified through lexical analysis using a curated set of indicators for planning and anticipation (detailed in the lexical section), applied to the 378k traces with a frequency threshold and spot-checked for accuracy. The training composition is the subset of traces from settings exhibiting the memory curse, ensuring relevance. Regarding whether it captures correlated tendencies, the zero-shot transfer to new games and the differential effect compared to other potential adapters (now reported) indicate specificity to intent erosion. This bolsters rather than undermines the claim. revision: yes

  2. Referee: [Memory sanitization analysis] The sanitization experiment holds prompt length fixed while replacing history with synthetic cooperative records and reports substantial restoration of cooperation. However, the generation process for these synthetic records (including any model used or filtering steps) is not fully specified, leaving open whether the control fully disentangles memory content effects from other factors such as recency bias or strategy updating.

    Authors: We thank the referee for this observation. We have updated the description of the sanitization experiment to specify that the synthetic records were produced by prompting a separate instance of the same LLM family with instructions to generate cooperative interaction histories for the given game, followed by automated filtering to exclude any non-cooperative language and manual inspection of a sample. To mitigate recency bias, the records were presented in randomized order within the fixed-length prompt. These additions clarify that the restoration is attributable to the cooperative content, consistent with our other analyses. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical claims

full rationale

The paper reports experimental results from multi-agent simulations across 7 LLMs and 4 games over 500 rounds, documenting cooperation degradation in 18 of 28 settings via direct measurement. Mechanism isolation relies on lexical analysis of 378k traces, a LoRA fine-tuning probe trained on forward-looking traces, memory sanitization controls that hold length fixed, and CoT ablation. None of these steps invoke equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations that reduce the central claim to its own inputs by construction. The term 'memory curse' is an observational label for the measured pattern, not a presupposed quantity, and interventions function as external tests rather than tautologies.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard game-theoretic assumptions about the chosen dilemmas and on the validity of lexical analysis of reasoning traces as a proxy for intent; no new entities or fitted constants are introduced.

axioms (2)
  • domain assumption The four games and 500-round horizon are representative of broader multi-agent social dilemmas
    Invoked to generalize from the 28 model-game settings to the memory-curse claim
  • domain assumption Lexical features in reasoning traces reliably indicate forward-looking intent
    Central to the first mechanistic analysis

pith-pipeline@v0.9.0 · 5537 in / 1218 out tokens · 30345 ms · 2026-05-11T02:19:44.796906+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Łukasz; Polosukhin, Illia. Attention Is All You Need.
  2. Finite automata play the repeated prisoner's dilemma. Journal of Economic Theory, 1986.
  3. Intelligence, personality, and gains from cooperation in repeated interactions. Journal of Political Economy, 2019.
  4. Spontaneous giving and calculated greed. Nature, 2012.
  5. The persistence and transience of memory. Neuron, 2017.
  6. Neural correlates of sparse coding and dimensionality reduction. PLoS Computational Biology, 2019.
  7. Brain-inspired energy efficient technologies for next-generation artificial intelligence. Frontiers in Neuroscience.
  8. Sparse distributed memory. 1988.
  9. Packer, Charles; Wooders, Sarah; Lin, Kevin; Fang, Vivian; Patil, Shishir G.; Stoica, Ion; Gonzalez, Joseph E. MemGPT: Towards …
  10. Memory networks. arXiv preprint arXiv:1410.3916.
  11. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems.
  12. Prototype memory and attention mechanisms for few shot image generation. International Conference on Learning Representations.
  13. Playing repeated games with large language models. Nature Human Behaviour, 2025.
  14. Duan, Jinhao; Zhang, Renming; Diffenderfer, James; Kailkhura, Bhavya; Sun, Lichao; Stengel-Eskin, Elias; Bansal, Mohit; Chen, Tianlong; Xu, Kaidi. GTBench: Uncovering the strategic reasoning capabilities of …
  15. Piatti, Giorgio; Jin, Zhijing; Kleiman-Weiner, Max; Sch… Cooperate or collapse: Emergence of sustainable cooperation in a society of … Advances in Neural Information Processing Systems.
  16. Tan, Haoran; Zhang, Zeyu; Ma, Chen; Chen, Xu; Dai, Quanyu; Dong, Zhenhua. MemBench: Towards more comprehensive evaluation on the memory of …
  17. Hu, Yuanzhe; Wang, Yu; McAuley, Julian. Evaluating memory in …
  18. Corrupted by reasoning: Reasoning language models become free-riders in public goods games. arXiv preprint arXiv:2506.23276.
  19. Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology.
  20. The rise and potential of large language model based agents: A survey. Science China Information Sciences, 2025.
  21. The folk theorem in repeated games with discounting or with incomplete information. Econometrica, 1986.
  22. Models of bounded rationality: Empirically grounded economic reason. 1997.
  23. Brookins, Philip; DeBacker, Jason Matthew. Playing games with …
  24. Inducing anxiety in large language models increases exploration and bias. arXiv preprint arXiv:2304.11111.
  25. Machine psychology. arXiv preprint arXiv:2303.13988. doi:10.48550/arXiv.2303.13988.
  26. Limited memory optimizes cooperation in social dilemma experiments. Royal Society Open Science, 2021.
  27. Evolution and cooperation in noisy repeated games. The American Economic Review, 1990.
  28. Responses to depression and their effects on the duration of depressive episodes. Journal of Abnormal Psychology, 1991.
  29. Tewolde, Emanuel; Zhang, Xiao; Guzman Piedrahita, David; Conitzer, Vincent; Jin, Zhijing. 2026.
  30. Li, Yuxuan; Shirado, Hirokazu. Spontaneous Giving and Calculated Greed in Language Models. Proceedings of EMNLP 2025. doi:10.18653/v1/2025.emnlp-main.267.
  31. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems.
  32. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems.
  33. Gandhi, Kanishk; Chakravarthy, Ayush K.; Singh, Anikait; Lile, Nathan; Goodman, Noah. Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective … 2025.