Pith · machine review for the scientific record

arXiv: 2605.08060 · v1 · submitted 2026-05-08 · cs.CL · cs.AI · cs.GT · cs.MA

Recognition: 2 Lean theorem links

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:19 UTC · model grok-4.3

classification: cs.CL · cs.AI · cs.GT · cs.MA
keywords: LLM agents · multi-agent cooperation · social dilemmas · context window · memory effects · chain-of-thought · fine-tuning probes

The pith

Expanding accessible history degrades cooperation in most LLM agent games by eroding forward-looking intent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that giving LLM agents longer access to past interactions reduces their willingness to cooperate in repeated social dilemma games. The effect appears in 18 of the 28 pairings of the seven models and four games tested, even though longer context is usually viewed as an improvement. The authors link the drop to agents using less language about future outcomes in their internal reasoning. They show that swapping in synthetic cooperative histories or fine-tuning on forward-looking examples can reverse the effect, while full deliberation sometimes worsens it. The result reframes memory as an active influence on multi-agent behavior rather than a neutral upgrade.

Core claim

Across seven LLMs and four games run for 500 rounds, expanding the visible history of prior rounds lowers cooperation rates in eighteen of the twenty-eight model-game pairs. Lexical analysis of 378,000 reasoning traces ties the drop to reduced forward-looking intent rather than increased suspicion. A LoRA adapter trained only on forward-looking traces restores cooperation and transfers to new games. Replacing real history with synthetic cooperative records while holding length fixed also improves outcomes, and removing explicit chain-of-thought reasoning often lessens the collapse.
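The shape of that experiment can be caricatured in a few lines: two agents play a repeated Prisoner's Dilemma while a harness controls how many past rounds each agent can see. The sketch below is deterministic and entirely hypothetical — a rule-based policy stands in for the LLM, and one scripted defection stands in for sampling noise — so it only illustrates how a wider window keeps a single bad memory influencing behavior for longer, not the paper's actual pipeline.

```python
# Toy caricature of the setup: a repeated Prisoner's Dilemma where `window`
# controls how many past rounds the reactive agent can see. Both policies
# and the scripted defection at round 100 are illustrative stand-ins.

def noisy_cooperator(t):
    """Always cooperates, except for one scripted defection at round 100."""
    return "D" if t == 100 else "C"

def reactive(opponent_moves, window):
    """Defects whenever a defection is still visible in its history window."""
    visible = opponent_moves[-window:] if window else []
    return "D" if "D" in visible else "C"

def cooperation_rate(rounds=500, window=10):
    p0_hist, p1_hist = [], []
    for t in range(rounds):
        m1 = reactive(p0_hist, window)  # sees only rounds strictly before t
        p0_hist.append(noisy_cooperator(t))
        p1_hist.append(m1)
    return (p0_hist + p1_hist).count("C") / (2 * rounds)

print(cooperation_rate(window=10))   # brief memory: 0.989
print(cooperation_rate(window=400))  # long recall: 0.6
```

With a one-round window the single defection is punished once and forgotten; with a 400-round window it is punished for as long as it stays visible, dragging the cooperation rate down.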

What carries the argument

The memory curse, in which longer visible interaction history shifts agent reasoning away from forward-looking intent and thereby reduces cooperation.

If this is right

  • Memory content, not length alone, controls whether cooperation rises or falls.
  • Fine-tuning on forward-looking reasoning traces can mitigate the curse and transfer across games.
  • Removing chain-of-thought deliberation can sometimes limit the damage from expanded history.
  • Sanitizing or replacing memory records restores cooperation without shortening the prompt.
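The sanitization idea in the last bullet can be sketched directly: swap the recorded history for synthetic cooperative rounds of identical length, and a history-sensitive agent changes its move even though the record is no shorter. The grim-style rule below is a hypothetical stand-in for an LLM reading its context, not the paper's procedure.

```python
def sanitize(history):
    """Length-preserving swap: replace every recorded round with a synthetic
    mutually cooperative one, so only the content of memory changes."""
    return [("C", "C")] * len(history)

def grim(visible):
    """Toy history-sensitive agent: defects if any defection is on record."""
    return "D" if any("D" in round_ for round_ in visible) else "C"

real = [("C", "C"), ("D", "C"), ("D", "D"), ("C", "D")]
clean = sanitize(real)

assert len(clean) == len(real)  # record length held fixed
print(grim(real), grim(clean))  # D C
```

Same record length, different content, different behavior — the contrast the length-controlled experiment is after.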

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agent builders may need selective memory filters that preserve future-oriented traces while discarding others.
  • The pattern could appear in any long-running multi-agent deployment where agents retain full logs of past turns.
  • Testing whether summarization that emphasizes future payoffs avoids the curse would be a direct follow-up experiment.
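A selective memory filter of the kind the first bullet imagines could start from a simple lexical screen. The cue list below is invented for illustration — the paper's curated indicator set is not reproduced here — so treat both the pattern and the example traces as hypothetical.

```python
import re

# Hypothetical forward-looking cue list; illustrative only, not the
# paper's curated indicators.
FORWARD_CUES = re.compile(
    r"\b(will|future|next round|long[- ]term|eventually|plan\w*|anticipat\w*)\b",
    re.IGNORECASE)

def forward_looking_ratio(traces):
    """Fraction of traces containing at least one forward-looking cue."""
    return sum(1 for t in traces if FORWARD_CUES.search(t)) / len(traces)

def keep_forward_looking(memory):
    """Selective memory filter: retain only future-oriented trace records."""
    return [t for t in memory if FORWARD_CUES.search(t)]

traces = [
    "If I defect now, my partner will retaliate next round.",
    "They defected twice already; I should punish them.",
    "Cooperating keeps the long-term payoff high.",
]
print(round(forward_looking_ratio(traces), 2))  # 0.67
print(len(keep_forward_looking(traces)))        # 2
```

The ratio is the kind of per-setting statistic the paper's lexical analysis tracks; the filter is the deployment-side intervention the bullet speculates about.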

Load-bearing premise

The drop in cooperation stems mainly from loss of forward-looking intent rather than from other changes such as growing distrust or simple length effects.

What would settle it

Run the same games with a model whose reasoning traces have been edited to remove all forward-looking language while keeping history length fixed; if cooperation stays low, the mechanism is supported.
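The trace-editing step of that test is mechanical to sketch: ablate forward-looking language while preserving trace length, so any behavioral change cannot be attributed to a shorter prompt. The cue pattern below is again an invented stand-in for whatever operationalization the experiment would actually use.

```python
import re

# Invented cue pattern for illustration; a real run would use the
# paper's operationalization of forward-looking language.
FORWARD = re.compile(r"\b(will|future|next round|long[- ]term|plan\w*)\b",
                     re.IGNORECASE)

def ablate_forward_looking(trace):
    """Replace each forward-looking cue with same-length filler, keeping the
    edited trace's character length identical (the length control)."""
    return FORWARD.sub(lambda m: "x" * len(m.group()), trace)

trace = "If I defect, my partner will punish me next round."
edited = ablate_forward_looking(trace)

assert len(edited) == len(trace)   # length held fixed
assert not FORWARD.search(edited)  # forward-looking cues removed
print(edited)
```

Feeding such length-matched, intent-ablated traces back into the games and observing persistently low cooperation would be the supporting evidence the review describes.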

Figures

Figures reproduced from arXiv: 2605.08060 by Carl Kingsford, Emanuel Tewolde, Haoxuan Zeng, Jiayuan Liu, Shiyi Du, Tai Sing Lee, Tianqin Li, Tonghan Wang, Vincent Conitzer, Xin Luo.

Figure 1. Schematic of repeated social dilemma interactions between two LLM agents …
Figure 2. Cooperation rate across four social dilemmas as history length (…) …
Figure 3. (a) Memory-immune settings: certain models in certain games always cooperate, with cooperation rates above 95% across game-model settings. Lexical analysis of these models' reasoning traces reveals a clear cognitive basis for the divergence: immune models consistently sustain a significantly higher ratio of forward-looking reasoning …
Figure 4. Asymmetric memory evaluation across the Trust Game and Public Goods Game.
Figure 5. Evidence that memory-content sensitivity is widespread but can be overridden …
read the original abstract

Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model--game settings, a pattern we term the memory curse. We isolate the underlying mechanism through three analyses. First, lexical analysis of 378,000 reasoning traces associates this breakdown with eroding forward-looking intent rather than rising paranoia. We validate this using targeted fine-tuning as a cognitive probe: a LoRA adapter trained exclusively on forward-looking traces mitigates the decay and transfers zero-shot to distinct games. Second, memory sanitization holds prompt length fixed while replacing visible history with synthetic cooperative records, which restores cooperation substantially, proving the trigger is memory content, not length alone. Finally, ablating explicit Chain-of-Thought reasoning often reduces the collapse, showing that deliberation paradoxically amplifies the memory curse. Together, these results recast memory as an active determinant of multi-agent behavior: longer recall can either destabilize or support cooperation depending on the reasoning patterns it elicits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that expanding accessible history in LLM agents systematically degrades cooperation in multi-agent social dilemmas (observed in 18 of 28 model-game settings across 7 LLMs and 4 games), a phenomenon termed the 'memory curse.' It attributes the effect primarily to eroding forward-looking intent (rather than paranoia) via lexical analysis of 378k reasoning traces, validates the mechanism with a LoRA adapter trained on forward-looking traces that mitigates decay and transfers zero-shot, shows via memory sanitization that content (not length) is the trigger, and finds that ablating explicit CoT often reduces the collapse.

Significance. If the central empirical pattern and mechanism hold, the work provides a useful cautionary finding for multi-agent LLM design, showing that longer recall can destabilize cooperation depending on elicited reasoning patterns. Strengths include the convergence of three distinct analyses (lexical, probe, sanitization), the zero-shot cross-game transfer of the fine-tuning intervention, and the content-vs-length control that isolates the memory trigger.

major comments (2)
  1. [Fine-tuning probe and validation] The fine-tuning probe section provides insufficient detail on trace labeling/selection criteria and training data composition for the forward-looking traces used to train the LoRA adapter. Without explicit criteria (e.g., how 'forward-looking intent' was operationalized in the 378k traces or whether curation relied on surface lexical features), it is unclear whether the adapter specifically counters intent erosion or instead captures correlated cooperative tendencies; this directly affects the load-bearing claim that degradation is mechanistically driven by eroding forward-looking intent rather than other content-induced shifts.
  2. [Memory sanitization analysis] The sanitization experiment holds prompt length fixed while replacing history with synthetic cooperative records and reports substantial restoration of cooperation. However, the generation process for these synthetic records (including any model used or filtering steps) is not fully specified, leaving open whether the control fully disentangles memory content effects from other factors such as recency bias or strategy updating.
minor comments (2)
  1. The abstract and results summary would benefit from explicit reporting of per-setting effect sizes, confidence intervals, or statistical tests supporting the '18 of 28' degradation count to allow readers to assess robustness.
  2. Notation for the four games and seven LLMs should be introduced with a table or clear definitions early in the methods to improve readability when discussing the 28 settings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and reproducibility of our work. We have made revisions to address the concerns about methodological details in both the fine-tuning probe and memory sanitization sections.

read point-by-point responses
  1. Referee: [Fine-tuning probe and validation] The fine-tuning probe section provides insufficient detail on trace labeling/selection criteria and training data composition for the forward-looking traces used to train the LoRA adapter. Without explicit criteria (e.g., how 'forward-looking intent' was operationalized in the 378k traces or whether curation relied on surface lexical features), it is unclear whether the adapter specifically counters intent erosion or instead captures correlated cooperative tendencies; this directly affects the load-bearing claim that degradation is mechanistically driven by eroding forward-looking intent rather than other content-induced shifts.

    Authors: We agree that more explicit detail is warranted to support the mechanistic interpretation. We have revised Section 4.2 and added Appendix C to provide the operationalization: forward-looking intent was identified through lexical analysis using a curated set of indicators for planning and anticipation (detailed in the lexical section), applied to the 378k traces with a frequency threshold and spot-checked for accuracy. The training composition is the subset of traces from settings exhibiting the memory curse, ensuring relevance. Regarding whether it captures correlated tendencies, the zero-shot transfer to new games and the differential effect compared to other potential adapters (now reported) indicate specificity to intent erosion. This bolsters rather than undermines the claim. revision: yes

  2. Referee: [Memory sanitization analysis] The sanitization experiment holds prompt length fixed while replacing history with synthetic cooperative records and reports substantial restoration of cooperation. However, the generation process for these synthetic records (including any model used or filtering steps) is not fully specified, leaving open whether the control fully disentangles memory content effects from other factors such as recency bias or strategy updating.

    Authors: We thank the referee for this observation. We have updated the description of the sanitization experiment to specify that the synthetic records were produced by prompting a separate instance of the same LLM family with instructions to generate cooperative interaction histories for the given game, followed by automated filtering to exclude any non-cooperative language and manual inspection of a sample. To mitigate recency bias, the records were presented in randomized order within the fixed-length prompt. These additions clarify that the restoration is attributable to the cooperative content, consistent with our other analyses. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical claims

full rationale

The paper reports experimental results from multi-agent simulations across 7 LLMs and 4 games over 500 rounds, documenting cooperation degradation in 18 of 28 settings via direct measurement. Mechanism isolation relies on lexical analysis of 378k traces, a LoRA fine-tuning probe trained on forward-looking traces, memory sanitization controls that hold length fixed, and CoT ablation. None of these steps invoke equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations that reduce the central claim to its own inputs by construction. The term 'memory curse' is an observational label for the measured pattern, not a presupposed quantity, and interventions function as external tests rather than tautologies.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard game-theoretic assumptions about the chosen dilemmas and on the validity of lexical analysis of reasoning traces as a proxy for intent; no new entities or fitted constants are introduced.

axioms (2)
  • domain assumption The four games and 500-round horizon are representative of broader multi-agent social dilemmas
    Invoked to generalize from the 28 model-game settings to the memory-curse claim
  • domain assumption Lexical features in reasoning traces reliably indicate forward-looking intent
    Central to the first mechanistic analysis

pith-pipeline@v0.9.0 · 5537 in / 1218 out tokens · 30345 ms · 2026-05-11T02:19:44.796906+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Łukasz; Polosukhin, Illia. Attention Is All You Need.
  2. Finite automata play the repeated prisoner's dilemma. Journal of Economic Theory, 1986.
  3. Intelligence, personality, and gains from cooperation in repeated interactions. Journal of Political Economy, 2019.
  4. Spontaneous giving and calculated greed. Nature, 2012.
  5. The persistence and transience of memory. Neuron, 2017.
  6. Neural correlates of sparse coding and dimensionality reduction. PLoS Computational Biology, 2019.
  7. Brain-inspired energy efficient technologies for next-generation artificial intelligence. Frontiers in Neuroscience.
  8. Sparse distributed memory. 1988.
  9. Packer, Charles; Wooders, Sarah; Lin, Kevin; Fang, Vivian; Patil, Shishir G.; Stoica, Ion; Gonzalez, Joseph E. MemGPT: Towards …
  10. Memory networks. arXiv preprint arXiv:1410.3916.
  11. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems.
  12. Prototype memory and attention mechanisms for few shot image generation. International Conference on Learning Representations.
  13. Playing repeated games with large language models. Nature Human Behaviour, 2025.
  14. Duan, Jinhao; Zhang, Renming; Diffenderfer, James; Kailkhura, Bhavya; Sun, Lichao; Stengel-Eskin, Elias; Bansal, Mohit; Chen, Tianlong; Xu, Kaidi. GTBench: Uncovering the strategic reasoning capabilities of …
  15. Piatti, Giorgio; Jin, Zhijing; Kleiman-Weiner, Max; Sch… Cooperate or collapse: Emergence of sustainable cooperation in a society of … Advances in Neural Information Processing Systems.
  16. Tan, Haoran; Zhang, Zeyu; Ma, Chen; Chen, Xu; Dai, Quanyu; Dong, Zhenhua. MemBench: Towards more comprehensive evaluation on the memory of …
  17. Hu, Yuanzhe; Wang, Yu; McAuley, Julian. Evaluating memory in …
  18. Corrupted by reasoning: Reasoning language models become free-riders in public goods games. arXiv preprint arXiv:2506.23276.
  19. Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology.
  20. The rise and potential of large language model based agents: A survey. Science China Information Sciences, 2025.
  21. The folk theorem in repeated games with discounting or with incomplete information. Econometrica, 1986.
  22. Models of bounded rationality: Empirically grounded economic reason. 1997.
  23. Brookins, Philip; DeBacker, Jason Matthew. Playing games with …
  24. Inducing anxiety in large language models increases exploration and bias. arXiv preprint arXiv:2304.11111.
  25. Machine psychology. arXiv preprint arXiv:2303.13988. doi:10.48550/arXiv.2303.13988.
  26. Limited memory optimizes cooperation in social dilemma experiments. Royal Society Open Science, 2021.
  27. Evolution and cooperation in noisy repeated games. The American Economic Review, 1990.
  28. Responses to depression and their effects on the duration of depressive episodes. Journal of Abnormal Psychology, 1991.
  29. Tewolde, Emanuel; Zhang, Xiao; Guzman Piedrahita, David; Conitzer, Vincent; Jin, Zhijing. 2026.
  30. Li, Yuxuan; Shirado, Hirokazu. Spontaneous Giving and Calculated Greed in Language Models. Proceedings of EMNLP 2025. doi:10.18653/v1/2025.emnlp-main.267.
  31. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems.
  32. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems.
  33. Gandhi, Kanishk; Chakravarthy, Ayush K.; Singh, Anikait; Lile, Nathan; Goodman, Noah. Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective … 2025.