The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-11 02:19 UTC · model grok-4.3
The pith
Expanding accessible history degrades cooperation in a majority of tested LLM model–game settings by eroding forward-looking intent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across seven LLMs and four games run for hundreds of rounds, expanding the visible history of prior rounds lowers cooperation rates in eighteen of the twenty-eight model-game pairs. Lexical analysis of hundreds of thousands of reasoning traces ties the drop to reduced forward-looking intent rather than increased suspicion. A LoRA adapter trained only on forward-looking traces restores cooperation and transfers to new games. Replacing real history with synthetic cooperative records while holding length fixed also improves outcomes, and removing explicit chain-of-thought reasoning often lessens the collapse.
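The lexical analysis step can be sketched as a simple marker count over reasoning traces. The marker lists below are illustrative stand-ins, since the paper's actual lexicon is not reproduced in this summary; treat this as the shape of the method, not its implementation.

```python
import re

# Hypothetical marker lists; stand-ins for the paper's curated lexicon.
FORWARD_MARKERS = ["next round", "in the future", "long run", "will cooperate",
                   "going forward", "future payoff", "plan to"]
SUSPICION_MARKERS = ["betray", "cannot trust", "exploit", "suspicious"]

def score_trace(trace: str) -> dict:
    """Count forward-looking and suspicion markers in one reasoning trace."""
    text = trace.lower()
    fwd = sum(len(re.findall(re.escape(m), text)) for m in FORWARD_MARKERS)
    sus = sum(len(re.findall(re.escape(m), text)) for m in SUSPICION_MARKERS)
    return {"forward": fwd, "suspicion": sus}

def intent_rate(traces: list[str]) -> float:
    """Fraction of traces showing at least one forward-looking marker."""
    return sum(score_trace(t)["forward"] > 0 for t in traces) / len(traces)
```

Comparing `intent_rate` between short-history and long-history conditions is the kind of contrast the mechanism claim rests on: under the paper's finding, the rate should fall as visible history grows.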
What carries the argument
The memory curse, in which longer visible interaction history shifts agent reasoning away from forward-looking intent and thereby reduces cooperation.
If this is right
- Memory content, not length alone, controls whether cooperation rises or falls.
- Fine-tuning on forward-looking reasoning traces can mitigate the curse and transfer across games.
- Removing chain-of-thought deliberation can sometimes limit the damage from expanded history.
- Sanitizing or replacing memory records restores cooperation without shortening the prompt.
Where Pith is reading between the lines
- Agent builders may need selective memory filters that preserve future-oriented traces while discarding others.
- The pattern could appear in any long-running multi-agent deployment where agents retain full logs of past turns.
- Testing whether summarization that emphasizes future payoffs avoids the curse would be a direct follow-up experiment.
Load-bearing premise
The drop in cooperation stems mainly from loss of forward-looking intent rather than from other changes such as growing distrust or simple length effects.
What would settle it
Run the same games with a model whose reasoning traces have been edited to remove all forward-looking language while keeping history length fixed; if cooperation stays low, the mechanism is supported.
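This settling experiment could be approximated by phrase substitution that removes forward-looking language while roughly preserving prompt length. The `FORWARD_PATTERNS` list and the `FILLER` token here are hypothetical choices for illustration, not taken from the paper.

```python
import re

# Hypothetical phrase list standing in for the paper's forward-looking lexicon.
FORWARD_PATTERNS = [r"next round", r"future payoffs?", r"in the long run",
                    r"will cooperate"]

FILLER = "this round"  # neutral replacement keeping length roughly stable

def strip_forward_language(trace: str) -> str:
    """Replace forward-looking phrases with a neutral filler so history
    length stays roughly fixed while forward-looking content is removed."""
    out = trace
    for pat in FORWARD_PATTERNS:
        out = re.sub(pat, FILLER, out, flags=re.IGNORECASE)
    return out
```

If cooperation stays low on edited traces of unchanged length, the intent-erosion mechanism gains support over a pure length effect.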
Original abstract
Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model–game settings, a pattern we term the memory curse. We isolate the underlying mechanism through three analyses. First, lexical analysis of 378,000 reasoning traces associates this breakdown with eroding forward-looking intent rather than rising paranoia. We validate this using targeted fine-tuning as a cognitive probe: a LoRA adapter trained exclusively on forward-looking traces mitigates the decay and transfers zero-shot to distinct games. Second, memory sanitization holds prompt length fixed while replacing visible history with synthetic cooperative records, which restores cooperation substantially, proving the trigger is memory content, not length alone. Finally, ablating explicit Chain-of-Thought reasoning often reduces the collapse, showing that deliberation paradoxically amplifies the memory curse. Together, these results recast memory as an active determinant of multi-agent behavior: longer recall can either destabilize or support cooperation depending on the reasoning patterns it elicits.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that expanding accessible history in LLM agents systematically degrades cooperation in multi-agent social dilemmas (observed in 18 of 28 model-game settings across 7 LLMs and 4 games), a phenomenon termed the 'memory curse.' It attributes the effect primarily to eroding forward-looking intent (rather than paranoia) via lexical analysis of 378k reasoning traces, validates the mechanism with a LoRA adapter trained on forward-looking traces that mitigates decay and transfers zero-shot, shows via memory sanitization that content (not length) is the trigger, and finds that ablating explicit CoT often reduces the collapse.
Significance. If the central empirical pattern and mechanism hold, the work provides a useful cautionary finding for multi-agent LLM design, showing that longer recall can destabilize cooperation depending on elicited reasoning patterns. Strengths include the convergence of three distinct analyses (lexical, probe, sanitization), the zero-shot cross-game transfer of the fine-tuning intervention, and the content-vs-length control that isolates the memory trigger.
major comments (2)
- [Fine-tuning probe and validation] The fine-tuning probe section provides insufficient detail on trace labeling/selection criteria and training data composition for the forward-looking traces used to train the LoRA adapter. Without explicit criteria (e.g., how 'forward-looking intent' was operationalized in the 378k traces or whether curation relied on surface lexical features), it is unclear whether the adapter specifically counters intent erosion or instead captures correlated cooperative tendencies; this directly affects the load-bearing claim that degradation is mechanistically driven by eroding forward-looking intent rather than other content-induced shifts.
- [Memory sanitization analysis] The sanitization experiment holds prompt length fixed while replacing history with synthetic cooperative records and reports substantial restoration of cooperation. However, the generation process for these synthetic records (including any model used or filtering steps) is not fully specified, leaving open whether the control fully disentangles memory content effects from other factors such as recency bias or strategy updating.
minor comments (2)
- The abstract and results summary would benefit from explicit reporting of per-setting effect sizes, confidence intervals, or statistical tests supporting the '18 of 28' degradation count to allow readers to assess robustness.
- Notation for the four games and seven LLMs should be introduced with a table or clear definitions early in the methods to improve readability when discussing the 28 settings.
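The request for statistical support of the '18 of 28' count can be made concrete with an exact one-sided binomial sign test, assuming (as simplifications) that the 28 settings are independent and that a 50% degradation rate is the right null; both assumptions are mine, not the paper's.

```python
from math import comb

def binomial_sign_test(k: int, n: int, p: float = 0.5) -> float:
    """Exact one-sided binomial test: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 18 of 28 settings degraded, tested against a 50% null degradation rate
p_value = binomial_sign_test(18, 28)
```

Under these simplifications the tally alone gives a one-sided p of about 0.092, which is suggestive but not decisive on its own; per-setting effect sizes and intervals, as requested above, would carry more weight.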
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which have helped us improve the clarity and reproducibility of our work. We have made revisions to address the concerns about methodological details in both the fine-tuning probe and memory sanitization sections.
Point-by-point responses
Referee: [Fine-tuning probe and validation] The fine-tuning probe section provides insufficient detail on trace labeling/selection criteria and training data composition for the forward-looking traces used to train the LoRA adapter. Without explicit criteria (e.g., how 'forward-looking intent' was operationalized in the 378k traces or whether curation relied on surface lexical features), it is unclear whether the adapter specifically counters intent erosion or instead captures correlated cooperative tendencies; this directly affects the load-bearing claim that degradation is mechanistically driven by eroding forward-looking intent rather than other content-induced shifts.
Authors: We agree that more explicit detail is warranted to support the mechanistic interpretation. We have revised Section 4.2 and added Appendix C to provide the operationalization: forward-looking intent was identified through lexical analysis using a curated set of indicators for planning and anticipation (detailed in the lexical section), applied to the 378k traces with a frequency threshold and spot-checked for accuracy. The training composition is the subset of traces from settings exhibiting the memory curse, ensuring relevance. Regarding whether it captures correlated tendencies, the zero-shot transfer to new games and the differential effect compared to other potential adapters (now reported) indicate specificity to intent erosion. This bolsters rather than undermines the claim. Revision: yes.
Referee: [Memory sanitization analysis] The sanitization experiment holds prompt length fixed while replacing history with synthetic cooperative records and reports substantial restoration of cooperation. However, the generation process for these synthetic records (including any model used or filtering steps) is not fully specified, leaving open whether the control fully disentangles memory content effects from other factors such as recency bias or strategy updating.
Authors: We thank the referee for this observation. We have updated the description of the sanitization experiment to specify that the synthetic records were produced by prompting a separate instance of the same LLM family with instructions to generate cooperative interaction histories for the given game, followed by automated filtering to exclude any non-cooperative language and manual inspection of a sample. To mitigate recency bias, the records were presented in randomized order within the fixed-length prompt. These additions clarify that the restoration is attributable to the cooperative content, consistent with our other analyses. Revision: yes.
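The sanitization control described in this exchange can be sketched minimally, assuming a fixed template stands in for the LLM-generated cooperative records and using the randomized ordering the authors mention.

```python
import random

def synthetic_cooperative_records(n_rounds: int) -> list[str]:
    """Stand-in generator for synthetic cooperative history records
    (the authors prompt an LLM; a template is used here for illustration)."""
    return [f"Round {i}: both agents cooperated; payoffs were shared equally."
            for i in range(1, n_rounds + 1)]

def sanitized_prompt(real_history: list[str], seed: int = 0) -> str:
    """Replace real history with synthetic cooperative records of the same
    record count, shuffled to mitigate recency bias."""
    records = synthetic_cooperative_records(len(real_history))
    rng = random.Random(seed)
    rng.shuffle(records)
    return "\n".join(records)
```

Holding the record count, and thus approximate prompt length, equal to the real history is what lets the content effect be separated from the length effect.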
Circularity Check
No significant circularity in empirical claims
Full rationale
The paper reports experimental results from multi-agent simulations across 7 LLMs and 4 games over 500 rounds, documenting cooperation degradation in 18 of 28 settings via direct measurement. Mechanism isolation relies on lexical analysis of 378k traces, a LoRA fine-tuning probe trained on forward-looking traces, memory sanitization controls that hold length fixed, and CoT ablation. None of these steps invoke equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations that reduce the central claim to its own inputs by construction. The term 'memory curse' is an observational label for the measured pattern, not a presupposed quantity, and interventions function as external tests rather than tautologies.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the four games and the 500-round horizon are representative of broader multi-agent social dilemmas.
- Domain assumption: lexical features in reasoning traces reliably indicate forward-looking intent.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt (LogicNat orbit embedding) · tag: echoes
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  "expanding accessible history degrades cooperation in 18 of 28 model–game settings, a pattern we term the memory curse... lexical analysis of 378,000 reasoning traces associates this breakdown with eroding forward-looking intent rather than rising paranoia"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · tag: unclear
  UNCLEAR: the relation between this paper passage and the cited Recognition theorem is ambiguous.
  "memory sanitization holds prompt length fixed while replacing visible history with synthetic cooperative records"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.