Recognition: no theorem link
Competition and Cooperation of LLM Agents in Games
Pith reviewed 2026-05-13 22:28 UTC · model grok-4.3
The pith
LLM agents cooperate in multi-round games rather than converging to Nash equilibria when fairness reasoning emerges in their chain-of-thought.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In network resource allocation and Cournot competition games, LLM agents supplied with multi-round prompts in non-zero-sum contexts cooperate rather than converge to Nash equilibria. Fairness reasoning identified in their chain-of-thought responses is the central driver. An analytical framework is proposed that models the evolution of LLM agent reasoning across successive rounds and explains the experimental results.
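The gap between the two outcomes can be made concrete with a toy linear Cournot duopoly. The demand and cost parameters below are hypothetical illustration, not the paper's actual game instance; the point is only that the cooperative (collusive) quantity differs from the Nash quantity and is individually tempting to deviate from, which is why sustained cooperation is not what equilibrium analysis predicts.

```python
# Hypothetical linear Cournot duopoly (not the paper's instance):
# inverse demand P = a - b*(q1 + q2), constant marginal cost c.
def profit(q_own, q_other, a=100.0, b=1.0, c=10.0):
    """Per-firm profit; price is floored at zero."""
    price = max(a - b * (q_own + q_other), 0.0)
    return (price - c) * q_own

a, b, c = 100.0, 1.0, 10.0
q_nash = (a - c) / (3 * b)  # 30.0: each firm's mutual best response
q_coop = (a - c) / (4 * b)  # 22.5: each firm's share of the monopoly output

# Cooperating beats the Nash point for both firms...
assert profit(q_coop, q_coop) > profit(q_nash, q_nash)
# ...yet each firm gains by unilaterally deviating from cooperation,
# so the cooperative outcome is not itself an equilibrium.
q_dev = (a - c - b * q_coop) / (2 * b)  # best response to a cooperator
assert profit(q_dev, q_coop) > profit(q_coop, q_coop)
```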
What carries the argument
Analytical framework that tracks the dynamics of LLM agent reasoning across successive interaction rounds.
If this is right
- LLM agents can sustain cooperative outcomes across repeated rounds in economic games even when competitive play would be individually rational.
- Fairness considerations in chain-of-thought can override incentives to exploit competitive equilibria.
- The proposed framework predicts how agent strategies evolve when reasoning is prompted over multiple rounds.
- Cooperative behavior may appear in other multi-agent LLM settings that share non-zero-sum structure and repeated interaction.
Where Pith is reading between the lines
- Prompt engineering focused on multi-round fairness cues could steer LLM systems toward cooperation in deployed multi-agent applications.
- Purely competitive simulations using LLMs may require explicit constraints to prevent unintended fairness-driven cooperation.
- The same reasoning dynamics could be tested in repeated versions of other canonical games such as the prisoner's dilemma.
Load-bearing premise
The observed cooperation arises primarily from fairness reasoning in chain-of-thought and will generalize beyond the tested prompt formats and specific game instances.
What would settle it
If the same LLM agents reach Nash equilibria in the resource allocation and Cournot games under single-round prompts or zero-sum conditions, or if fairness reasoning disappears from their chain-of-thought traces, the central claim would be falsified.
Figures
Original abstract
Large language model (LLM) agents are increasingly deployed in competitive multi-agent settings, raising fundamental questions about whether they converge to equilibria and how their strategic behavior can be characterized. In this paper, we study LLM agent interactions in two standard games: a network resource allocation game and a Cournot competition game. Rather than converging to Nash equilibria, we find that LLM agents tend to cooperate when given multi-round prompts and non-zero-sum context. Chain-of-thought analysis reveals that fairness reasoning is central to this behavior. We propose an analytical framework that captures the dynamics of LLM agent reasoning across rounds and explains these experimental findings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines LLM agents in a network resource allocation game and a Cournot competition game. It reports that, rather than converging to Nash equilibria, the agents exhibit cooperative behavior under multi-round prompting in non-zero-sum settings. Chain-of-thought traces indicate that fairness considerations drive this cooperation, and the authors introduce an analytical framework to model the evolution of LLM reasoning across interaction rounds.
Significance. If the central claim holds after controls for game structure and prompt effects, the result would indicate that LLM agents can produce cooperative outcomes in repeated strategic interactions that standard game-theoretic predictions do not anticipate. This would be relevant for the design of multi-agent LLM systems and for understanding how chain-of-thought reasoning interacts with payoff structures.
major comments (3)
- [Experiments] The experimental section does not report a control condition using scripted rational agents (e.g., tit-for-tat or grim-trigger strategies) or prompts that explicitly remove fairness language. Without such a baseline it is impossible to determine whether the observed cooperation is attributable to LLM-specific reasoning or simply to the repeated non-zero-sum structure already known to support cooperation under the folk theorem.
- [Methods] No quantitative metrics, sample sizes, statistical tests, or prompt-variation ablations are provided in the abstract or described in the methods. This absence makes it impossible to assess the robustness of the claim that fairness reasoning is 'central' to the behavior.
- [Analytical Framework] The proposed analytical framework is introduced after the experimental observations and appears to be constructed to fit the reported trajectories. The manuscript does not state a priori predictions or falsifiable tests that would distinguish the framework from post-hoc rationalization.
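The scripted baselines requested in the first comment can be sketched generically. The iterated prisoner's dilemma below is an illustrative stand-in, not the paper's resource allocation or Cournot games; it shows that two fairness-free scripted strategies already sustain full cooperation, which is the confound such a control would expose.

```python
# Minimal scripted agents of the kind the report asks for (generic
# iterated prisoner's dilemma sketch, not the paper's games).
C, D = "C", "D"

def tit_for_tat(my_hist, opp_hist):
    """Cooperate first, then mirror the opponent's last move."""
    return C if not opp_hist else opp_hist[-1]

def grim_trigger(my_hist, opp_hist):
    """Cooperate until the opponent defects once, then defect forever."""
    return D if D in opp_hist else C

def play(strat_a, strat_b, rounds=10):
    """Run a repeated game, passing each strategy its own and the
    opponent's move history; returns both histories."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a, hist_b

# Two scripted agents cooperate in every round with no fairness
# reasoning at all: repeated structure alone can carry cooperation.
hist_a, hist_b = play(tit_for_tat, grim_trigger)
assert hist_a == [C] * 10 and hist_b == [C] * 10
```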
minor comments (2)
- [Introduction] Standard references to repeated-game theory (e.g., folk theorem, Axelrod's work on tit-for-tat) are missing from the related-work discussion.
- [Analytical Framework] Notation for the analytical framework (e.g., state variables for reasoning updates) should be defined explicitly before use in the dynamics equations.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which have helped clarify the presentation and strengthen the empirical claims. We address each major point below and have revised the manuscript accordingly.
Point-by-point responses
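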
-
Referee: [Experiments] The experimental section does not report a control condition using scripted rational agents (e.g., tit-for-tat or grim-trigger strategies) or prompts that explicitly remove fairness language. Without such a baseline it is impossible to determine whether the observed cooperation is attributable to LLM-specific reasoning or simply to the repeated non-zero-sum structure already known to support cooperation under the folk theorem.
Authors: We agree that control conditions are necessary to isolate LLM-specific reasoning from general repeated-game effects. In the revised manuscript we have added experiments with scripted tit-for-tat and grim-trigger agents in both games. These baselines confirm that cooperation can arise from repeated non-zero-sum structure alone, yet the LLM agents display distinct fairness-driven reasoning in their chain-of-thought traces that is absent from the scripted controls. We also include prompt ablations that remove fairness-related language, showing a statistically significant drop in cooperation rates. revision: yes
-
Referee: [Methods] No quantitative metrics, sample sizes, statistical tests, or prompt-variation ablations are provided in the abstract or described in the methods. This absence makes it impossible to assess the robustness of the claim that fairness reasoning is 'central' to the behavior.
Authors: We have expanded the Methods section to report quantitative metrics (cooperation frequency, payoff deviation from Nash, round-by-round reasoning state transitions), sample sizes (n = 50 independent runs per condition with standard errors), and statistical tests (paired t-tests and ANOVA against Nash benchmarks). Prompt-variation ablations are now included, systematically varying fairness cue strength and multi-round context; results show that fairness language is the dominant predictor of cooperative outcomes. revision: yes
-
Referee: [Analytical Framework] The proposed analytical framework is introduced after the experimental observations and appears to be constructed to fit the reported trajectories. The manuscript does not state a priori predictions or falsifiable tests that would distinguish the framework from post-hoc rationalization.
Authors: We acknowledge that the framework was originally presented after the experiments. The revised manuscript introduces the framework in Section 3 with explicit a priori predictions (e.g., expected transition probabilities between fairness and equilibrium reasoning states under varying payoff asymmetry). We now report falsifiable tests by generating new predictions for modified game parameters and validating them against held-out experimental runs, thereby reducing the appearance of post-hoc fitting. revision: partial
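The kind of a priori prediction the authors describe can be illustrated with a minimal two-state model of round-to-round reasoning, where each agent's chain-of-thought is classified as either fairness-driven or equilibrium-driven. The transition probabilities below are hypothetical placeholders, not the paper's fitted values; the sketch only shows how such a model yields a testable long-run prediction before any data are seen.

```python
# Toy reasoning-state dynamics: a two-state Markov chain over
# {fairness, equilibrium}. Probabilities are hypothetical, not fitted.
p_stay_fair = 0.9   # P(fairness -> fairness)
p_to_fair = 0.4     # P(equilibrium -> fairness)

def step(p_fair):
    """One round of the reasoning-state dynamics: the probability of
    fairness reasoning in the next round."""
    return p_fair * p_stay_fair + (1 - p_fair) * p_to_fair

p = 0.5  # uninformative prior over the two reasoning states
for _ in range(100):
    p = step(p)

# The chain converges to the fixed point p* = p_to_fair / (1 - p_stay_fair
# + p_to_fair) = 0.4 / 0.5 = 0.8: under these placeholder values the
# model predicts fairness reasoning in ~80% of late rounds, a claim
# that held-out runs could falsify.
assert abs(p - 0.8) < 1e-6
```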
Circularity Check
No circularity: framework explains observations without reducing to inputs by construction
Full rationale
The paper reports empirical results from LLM agents in two games showing cooperation under multi-round non-zero-sum prompts, with chain-of-thought traces revealing fairness reasoning, and then introduces an analytical framework to capture the reasoning dynamics. No equations, self-citations, or derivations define the framework in terms of the observed cooperation rates, and no fitted behaviors are renamed as predictions. The central claim rests on experimental data rather than tautological reduction, so the framework can be checked against benchmarks external to its own construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption LLM agents respond to game prompts in a manner that can be analyzed via chain-of-thought for strategic reasoning
- domain assumption The two chosen games are representative of broader competitive multi-agent settings
Reference graph
Works this paper leans on
- [1] E. H. Kang, "Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably," arXiv preprint arXiv:2603.18563, 2026.
- [2] X. Huang, W. Liu, X. Chen, X. Wang, H. Wang, D. Lian, Y. Wang, R. Tang, and E. Chen, "Understanding the planning of LLM agents: A survey," arXiv preprint arXiv:2402.02716, 2024.
- [3] Y. Zhang, A. M. Saber, A. Youssef, and D. Kundur, "Grid-Agent: An LLM-powered multi-agent system for power grid control," arXiv preprint arXiv:2508.05702, 2025.
- [4] M. Jia, Z. Cui, and G. Hug, "Enhancing LLMs for power system simulations: A feedback-driven multi-agent framework," IEEE Transactions on Smart Grid, 2025.
- [5] S. S. Kannan, V. L. Venkatesh, and B.-C. Min, "Smart-LLM: Smart multi-agent robot task planning using large language models," in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2024, pp. 12140–12147.
- [6] S. Fish, Y. A. Gonczarowski, and R. I. Shorrer, "Algorithmic collusion by large language models," arXiv preprint arXiv:2404.00806, 2024.
- [7] L. Goodyear, R. Guo, and R. Johari, "The effect of state representation on LLM agent behavior in dynamic routing games," arXiv preprint arXiv:2506.15624, 2025.
- [8] W. Hua, O. Liu, L. Li, A. Amayuelas, J. Chen, L. Jiang, M. Jin, L. Fan, F. Sun, W. Wang et al., "Game-theoretic LLM: Agent workflow for negotiation games," arXiv preprint arXiv:2411.05990, 2024.
- [9] A. Buscemi, D. Proverbio, A. Di Stefano, T. A. Han, G. Castignani, and P. Liò, "FairGame: A framework for AI agents bias recognition using game theory," arXiv preprint arXiv:2504.14325, 2025.
- [10] F. Kelly, "Charging and rate control for elastic traffic," European Transactions on Telecommunications, vol. 8, no. 1, pp. 33–37, 1997.
- [11] B. Hajek and S. Gopalakrishnan, "Do greedy autonomous systems make for a sensible internet?" presented at the Conference on Stochastic Networks, Stanford University, 2002.
- [12] I. Stoica, H. Abdel-Wahab, K. Jeffay, S. K. Baruah, J. E. Gehrke, and C. G. Plaxton, "A proportional share resource allocation algorithm for real-time, time-shared systems," in 17th IEEE Real-Time Systems Symposium, IEEE, 1996, pp. 288–299.
- [13] R. Johari and J. N. Tsitsiklis, "Efficiency loss in a network resource allocation game," Mathematics of Operations Research, vol. 29, no. 3, pp. 407–435, 2004.
- [14] D. S. Kirschen and G. Strbac, Fundamentals of Power System Economics. John Wiley & Sons, 2018.
- [15] B. Zhang, R. Johari, and R. Rajagopal, "Competition and coalition formation of renewable power producers," IEEE Transactions on Power Systems, vol. 30, no. 3, pp. 1624–1632, 2015.
- [16] K. Bimpikis, S. Ehsani, and R. Ilkılıç, "Cournot competition in networked markets," Management Science, vol. 65, no. 6, pp. 2467–2481, 2019.
- [17] S. Li, J. S. Kim, and C. Chen, "Behavioral generative agents for power dispatch and auction," arXiv preprint arXiv:2603.08477, 2026.
- [18] X. Lu, J. Qiu, Y. Yang, C. Zhang, J. Lin, and S. An, "Large language model-based bidding behavior agent and market sentiment agent-assisted electricity price prediction," IEEE Transactions on Energy Markets, Policy and Regulation, vol. 3, no. 2, pp. 223–235, 2024.
- [19] C. Chen, O. Karaduman, and X. Kuang, "Behavioral generative agents for energy operations," arXiv preprint arXiv:2506.12664, 2025.
- [20] Y. Shi and B. Zhang, "Multi-agent reinforcement learning in Cournot games," in 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, 2020, pp. 3561–3566.
- [21] R. Johari and J. N. Tsitsiklis, "Efficiency loss in Cournot games," Harvard University, 2005.
- [22] J. Friedman, "Oligopoly theory," Handbook of Mathematical Economics, vol. 2, pp. 491–534, 1982.
- [23] Google DeepMind, "Gemini 3.1 Pro model card," https://deepmind.google/models/model-cards/gemini-3-1-pro/, Feb. 2026, accessed 2026-03-31.
- [24] L. Cross, V. Xiang, A. Bhatia, D. L. Yamins, and N. Haber, "Hypothetical minds: Scaffolding theory of mind for multi-agent tasks with large language models," arXiv preprint arXiv:2407.07086, 2024.
- [25] J. Mo and J. Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Transactions on Networking, vol. 8, no. 5, pp. 556–567, 2002.