Pith · machine review for the scientific record

arXiv: 2604.00487 · v2 · submitted 2026-04-01 · 💻 cs.MA · cs.GT · cs.SY · eess.SY

Recognition: no theorem link

Competition and Cooperation of LLM Agents in Games

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 22:28 UTC · model grok-4.3

classification 💻 cs.MA · cs.GT · cs.SY · eess.SY
keywords LLM agents · multi-agent games · cooperation · Nash equilibrium · chain-of-thought · fairness reasoning · Cournot competition · resource allocation

The pith

LLM agents cooperate in multi-round games rather than converging to Nash equilibria when fairness reasoning emerges in their chain-of-thought.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how large language model agents interact in two standard competitive games: network resource allocation and Cournot competition. It reports that these agents cooperate instead of reaching Nash equilibria when given multi-round prompts in non-zero-sum settings. Chain-of-thought traces show that fairness considerations drive the shift away from pure competition. The authors introduce an analytical framework to track how LLM reasoning changes across interaction rounds and to account for the observed patterns.

Core claim

In network resource allocation and Cournot competition games, LLM agents supplied with multi-round prompts in non-zero-sum contexts cooperate rather than converge to Nash equilibria. Fairness reasoning identified in their chain-of-thought responses is the central driver. An analytical framework is proposed that models the evolution of LLM agent reasoning across successive rounds and explains the experimental results.
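
The gap between these two benchmarks is easy to make concrete for the Cournot game. A minimal sketch, assuming a symmetric duopoly with linear inverse demand P(Q) = a − bQ and constant marginal cost c; the parameter values are chosen for illustration and are not taken from the paper.

```python
# Minimal sketch: Nash vs. cooperative benchmarks in a symmetric Cournot duopoly.
# The demand/cost parameters (a, b, c) are illustrative assumptions, not values
# from the paper.

def cournot_benchmarks(a: float, b: float, c: float) -> dict:
    """Linear inverse demand P(Q) = a - b*Q, constant marginal cost c, two firms."""
    q_nash = (a - c) / (3 * b)   # per-firm Cournot-Nash quantity
    q_coop = (a - c) / (4 * b)   # per-firm quantity under joint profit maximization

    def profit(q_own: float, q_other: float) -> float:
        price = a - b * (q_own + q_other)
        return (price - c) * q_own

    return {
        "q_nash": q_nash,
        "q_coop": q_coop,
        "profit_nash": profit(q_nash, q_nash),
        "profit_coop": profit(q_coop, q_coop),
    }

# Example: with a=10, b=1, c=1, Nash output is 3.0 per firm (profit 9.0), while
# the cooperative output is 2.25 per firm (profit 10.125) -- the gap the paper's
# LLM agents are reported to close over repeated rounds.
print(cournot_benchmarks(a=10, b=1, c=1))
```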

What carries the argument

Analytical framework that tracks the dynamics of LLM agent reasoning across successive interaction rounds.
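
This page does not reproduce the framework's equations, so the following is only a hypothetical illustration of what tracking a per-agent cooperation weight θ across rounds could look like, loosely patterned on the initiator/reciprocator dynamic described in the Figure 2 caption below. The roles, the concession step, and the retaliation rule are assumptions made for the sketch, not the paper's model.

```python
# Hypothetical illustration (not the paper's framework): each agent carries a
# cooperation weight theta in [0, 1], where theta = 0 is pure Nash best response
# and theta = 1 is the social optimum. The initiator concedes a fixed step each
# round as long as it is not exploited; the reciprocator mirrors the initiator's
# previous play.

CONCESSION = 0.2

theta_init, theta_recip = 0.0, 0.0          # Round 1: both start at the Nash baseline
history = [(theta_init, theta_recip)]

for _ in range(8):
    prev_init = theta_init
    # Reciprocator mirrors whatever the initiator signalled last round.
    theta_recip = prev_init
    # Initiator keeps conceding while the reciprocator matches it; if the
    # reciprocator ever fell short, it would retreat toward Nash play
    # (the "retaliation" case probed in the paper's Figure 3).
    if theta_recip >= prev_init:
        theta_init = min(prev_init + CONCESSION, 1.0)
    else:
        theta_init = max(prev_init - CONCESSION, 0.0)
    history.append((round(theta_init, 2), round(theta_recip, 2)))

print(history)   # both weights climb stepwise toward theta = 1.0
```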

If this is right

  • LLM agents can sustain cooperative outcomes across repeated rounds in economic games even when competitive play would be individually rational.
  • Fairness considerations in chain-of-thought can override incentives to exploit competitive equilibria.
  • The proposed framework predicts how agent strategies evolve when reasoning is prompted over multiple rounds.
  • Cooperative behavior may appear in other multi-agent LLM settings that share non-zero-sum structure and repeated interaction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Prompt engineering focused on multi-round fairness cues could steer LLM systems toward cooperation in deployed multi-agent applications.
  • Purely competitive simulations using LLMs may require explicit constraints to prevent unintended fairness-driven cooperation.
  • The same reasoning dynamics could be tested in repeated versions of other canonical games such as the prisoner's dilemma.
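
On that last point, a minimal harness sketch for a repeated prisoner's dilemma with a scripted tit-for-tat baseline follows; the payoff matrix and the `Policy` interface are illustrative assumptions, and an LLM-backed policy would plug into the second seat.

```python
# Repeated prisoner's dilemma harness with a scripted tit-for-tat baseline.
# Payoffs and the Policy interface are assumptions for this sketch, not details
# from the paper.

from typing import Callable, List, Tuple

# (my_payoff, their_payoff) indexed by (my_move, their_move); "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

Policy = Callable[[List[Tuple[str, str]]], str]  # maps (my, their) history -> next move

def tit_for_tat(history: List[Tuple[str, str]]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history: List[Tuple[str, str]]) -> str:
    return "D"

def play(policy_a: Policy, policy_b: Policy, rounds: int = 10) -> Tuple[int, int]:
    hist_a: List[Tuple[str, str]] = []
    hist_b: List[Tuple[str, str]] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = policy_a(hist_a), policy_b(hist_b)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b

# Swapping `always_defect` for an LLM-backed policy (with or without fairness
# cues in the prompt) would reproduce the kind of scripted-baseline control the
# referee report below asks for.
print(play(tit_for_tat, always_defect))   # (9, 14) over 10 rounds
```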

Load-bearing premise

The observed cooperation arises primarily from fairness reasoning in chain-of-thought and will generalize beyond the tested prompt formats and specific game instances.

What would settle it

If the same LLM agents reach Nash equilibria in the resource allocation and Cournot games under single-round prompts or zero-sum conditions, or if fairness reasoning disappears from their chain-of-thought traces, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2604.00487 by Baosen Zhang, Cong Chen, Jiayi Yao.

Figure 1: Payoff in a 2-Agent case. The gray cloud represents the set of all …
Figure 2: Dynamic evolution of θ. It gradually builds up through a process of mutual concession. Agent 1 (Initiator) signals cooperation, and Agent 2 (Reciprocator) responds, leading the system to the social optimum (θ = 1.0). • Round 1 (Nash Initialization): both agents bid x = 5.0 and initialize as pure rational maximizers. • Round 2 (Initiation of Trust): Agent 1 voluntarily yields market share, dropping to …
Figure 3: A controlled perturbation test demonstrating retaliation and …
Figure 4: Dynamic evolution of endogenous social parameters extracted from …
Original abstract

Large language model (LLM) agents are increasingly deployed in competitive multi-agent settings, raising fundamental questions about whether they converge to equilibria and how their strategic behavior can be characterized. In this paper, we study LLM agent interactions in two standard games: a network resource allocation game and a Cournot competition game. Rather than converging to Nash equilibria, we find that LLM agents tend to cooperate when given multi-round prompts and non-zero-sum context. Chain-of-thought analysis reveals that fairness reasoning is central to this behavior. We propose an analytical framework that captures the dynamics of LLM agent reasoning across rounds and explains these experimental findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper examines LLM agents in a network resource allocation game and a Cournot competition game. It reports that, rather than converging to Nash equilibria, the agents exhibit cooperative behavior under multi-round prompting in non-zero-sum settings. Chain-of-thought traces indicate that fairness considerations drive this cooperation, and the authors introduce an analytical framework to model the evolution of LLM reasoning across interaction rounds.

Significance. If the central claim holds after controls for game structure and prompt effects, the result would indicate that LLM agents can produce cooperative outcomes in repeated strategic interactions that standard game-theoretic predictions do not anticipate. This would be relevant for the design of multi-agent LLM systems and for understanding how chain-of-thought reasoning interacts with payoff structures.

major comments (3)
  1. [Experiments] The experimental section does not report a control condition using scripted rational agents (e.g., tit-for-tat or grim-trigger strategies) or prompts that explicitly remove fairness language. Without such a baseline it is impossible to determine whether the observed cooperation is attributable to LLM-specific reasoning or simply to the repeated non-zero-sum structure already known to support cooperation under the folk theorem.
  2. [Methods] No quantitative metrics, sample sizes, statistical tests, or prompt-variation ablations are provided in the abstract or described in the methods. This absence makes it impossible to assess the robustness of the claim that fairness reasoning is 'central' to the behavior.
  3. [Analytical Framework] The proposed analytical framework is introduced after the experimental observations and appears to be constructed to fit the reported trajectories. The manuscript does not state a priori predictions or falsifiable tests that would distinguish the framework from post-hoc rationalization.
minor comments (2)
  1. [Introduction] Standard references to repeated-game theory (e.g., folk theorem, Axelrod's work on tit-for-tat) are missing from the related-work discussion.
  2. [Analytical Framework] Notation for the analytical framework (e.g., state variables for reasoning updates) should be defined explicitly before use in the dynamics equations.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped clarify the presentation and strengthen the empirical claims. We address each major point below and have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [Experiments] The experimental section does not report a control condition using scripted rational agents (e.g., tit-for-tat or grim-trigger strategies) or prompts that explicitly remove fairness language. Without such a baseline it is impossible to determine whether the observed cooperation is attributable to LLM-specific reasoning or simply to the repeated non-zero-sum structure already known to support cooperation under the folk theorem.

    Authors: We agree that control conditions are necessary to isolate LLM-specific reasoning from general repeated-game effects. In the revised manuscript we have added experiments with scripted tit-for-tat and grim-trigger agents in both games. These baselines confirm that cooperation can arise from repeated non-zero-sum structure alone, yet the LLM agents display distinct fairness-driven reasoning in their chain-of-thought traces that is absent from the scripted controls. We also include prompt ablations that remove fairness-related language, showing a statistically significant drop in cooperation rates. revision: yes

  2. Referee: [Methods] No quantitative metrics, sample sizes, statistical tests, or prompt-variation ablations are provided in the abstract or described in the methods. This absence makes it impossible to assess the robustness of the claim that fairness reasoning is 'central' to the behavior.

    Authors: We have expanded the Methods section to report quantitative metrics (cooperation frequency, payoff deviation from Nash, round-by-round reasoning state transitions), sample sizes (n = 50 independent runs per condition with standard errors), and statistical tests (paired t-tests and ANOVA against Nash benchmarks). Prompt-variation ablations are now included, systematically varying fairness cue strength and multi-round context; results show that fairness language is the dominant predictor of cooperative outcomes. revision: yes

  3. Referee: [Analytical Framework] The proposed analytical framework is introduced after the experimental observations and appears to be constructed to fit the reported trajectories. The manuscript does not state a priori predictions or falsifiable tests that would distinguish the framework from post-hoc rationalization.

    Authors: We acknowledge that the framework was originally presented after the experiments. The revised manuscript introduces the framework in Section 3 with explicit a priori predictions (e.g., expected transition probabilities between fairness and equilibrium reasoning states under varying payoff asymmetry). We now report falsifiable tests by generating new predictions for modified game parameters and validating them against held-out experimental runs, thereby reducing the appearance of post-hoc fitting. revision: partial
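
Response 2 names cooperation frequency, payoff deviation from the Nash benchmark, and paired t-tests across n = 50 matched runs. A minimal sketch of how those quantities might be computed, using synthetic placeholder data rather than anything from the paper:

```python
# Synthetic sketch of the metrics named in response 2; every number here is a
# placeholder, not data from the paper. Requires numpy and scipy.

import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n_runs = 50
nash_payoff = 9.0        # per-run Nash benchmark payoff (illustrative)
coop_quantity = 2.25     # cooperative per-firm output, as in the Cournot sketch above

# Placeholder per-run mean payoffs for two matched prompt conditions.
payoff_fairness = nash_payoff + rng.normal(1.0, 0.4, n_runs)   # fairness cues present
payoff_ablated = nash_payoff + rng.normal(0.1, 0.4, n_runs)    # fairness cues removed

def cooperation_frequency(actions: np.ndarray, coop_action: float, tol: float = 0.1) -> float:
    """Fraction of actions within `tol` of the cooperative benchmark."""
    return float(np.mean(np.abs(actions - coop_action) <= tol))

actions = coop_quantity + rng.normal(0.0, 0.1, 200)             # placeholder per-round outputs
deviation_from_nash = payoff_fairness - nash_payoff             # payoff-deviation metric
t_stat, p_value = ttest_rel(payoff_fairness, payoff_ablated)    # paired across matched runs

print(f"cooperation frequency: {cooperation_frequency(actions, coop_quantity):.2f}")
print(f"mean payoff deviation from Nash: {deviation_from_nash.mean():.2f}")
print(f"paired t-test (fairness vs. ablated): t = {t_stat:.2f}, p = {p_value:.3g}")
```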

Circularity Check

0 steps flagged

No circularity: framework explains observations without reducing to inputs by construction

full rationale

The paper reports empirical results from LLM agents in two games showing cooperation under multi-round non-zero-sum prompts, with chain-of-thought revealing fairness reasoning, and then introduces an analytical framework to capture the reasoning dynamics. No equations, self-citations, or derivations are shown that define the framework in terms of the observed cooperation rates or that rename fitted behaviors as predictions. The central claim rests on experimental data rather than on a tautological reduction, so the framework is judged against external benchmarks rather than against its own construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from game theory and LLM prompting; no free parameters or invented entities are visible in the abstract.

axioms (2)
  • domain assumption LLM agents respond to game prompts in a manner that can be analyzed via chain-of-thought for strategic reasoning
    Invoked to interpret cooperation as fairness-driven.
  • domain assumption The two chosen games are representative of broader competitive multi-agent settings
    Used to generalize findings beyond the specific instances.

pith-pipeline@v0.9.0 · 5400 in / 1113 out tokens · 34651 ms · 2026-05-13T22:28:36.877425+00:00 · methodology


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

  [1] E. H. Kang, "Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably," 2026. [Online]. Available: https://arxiv.org/abs/2603.18563
  [2] X. Huang, W. Liu, X. Chen, X. Wang, H. Wang, D. Lian, Y. Wang, R. Tang, and E. Chen, "Understanding the planning of LLM agents: A survey," arXiv preprint arXiv:2402.02716, 2024.
  [3] Y. Zhang, A. M. Saber, A. Youssef, and D. Kundur, "Grid-Agent: An LLM-powered multi-agent system for power grid control," arXiv preprint arXiv:2508.05702, 2025.
  [4] M. Jia, Z. Cui, and G. Hug, "Enhancing LLMs for power system simulations: A feedback-driven multi-agent framework," IEEE Transactions on Smart Grid, 2025.
  [5] S. S. Kannan, V. L. Venkatesh, and B.-C. Min, "SMART-LLM: Smart multi-agent robot task planning using large language models," in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 12140–12147.
  [6] S. Fish, Y. A. Gonczarowski, and R. I. Shorrer, "Algorithmic collusion by large language models," arXiv preprint arXiv:2404.00806, vol. 7, no. 2, p. 5, 2024.
  [7] L. Goodyear, R. Guo, and R. Johari, "The effect of state representation on LLM agent behavior in dynamic routing games," arXiv preprint arXiv:2506.15624, 2025.
  [8] W. Hua, O. Liu, L. Li, A. Amayuelas, J. Chen, L. Jiang, M. Jin, L. Fan, F. Sun, W. Wang et al., "Game-theoretic LLM: Agent workflow for negotiation games," arXiv preprint arXiv:2411.05990, 2024.
  [9] A. Buscemi, D. Proverbio, A. Di Stefano, T. A. Han, G. Castignani, and P. Liò, "FairGame: A framework for AI agents bias recognition using game theory," arXiv preprint arXiv:2504.14325, 2025.
  [10] F. Kelly, "Charging and rate control for elastic traffic," European Transactions on Telecommunications, vol. 8, no. 1, pp. 33–37, 1997.
  [11] B. Hajek and S. Gopalakrishnan, "Do greedy autonomous systems make for a sensible internet?" 2002, presented at the Conference on Stochastic Networks, Stanford University.
  [12] I. Stoica, H. Abdel-Wahab, K. Jeffay, S. K. Baruah, J. E. Gehrke, and C. G. Plaxton, "A proportional share resource allocation algorithm for real-time, time-shared systems," in 17th IEEE Real-Time Systems Symposium. IEEE, 1996, pp. 288–299.
  [13] R. Johari and J. N. Tsitsiklis, "Efficiency loss in a network resource allocation game," Mathematics of Operations Research, vol. 29, no. 3, pp. 407–435, 2004.
  [14] D. S. Kirschen and G. Strbac, Fundamentals of Power System Economics. John Wiley & Sons, 2018.
  [15] B. Zhang, R. Johari, and R. Rajagopal, "Competition and coalition formation of renewable power producers," IEEE Transactions on Power Systems, vol. 30, no. 3, pp. 1624–1632, 2015.
  [16] K. Bimpikis, S. Ehsani, and R. Ilkılıç, "Cournot competition in networked markets," Management Science, vol. 65, no. 6, pp. 2467–2481, 2019.
  [17] S. Li, J. S. Kim, and C. Chen, "Behavioral generative agents for power dispatch and auction," arXiv preprint arXiv:2603.08477, 2026.
  [18] X. Lu, J. Qiu, Y. Yang, C. Zhang, J. Lin, and S. An, "Large language model-based bidding behavior agent and market sentiment agent-assisted electricity price prediction," IEEE Transactions on Energy Markets, Policy and Regulation, vol. 3, no. 2, pp. 223–235, 2024.
  [19] C. Chen, O. Karaduman, and X. Kuang, "Behavioral generative agents for energy operations," arXiv preprint arXiv:2506.12664, 2025.
  [20] Y. Shi and B. Zhang, "Multi-agent reinforcement learning in Cournot games," in 2020 59th IEEE Conference on Decision and Control (CDC). IEEE, 2020, pp. 3561–3566.
  [21] R. Johari and J. N. Tsitsiklis, "Efficiency loss in Cournot games," Harvard University, 2005.
  [22] J. Friedman, "Oligopoly theory," Handbook of Mathematical Economics, vol. 2, pp. 491–534, 1982.
  [23] Google DeepMind, "Gemini 3.1 Pro model card," https://deepmind.google/models/model-cards/gemini-3-1-pro/, Feb. 2026, accessed: 2026-03-31.
  [24] L. Cross, V. Xiang, A. Bhatia, D. L. Yamins, and N. Haber, "Hypothetical minds: Scaffolding theory of mind for multi-agent tasks with large language models," arXiv preprint arXiv:2407.07086, 2024.
  [25] J. Mo and J. Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Transactions on Networking, vol. 8, no. 5, pp. 556–567, 2002.