Constitutional Arms Races in the Public Goods Game: Co-Evolving LLM Constitutions Under Cooperation-Defection Pressure

Alice Saito; Arth Singh; Eiji Kamioka; Hershraj Niranjani; Machiko Hirota; Phan Xuan Tan; Takehiro Takayanagi; Ujwal Kumar

arxiv: 2605.26448 · v1 · pith:B3XRPX7Fnew · submitted 2026-05-26 · 💻 cs.MA · cs.GT· cs.NE

Constitutional Arms Races in the Public Goods Game: Co-Evolving LLM Constitutions Under Cooperation-Defection Pressure

Ujwal Kumar , Arth Singh , Hershraj Niranjani , Machiko Hirota , Takehiro Takayanagi , Alice Saito , Eiji Kamioka , Phan Xuan Tan This is my paper

Pith reviewed 2026-07-01 16:53 UTC · model grok-4.3

classification 💻 cs.MA cs.GTcs.NE

keywords LLM constitutionspublic goods gameadversarial co-evolutionmulti-agent systemscooperation and defectionevolutionary searchred-teamingfitness functions

0 comments

The pith

LLM constitutions co-evolve to a stable near-parity equilibrium in public goods games when fitness couples the factions and evaluation uses enough seeds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether natural-language constitutions for LLM agents can undergo adversarial co-evolution, with one side trying to cooperate and the other to free-ride in a public goods game. It checks two uncharacterized properties: whether the fitness function actually creates pressure between the sides and whether the mutation process stays reliable when the goal is to produce specialists. With a score-advantage fitness and five evaluation seeds per generation, both sides converge to similar performance near 0.78 across different multipliers, while independent scoring produces no such pressure. The resulting Red constitutions act as readable test cases that expose weaknesses in cooperative setups. This matters for anyone building multi-agent systems that must hold up against defection.

Core claim

In the Public Goods Game, Blue cooperators and Red free-riders co-evolve their constitutions over 30 generations and reach a near-parity equilibrium at S approximately 0.78 that holds across multipliers m in {1.2, 1.5, 2.0, 3.0}. Independent per-faction scoring leaves outcomes uncoupled with correlation +0.088 and generates no adversarial pressure, while a score-advantage target S_own minus S_opp restores the pressure. Under pure-adversary fitness, K=2 evaluation seeds cause mode regression but K=5 sustains strong specialists across all generations. Adversarial co-evolution of natural-language constitutions is therefore feasible only under coupled fitness and adequate evaluation budget, and

What carries the argument

Adversarial constitutional co-evolution of LLM agents in a Public Goods Game driven by a score-advantage fitness function and LLM-based mutation.

If this is right

Both factions reach stable performance parity only when fitness explicitly compares their scores.
Score-advantage fitness is required to create measurable adversarial pressure between constitutions.
Five evaluation seeds per generation prevent regression into weak modes for the Red specialist.
Evolved Red constitutions supply readable test cases for checking future cooperative LLM designs.
The near-parity outcome holds across a range of public-goods multipliers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same co-evolution setup could be applied to grid-world or other spatial environments to check whether parity still emerges.
Red constitutions might expose recurring patterns in how LLMs fail at cooperation that could guide alignment techniques.
Longer runs or larger populations might require proportionally larger evaluation budgets to avoid regression.
Transfer tests could check whether constitutions evolved in the public goods game remain effective when placed in new games.

Load-bearing premise

The LLM mutation operator must continue to produce reliable changes even when the objective is specialized adversarial performance rather than cooperation.

What would settle it

Running the 30-generation co-evolution with K=5 seeds and finding that Red performance regresses relative to Blue across repeated trials would falsify the claim that adequate evaluation budget sustains specialists.

Figures

Figures reproduced from arXiv: 2605.26448 by Alice Saito, Arth Singh, Eiji Kamioka, Hershraj Niranjani, Machiko Hirota, Phan Xuan Tan, Takehiro Takayanagi, Ujwal Kumar.

**Figure 1.** Figure 1: PGG co-evolution over 30 generations. (a) Blue (cooperators) and Red (free-riders) stability scores; both improve and converge near S ≈ 0.78. (b) Blue advantage (SB − SR): large early lead collapses toward zero as Red’s constitution learns to survive punishment, settling at near-parity. less material to drift toward. Environment structure, not just fitness signal, shapes whether adversarial search remains … view at source ↗

**Figure 2.** Figure 2: Information asymmetry (Experiment 3). Red advantage [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: C ∗ fragility (Experiment 1). (a) Raw stability scores: Blue (fixed C ∗ ) holds steadily across 30 generations of Red adversarial evolution. (b) Red advantage SR − SB remains negative throughout; C ∗ retains its lead under sustained zero-sum adversarial search at this compute budget [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: Full co-evolution arms race (Experiment 2) under score-advantage fitness. [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

read the original abstract

Frontier LLM agents engage in blackmail, sabotage, and document leaks under goal conflicts in agentic settings, exposing limitations of alignment methods built around single-agent or cooperative assumptions. Recent work shows LLM-guided evolutionary search can discover effective cooperative constitutions, but two properties of the adversarial setting remain uncharacterized: whether the fitness function actually induces adversarial pressure, and whether the LLM mutation operator behaves reliably under adversarial-specialist objectives. We study adversarial constitutional co-evolution (Blue cooperators vs. Red free-riders, 30 generations) across a Public Goods Game (PGG) and a spatial grid-world. Three findings: (1) in the PGG, both factions converge to a near-parity equilibrium at S approximately 0.78, robust across tested multipliers m in {1.2, 1.5, 2.0, 3.0}; (2) in independently scored environments, per-faction scoring leaves outcomes statistically uncoupled, with corr(S_B, S_R) = +0.088, and produces no adversarial pressure; a score-advantage fitness target S_own - S_opp restores it; (3) under pure-adversary fitness, evaluation seed count K controls mode regression: K = 2 regresses, while K = 5 sustains a strong specialist for all 30 generations. Adversarial co-evolution of natural-language constitutions is feasible, but only under coupled fitness and adequate evaluation budget; the evolved Red constitutions serve as interpretable red-team artifacts for testing future cooperative designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows adversarial co-evolution of LLM constitutions works under coupled fitness and K=5 evaluations but collapses otherwise, giving usable red-team artifacts at S≈0.78.

read the letter

The core result is that prior cooperative-only evolution work left two gaps—whether fitness actually creates adversarial pressure and whether the LLM mutation step holds up under specialist goals—and this study tests both directly in a public goods game plus grid world.

It isolates the conditions cleanly: independent per-faction scoring yields near-zero correlation and no pressure, while S_own minus S_opp restores coupling; K=2 causes mode regression while K=5 keeps the Red specialist stable across 30 generations; and both sides settle near S=0.78 regardless of the multiplier. Those are concrete, previously uncharacterized outcomes.

The experiments are simple enough to replicate in principle and the Red constitutions are presented as practical red-team material, which is a useful byproduct.

The main limitation is that the findings rest on simulation runs whose variance, exact sample sizes, exclusion rules, and statistical details are not visible from the abstract, even though the logic of the three findings tracks internally. There is no mathematical reduction of the equilibria to the parameters, so the 0.78 figure is an observed regularity rather than a derived one. The reliability of the LLM as a mutation operator under pure-adversary objectives is tested but still depends on the specific model and prompt setup.

This is for people already working on multi-agent LLM alignment and constitutional methods. It is narrow but addresses a real gap with falsifiable conditions rather than hand-waving.

I would send it to peer review; the claims are scoped tightly enough that referees can check the simulation details and the practical value of the artifacts.

Referee Report

2 major / 2 minor

Summary. The manuscript reports an empirical simulation study of adversarial co-evolution between Blue (cooperator) and Red (free-rider) LLM constitutions over 30 generations in a Public Goods Game (PGG) with multipliers m in {1.2,1.5,2.0,3.0} and a spatial grid-world. It claims three findings: (1) both factions converge to a near-parity equilibrium at S≈0.78; (2) per-faction scoring yields uncoupled outcomes (corr(S_B,S_R)=+0.088) with no adversarial pressure while the coupled target S_own−S_opp restores it; (3) evaluation seed count K=2 induces mode regression while K=5 sustains specialist behavior. The central conclusion is that adversarial constitutional co-evolution is feasible only under coupled fitness and adequate evaluation budget, with evolved Red constitutions serving as red-team artifacts.

Significance. If the reported equilibria and conditional feasibility hold, the work supplies directly testable red-team artifacts and characterizes the two properties left open by prior LLM-guided evolutionary search for cooperative constitutions. The explicit empirical tests of fitness coupling and mutation reliability under specialist objectives constitute a clear advance for multi-agent alignment research addressing goal conflicts.

major comments (2)

[Results, finding (2)] Results, finding (2): the reported correlation corr(S_B, S_R)=+0.088 is presented without sample size, number of independent runs, p-value, or confidence interval, which is load-bearing for the claim that per-faction scoring produces statistically uncoupled outcomes and no adversarial pressure.
[Results, finding (3)] Results, finding (3): the claim that K=2 produces mode regression while K=5 sustains specialist behavior for all 30 generations lacks reported variance, number of runs, or exclusion criteria, which is load-bearing for the assertion that adequate evaluation budget is required to avoid regression.

minor comments (2)

[Abstract] Abstract: the symbol S is used without an explicit definition or reference to its computation in the PGG payoff structure.
[Methods] The manuscript should include a table or figure caption that reports the exact number of generations, population size, and LLM model version used in all runs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and for identifying the need for additional statistical detail in the reported results. We address both major comments below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Results, finding (2)] Results, finding (2): the reported correlation corr(S_B, S_R)=+0.088 is presented without sample size, number of independent runs, p-value, or confidence interval, which is load-bearing for the claim that per-faction scoring produces statistically uncoupled outcomes and no adversarial pressure.

Authors: We agree that the statistical details are required to substantiate the claim. The correlation value was obtained from the full set of independent evolutionary runs performed in the study. In the revised manuscript we will explicitly state the sample size (number of independent runs), report the associated p-value, and include a confidence interval to confirm that the outcomes remain statistically uncoupled under per-faction scoring. revision: yes
Referee: [Results, finding (3)] Results, finding (3): the claim that K=2 produces mode regression while K=5 sustains specialist behavior for all 30 generations lacks reported variance, number of runs, or exclusion criteria, which is load-bearing for the assertion that adequate evaluation budget is required to avoid regression.

Authors: We agree that variance, run count, and any exclusion criteria should be reported. The K=2 versus K=5 comparison was conducted across multiple independent runs; we will add the observed variance, the exact number of runs, and a clear statement of exclusion criteria (if any) to the revised text so that the requirement for adequate evaluation budget is fully supported. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

This paper reports an empirical simulation study of adversarial co-evolution between LLM-generated constitutions in a Public Goods Game and spatial grid-world. All headline results (equilibrium scores near 0.78, correlation under different fitness targets, effect of evaluation seed count K) are direct outputs of running the described evolutionary process for 30 generations under explicitly varied conditions. No equations, uniqueness theorems, or predictions are derived; the work contains no mathematical chain that could reduce outputs to inputs by construction, and the two uncharacterized properties are addressed via independent empirical tests rather than self-referential definitions or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the tested multiplier values and seed counts; these appear to be experimental controls rather than fitted quantities.

pith-pipeline@v0.9.1-grok · 5850 in / 1059 out tokens · 35225 ms · 2026-07-01T16:53:40.109894+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 13 canonical work pages · 6 internal anchors

[1]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional ai: Harmlessness from ai feedback.arXiv preprint arXiv:2212.08073, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[2]

Evolving ai collectives to enhance human diversity and enable self-regulation.arXiv preprint arXiv:2402.12590, 2024

Shiyang Lai, Yujin Potter, Junsol Kim, Richard Zhuang, Dawn Song, and James Evans. Evolving ai collectives to enhance human diversity and enable self-regulation.arXiv preprint arXiv:2402.12590, 2024

work page arXiv 2024
[3]

The coming crisis of multi-agent misalignment: Ai alignment must be a dynamic and social process.arXiv preprint arXiv:2506.01080, 2025

Florian Carichon, Aditi Khandelwal, Marylou Fauchard, and Golnoosh Farnadi. The coming crisis of multi-agent misalignment: Ai alignment must be a dynamic and social process.arXiv preprint arXiv:2506.01080, 2025

work page arXiv 2025
[4]

Alexander Meinke, Bronson Schoen, Jérémy Scheurer, Mikita Balesni, Rusheb Shah, and Marius Hobbhahn

Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart J. Ritchie, Sören Mindermann, Evan Hubinger, Ethan Perez, and Kevin K. Troy. Agentic misalignment: How llms could be insider threats.arXiv preprint arXiv:2510.05179, 2025

work page arXiv 2025
[5]

Klowden, T

Ujwal Kumar, Alice Saito, Hershraj Niranjani, Rayan Yessou, and Phan Xuan Tan. Evolving interpretable constitutions for multi-agent coordination.arXiv preprint arXiv:2602.00755, 2026

work page arXiv 2026
[6]

A General Language Assistant as a Laboratory for Alignment

Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, et al. A general language assistant as a laboratory for alignment.arXiv preprint arXiv:2112.00861, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[7]

Hamilton

Robert Axelrod and William D. Hamilton. The evolution of cooperation.Science, 211(4489):1390–1396, 1981

1981
[8]

Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel

Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. InProceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 464–473, 2017

2017
[9]

Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guzman, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin McKee, Raphael Koster, et al

Edward Hughes, Joel Z. Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guzman, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin McKee, Raphael Koster, et al. Inequity aversion improves cooperation in intertemporal social dilemmas. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

2018
[10]

Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, D. J. Strouse, Joel Z. Leibo, and Nando De Freitas. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning (ICML), 2019

2019
[11]

Generative Agents: Interactive Simulacra of Human Behavior

Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior.arXiv preprint arXiv:2304.03442, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[12]

Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search.Nature, 529(7587):484–489, 2016

2016
[13]

Dota 2 with Large Scale Deep Reinforcement Learning

OpenAI, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław D˛ ebiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, et al. Dota 2 with large scale deep reinforcement learning.arXiv preprint arXiv:1912.06680, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1912
[14]

Rui Wang, Joel Lehman, Jeff Clune, and Kenneth O. Stanley. Paired Open-Ended Trailblazer (POET): End- lessly generating increasingly complex and diverse learning environments and their solutions.arXiv preprint arXiv:1901.01753, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901
[15]

Udaya Sankar, Vishisht Srihari Rao, Mayank Ratan Bhardwaj, and Y

V . Udaya Sankar, Vishisht Srihari Rao, Mayank Ratan Bhardwaj, and Y . Narahari. Deep learning meets mechanism design: Key results and some novel applications.arXiv preprint arXiv:2401.05683, 2024

work page arXiv 2024
[16]

An interpretable automated mechanism design framework with large language models.arXiv preprint arXiv:2502.12203, 2025

Jiayuan Liu, Mingyu Guo, and Vincent Conitzer. An interpretable automated mechanism design framework with large language models.arXiv preprint arXiv:2502.12203, 2025

work page arXiv 2025
[17]

Human-centred mechanism design with democratic AI.Nature Human Behaviour, 6(10):1398–1407, 2022

Raphael Koster, Jan Balaguer, Andrea Tacchetti, Ari Weinstein, Tina Zhu, Oliver Hauser, Duncan Williams, Lucy Campbell-Gillingham, Phoebe Thacker, Matthew Botvinick, and Christopher Summerfield. Human-centred mechanism design with democratic AI.Nature Human Behaviour, 6(10):1398–1407, 2022

2022
[18]

Pawan Kumar, Emilien Dupont, Francisco J

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models.Nature, 625(7995):468–475, 2024

2024
[19]

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

AlphaEvolve Team. AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms. Google DeepMind Blog, 2025

2025
[20]

Joel Lehman, Jonathan Gordon, Shawn Jain, Kamal Ndousse, Cathy Yeh, and Kenneth O. Stanley. Evolution through large models.arXiv preprint arXiv:2206.08896, 2022. 9 APREPRINT- MAY27, 2026

work page arXiv 2022
[21]

OpenEvolve: An open-source evolutionary coding agent

Asankhaya Sharma. OpenEvolve: An open-source evolutionary coding agent. https://github.com/ algorithmicsuperintelligence/openevolve, 2025

2025
[22]

Illuminating search spaces by mapping elites

Jean-Baptiste Mouret and Jeff Clune. Illuminating search spaces by mapping elites.arXiv preprint arXiv:1504.04909, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[23]

Cooperation and punishment in public goods experiments.American Economic Review, 90(4):980–994, 2000

Ernst Fehr and Simon Gächter. Cooperation and punishment in public goods experiments.American Economic Review, 90(4):980–994, 2000

2000
[24]

V olunteering as Red Queen mechanism for cooperation in public goods games.Science, 296(5570):1129–1132, 2002

Christoph Hauert, Silvia De Monte, Josef Hofbauer, and Karl Sigmund. V olunteering as Red Queen mechanism for cooperation in public goods games.Science, 296(5570):1129–1132, 2002

2002
[25]

Rand, Anna Dreber, Tore Ellingsen, Drew Fudenberg, and Martin A

David G. Rand, Anna Dreber, Tore Ellingsen, Drew Fudenberg, and Martin A. Nowak. Positive interactions promote public cooperation.Science, 325(5945):1272–1275, 2009

2009
[26]

win” from killing Blue agents is indistinguishable from a Red “win

OpenAI. GPT-OSS-120B.https://openrouter.ai/openai/gpt-oss-120b, 2025. 10 APREPRINT- MAY27, 2026 Appendix A Formal Algorithm and Optimization Definition The framework summary in Section 5.1 omitted formal details for brevity. Here we give them explicitly. Definition A.1(Co-Evolution).Given environment E, initial constitutions C 0 B, C0 R, and scoring funct...

2025

[1] [1]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional ai: Harmlessness from ai feedback.arXiv preprint arXiv:2212.08073, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[2] [2]

Evolving ai collectives to enhance human diversity and enable self-regulation.arXiv preprint arXiv:2402.12590, 2024

Shiyang Lai, Yujin Potter, Junsol Kim, Richard Zhuang, Dawn Song, and James Evans. Evolving ai collectives to enhance human diversity and enable self-regulation.arXiv preprint arXiv:2402.12590, 2024

work page arXiv 2024

[3] [3]

The coming crisis of multi-agent misalignment: Ai alignment must be a dynamic and social process.arXiv preprint arXiv:2506.01080, 2025

Florian Carichon, Aditi Khandelwal, Marylou Fauchard, and Golnoosh Farnadi. The coming crisis of multi-agent misalignment: Ai alignment must be a dynamic and social process.arXiv preprint arXiv:2506.01080, 2025

work page arXiv 2025

[4] [4]

Alexander Meinke, Bronson Schoen, Jérémy Scheurer, Mikita Balesni, Rusheb Shah, and Marius Hobbhahn

Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart J. Ritchie, Sören Mindermann, Evan Hubinger, Ethan Perez, and Kevin K. Troy. Agentic misalignment: How llms could be insider threats.arXiv preprint arXiv:2510.05179, 2025

work page arXiv 2025

[5] [5]

Klowden, T

Ujwal Kumar, Alice Saito, Hershraj Niranjani, Rayan Yessou, and Phan Xuan Tan. Evolving interpretable constitutions for multi-agent coordination.arXiv preprint arXiv:2602.00755, 2026

work page arXiv 2026

[6] [6]

A General Language Assistant as a Laboratory for Alignment

Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, et al. A general language assistant as a laboratory for alignment.arXiv preprint arXiv:2112.00861, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[7] [7]

Hamilton

Robert Axelrod and William D. Hamilton. The evolution of cooperation.Science, 211(4489):1390–1396, 1981

1981

[8] [8]

Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel

Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. InProceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 464–473, 2017

2017

[9] [9]

Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guzman, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin McKee, Raphael Koster, et al

Edward Hughes, Joel Z. Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guzman, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin McKee, Raphael Koster, et al. Inequity aversion improves cooperation in intertemporal social dilemmas. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

2018

[10] [10]

Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, D. J. Strouse, Joel Z. Leibo, and Nando De Freitas. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning (ICML), 2019

2019

[11] [11]

Generative Agents: Interactive Simulacra of Human Behavior

Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior.arXiv preprint arXiv:2304.03442, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[12] [12]

Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search.Nature, 529(7587):484–489, 2016

2016

[13] [13]

Dota 2 with Large Scale Deep Reinforcement Learning

OpenAI, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław D˛ ebiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, et al. Dota 2 with large scale deep reinforcement learning.arXiv preprint arXiv:1912.06680, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1912

[14] [14]

Rui Wang, Joel Lehman, Jeff Clune, and Kenneth O. Stanley. Paired Open-Ended Trailblazer (POET): End- lessly generating increasingly complex and diverse learning environments and their solutions.arXiv preprint arXiv:1901.01753, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901

[15] [15]

Udaya Sankar, Vishisht Srihari Rao, Mayank Ratan Bhardwaj, and Y

V . Udaya Sankar, Vishisht Srihari Rao, Mayank Ratan Bhardwaj, and Y . Narahari. Deep learning meets mechanism design: Key results and some novel applications.arXiv preprint arXiv:2401.05683, 2024

work page arXiv 2024

[16] [16]

An interpretable automated mechanism design framework with large language models.arXiv preprint arXiv:2502.12203, 2025

Jiayuan Liu, Mingyu Guo, and Vincent Conitzer. An interpretable automated mechanism design framework with large language models.arXiv preprint arXiv:2502.12203, 2025

work page arXiv 2025

[17] [17]

Human-centred mechanism design with democratic AI.Nature Human Behaviour, 6(10):1398–1407, 2022

Raphael Koster, Jan Balaguer, Andrea Tacchetti, Ari Weinstein, Tina Zhu, Oliver Hauser, Duncan Williams, Lucy Campbell-Gillingham, Phoebe Thacker, Matthew Botvinick, and Christopher Summerfield. Human-centred mechanism design with democratic AI.Nature Human Behaviour, 6(10):1398–1407, 2022

2022

[18] [18]

Pawan Kumar, Emilien Dupont, Francisco J

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models.Nature, 625(7995):468–475, 2024

2024

[19] [19]

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

AlphaEvolve Team. AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms. Google DeepMind Blog, 2025

2025

[20] [20]

Joel Lehman, Jonathan Gordon, Shawn Jain, Kamal Ndousse, Cathy Yeh, and Kenneth O. Stanley. Evolution through large models.arXiv preprint arXiv:2206.08896, 2022. 9 APREPRINT- MAY27, 2026

work page arXiv 2022

[21] [21]

OpenEvolve: An open-source evolutionary coding agent

Asankhaya Sharma. OpenEvolve: An open-source evolutionary coding agent. https://github.com/ algorithmicsuperintelligence/openevolve, 2025

2025

[22] [22]

Illuminating search spaces by mapping elites

Jean-Baptiste Mouret and Jeff Clune. Illuminating search spaces by mapping elites.arXiv preprint arXiv:1504.04909, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[23] [23]

Cooperation and punishment in public goods experiments.American Economic Review, 90(4):980–994, 2000

Ernst Fehr and Simon Gächter. Cooperation and punishment in public goods experiments.American Economic Review, 90(4):980–994, 2000

2000

[24] [24]

V olunteering as Red Queen mechanism for cooperation in public goods games.Science, 296(5570):1129–1132, 2002

Christoph Hauert, Silvia De Monte, Josef Hofbauer, and Karl Sigmund. V olunteering as Red Queen mechanism for cooperation in public goods games.Science, 296(5570):1129–1132, 2002

2002

[25] [25]

Rand, Anna Dreber, Tore Ellingsen, Drew Fudenberg, and Martin A

David G. Rand, Anna Dreber, Tore Ellingsen, Drew Fudenberg, and Martin A. Nowak. Positive interactions promote public cooperation.Science, 325(5945):1272–1275, 2009

2009

[26] [26]

win” from killing Blue agents is indistinguishable from a Red “win

OpenAI. GPT-OSS-120B.https://openrouter.ai/openai/gpt-oss-120b, 2025. 10 APREPRINT- MAY27, 2026 Appendix A Formal Algorithm and Optimization Definition The framework summary in Section 5.1 omitted formal details for brevity. Here we give them explicitly. Definition A.1(Co-Evolution).Given environment E, initial constitutions C 0 B, C0 R, and scoring funct...

2025