pith. machine review for the scientific record.

arxiv: 2605.06525 · v1 · submitted 2026-05-07 · 💻 cs.GT · cs.MA · econ.TH

Recognition: unknown

Sustaining Cooperation in Populations Guided by AI: A Folk Theorem for LLMs

Authors on Pith · no claims yet

Pith reviewed 2026-05-08 04:02 UTC · model grok-4.3

classification 💻 cs.GT · cs.MA · econ.TH
keywords folk theorem · large language models · repeated games · cooperation · Nash equilibrium · multi-agent systems · AI advice

The pith

Shared LLM guidance sustains all feasible and individually rational outcomes as ε-equilibria in repeated games despite indirect observation and hidden advisor identities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how large language models advising multiple agents create coupling that affects cooperation when underlying incentives are misaligned. In one-shot interactions, shared instructions change equilibrium behavior only if an LLM influences more than one role in the same game, and the impact of each LLM's client share can be beneficial, harmful, or non-monotone. The main contribution proves a folk theorem for the repeated setting: all feasible and individually rational outcomes can be sustained as ε-equilibria. This holds even though clients cannot identify which LLM advised their opponents and observations of actions are indirect, requiring new proof techniques beyond the standard repeated-game folk theorem.

Core claim

In the repeated setting where multiple LLMs each advise populations of clients playing instances of an underlying game, all feasible and individually rational outcomes of that game can be sustained as ε-equilibria in the induced meta-game among the LLMs. This result holds despite indirect observation of actions and clients' inability to identify the specific LLM advising their opponents. The construction does not follow from the classical folk theorem and relies on new equilibrium strategies that operate at the level of LLM advice.
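The claim's two ingredients, feasibility and individual rationality, can be made concrete in a toy game. The sketch below is ours, not the paper's: it computes minmax values in a standard prisoner's dilemma and filters the pure payoff profiles down to the individually rational ones. The folk-theorem assertion is then that every strictly individually rational point in the convex hull of the feasible set can be approximated by an ε-equilibrium.

```python
# Hedged sketch (illustrative, not the paper's construction): compute
# minmax values in a standard prisoner's dilemma and keep the pure
# payoff profiles that are individually rational (IR).

# payoff[(a1, a2)] = (u1, u2)
payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
actions = ("C", "D")

def minmax(player):
    """Worst payoff the opponent can force on `player`, assuming the
    player best-responds to each opponent action."""
    worst = None
    for opp_a in actions:
        best = max(payoff[(a, opp_a)][0] if player == 0
                   else payoff[(opp_a, a)][1]
                   for a in actions)
        worst = best if worst is None else min(worst, best)
    return worst

mm = (minmax(0), minmax(1))  # defection caps the opponent at P = 1
ir_profiles = [u for u in payoff.values()
               if u[0] >= mm[0] and u[1] >= mm[1]]
print(mm)           # (1, 1)
print(ir_profiles)  # [(3, 3), (1, 1)]
```

Only mutual cooperation and mutual defection survive the IR filter among pure profiles; convex combinations of the feasible set add the rest of the region the theorem covers.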

What carries the argument

The meta-game among the LLMs, created when each model advises a population of clients who interact in the underlying repeated game.

If this is right

  • In one-shot games, cooperation emerges only when an LLM can advise multiple roles within the same interaction.
  • Varying the share of clients per LLM can increase, decrease, or non-monotonically affect equilibrium cooperation depending on the base game.
  • Repeated play allows any rational outcome to be sustained approximately even without direct identification of advisors.
  • Shared LLM guidance couples agents who appear independent, expanding the set of sustainable cooperative outcomes.
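The first two bullets can be illustrated with a hedged toy model (ours, not the paper's): an LLM with client share α instructs all of its clients to play one action in a one-shot prisoner's dilemma, clients are matched uniformly at random, and everyone else defects. Cooperation then pays only above a share threshold, showing how client share drives equilibrium behavior.

```python
# Hedged illustration (not the paper's model): one-shot prisoner's
# dilemma with payoffs R (mutual cooperation), S (sucker), P (mutual
# defection), where T > R > P > S. Non-clients always defect, so T
# never enters the advisor's calculation.

R, S, P = 3.0, 0.0, 1.0

def expected_client_payoff(instruction, share):
    """Expected payoff to one client when the advisor's client share is
    `share`, so the opponent is a co-client with probability `share`."""
    if instruction == "C":
        return share * R + (1 - share) * S  # co-client also plays C
    return P  # defecting yields P against co-clients and outsiders alike

def best_instruction(share):
    return "C" if expected_client_payoff("C", share) > P else "D"

# Cooperation becomes optimal once share * R + (1 - share) * S > P,
# i.e. share > (P - S) / (R - S) = 1/3 under these payoffs.
threshold = (P - S) / (R - S)
print(best_instruction(0.2))  # "D": too few co-clients to risk C
print(best_instruction(0.5))  # "C": coupling makes cooperation pay
```

This toy game gives a monotone threshold effect; the paper's claim of beneficial, harmful, or non-monotone share effects depends on the base game, which this sketch does not capture.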

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Different LLM providers may effectively compete or coordinate through the client populations they serve, creating a new layer of strategic interaction.
  • Design choices in how LLMs are deployed across agent populations could be used to steer long-run outcomes toward cooperation.
  • The result suggests testing whether similar ε-equilibrium constructions survive when LLMs have memory limits or when client populations overlap in more complex ways.

Load-bearing premise

The repeated structure must allow construction of equilibria that overcome indirect observation and the clients' inability to identify which LLM advised their opponents.

What would settle it

A concrete feasible and individually rational payoff profile that cannot be approximated as an equilibrium when clients cannot distinguish which LLM advised their opponents.

Original abstract

Large language models (LLMs) are increasingly used to provide instructions to many agents who interact with one another. Such shared reliance couples agents who appear to act independently: they may in fact be guided by a common model. This coupling can change the prospects for cooperation among agents with misaligned incentives. We study settings in which multiple LLMs each advise a population of clients who participate in instances of an underlying game, creating strategic interaction at the level of the LLMs themselves. This induces a meta-game among the LLMs, mediated through clients. We first analyze the one-shot setting, where shared instructions can change equilibrium behavior only when an LLM may influence more than one role in the same interaction; in such cases, cooperation may emerge, and the effect of client share can be beneficial, harmful, or non-monotone, depending on the base game. Our main result concerns the repeated setting. We prove a folk theorem for LLMs: despite indirect observation and the clients' inability to identify which LLM advised their opponents, all feasible and individually rational outcomes can be sustained as $\varepsilon$-equilibria. The result does not follow from the standard folk theorem and requires new proof techniques. Together, these results show that shared LLM guidance can sustain cooperation among populations of agents even when the underlying incentives are misaligned.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom & free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper examines how shared LLMs advising populations of clients in underlying games induce a meta-game among the LLMs. In the one-shot setting, shared instructions alter equilibrium behavior only when an LLM influences multiple roles in the same interaction, with client-share effects that can be beneficial, harmful, or non-monotone. The central result is a folk theorem for the repeated setting: despite indirect observation and clients' inability to identify which LLM advised their opponents, all feasible and individually rational outcomes can be sustained as ε-equilibria. The proof requires new techniques beyond the standard repeated-game folk theorem.

Significance. If the folk theorem holds, the result provides a theoretical basis for how LLM-mediated guidance can sustain cooperation in populations facing misaligned incentives, even under realistic constraints like indirect observation. The explicit development of new proof techniques to handle the meta-game induced by shared advisors is a notable strength, as it directly addresses a setting where classical folk theorems do not apply.

major comments (1)
  1. [Main result (folk theorem proof)] The central claim rests on a folk theorem whose proof uses new techniques to construct equilibria under indirect observation and non-identifiability of advisors. The manuscript states that the result does not follow from the standard folk theorem, yet the provided abstract and high-level description do not include the full derivation, the explicit equilibrium strategies, or verification that the construction succeeds in the induced meta-game. This gap is load-bearing because the viability of the ε-equilibrium construction is the sole support for the claim that all feasible IR outcomes are attainable.
minor comments (2)
  1. [One-shot setting] The one-shot analysis mentions non-monotone effects of client share but does not illustrate them with a concrete base game or payoff matrix; adding a small example would clarify the claim.
  2. [Introduction] Notation for the meta-game (e.g., how client populations map to LLM influence) could be introduced earlier to improve readability before the repeated-game section.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review and for identifying the need to strengthen the presentation of our main result. We address the concern about the folk theorem proof below and commit to revisions that improve clarity while preserving the manuscript's contributions.

Point-by-point responses
  1. Referee: [Main result (folk theorem proof)] The central claim rests on a folk theorem whose proof uses new techniques to construct equilibria under indirect observation and non-identifiability of advisors. The manuscript states that the result does not follow from the standard folk theorem, yet the provided abstract and high-level description do not include the full derivation, the explicit equilibrium strategies, or verification that the construction succeeds in the induced meta-game. This gap is load-bearing because the viability of the ε-equilibrium construction is the sole support for the claim that all feasible IR outcomes are attainable.

    Authors: We appreciate this feedback on the presentation of the central result. The full proof appears in Section 4, which develops new techniques for the induced meta-game: we explicitly construct LLM strategies that sustain any feasible and individually rational payoff vector as an ε-equilibrium despite clients' inability to identify which LLM advised opponents. The construction uses a block structure with coordinated reward and punishment phases, where deviations are detected via aggregate client play (leveraging the shared-advisor coupling) rather than direct identification; we then verify incentive compatibility by bounding the one-shot deviation gain by ε, accounting for the indirect observation. This does not reduce to the standard folk theorem because the meta-game payoffs are not directly observed by the LLMs. That said, we agree the introduction and abstract provide only a high-level sketch. In revision we will expand the main-text outline to include the explicit strategy form, the key detection mechanism, and a step-by-step verification that the construction works in the meta-game, moving only the most technical lemmas to the appendix. This addresses the load-bearing concern without changing the result. revision: yes
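The rebuttal's core incentive step, bounding the one-shot deviation gain by ε, can be sketched in the simplest repeated game. This is our illustrative reduction (grim trigger in a discounted prisoner's dilemma), not the paper's block construction with probes and aggregate detection.

```python
# Hedged sketch (ours, not the paper's construction): in a discounted
# repeated prisoner's dilemma, grim trigger sustains (C, C) as an
# epsilon-equilibrium whenever the one-shot deviation gain, in
# normalized (times 1 - delta) payoffs, is at most epsilon.

T, R, P = 5.0, 3.0, 1.0  # temptation > reward > punishment

def deviation_gain(delta):
    """Normalized gain from defecting once against grim trigger: the
    deviator earns T today and P forever after, instead of R forever."""
    follow = R                             # cooperate forever
    deviate = (1 - delta) * T + delta * P  # T once, then permanent punishment
    return deviate - follow

def is_epsilon_equilibrium(delta, eps):
    return deviation_gain(delta) <= eps

# Exact equilibrium needs delta >= (T - R) / (T - P) = 0.5 here;
# an epsilon slack tolerates slightly more impatient players.
print(is_epsilon_equilibrium(0.5, 0.0))    # True: gain is exactly 0
print(is_epsilon_equilibrium(0.45, 0.0))   # False: deviating is profitable
print(is_epsilon_equilibrium(0.45, 0.25))  # True: gain ~0.2 fits the slack
```

The paper's setting replaces the direct observation assumed here with detection through aggregate client play, which is what forces the new proof techniques.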

Circularity Check

0 steps flagged

No significant circularity; derivation is a novel proof extension

full rationale

The paper claims to prove a folk theorem for LLM meta-games in repeated settings that does not follow from the classical folk theorem and requires new proof techniques to handle indirect observation and non-attribution of advisors. No equations, parameters, or self-citations are presented that reduce the central result to a fit, definition, or prior author work by construction. The one-shot analysis and repeated-game construction are described as independent extensions of standard repeated-game theory, with the result explicitly positioned as non-derivable from prior theorems. This is a self-contained mathematical argument with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard repeated-game assumptions plus new proof techniques for the LLM meta-game; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • standard math — Standard assumptions of repeated games (discounting or an infinite horizon) permit equilibrium constructions.
    Folk theorems in repeated games typically invoke these; the abstract invokes the repeated setting for the main result.

pith-pipeline@v0.9.0 · 5543 in / 1169 out tokens · 82795 ms · 2026-05-08T04:02:16.769357+00:00 · methodology


    LetH l h,s denote the realized public aggregate at thes-th time period of blockh. Call a time perioddiscrepantifH l h,s ‰ ¯H h, and letd l :“ 1 T řn h“1 řTh s“1 1tH l h,s‰ ¯H hu be the discrepancy fraction in the phase-lblock. At the end of the block: - If there existh, s, i, asuch thatH l h,spi, aq ąΓ h,lpi, aq,it is anexcess deviationand the protocol mo...