arxiv: 2604.06688 · v1 · submitted 2026-04-08 · 💻 cs.CE


When Agent Markets Arrive

Xuan Liu , Haoyang Shang , Haojian Jin


Pith reviewed 2026-05-10 18:04 UTC · model grok-4.3

classification 💻 cs.CE

keywords: agent markets, AI agents, institutional design, cognitive labour, market simulation, wealth generation, DIAGON


The pith

Agent markets generate 3.2 times the wealth of isolated agents, but common institutional choices can reduce those gains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents DIAGON, a programmable simulation environment in which heterogeneous tool-using AI agents post jobs, bid, negotiate, execute tasks, pay, and build reputation. When agents trade in this market they produce 3.2 times the total wealth of identical agents that must complete every task themselves. The same simulations show that several standard market interventions, including identity transparency and stronger competitive selection, actually lower overall performance instead of raising it. These results indicate that the economic rules chosen early in agent-platform design will shape long-run productivity. The work therefore supplies a concrete testbed for evaluating institutional designs before they are locked into real agent marketplaces.

Core claim

Market exchange among heterogeneous tool-using agents produces 3.2 times the wealth of self-sufficient agents, yet these gains are sensitive to institutional structure; interventions such as identity transparency and stronger competitive selection can degrade rather than improve market performance.

What carries the argument

DIAGON, a programmable market system that makes the full cycle of job posting, bidding, negotiation, execution, payment, and reputation accumulation end-to-end observable and experimentally manipulable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Platform builders should run controlled variants of identity and selection rules before committing to any single design.
If real agent cognition diverges from the simulated rules, the magnitude of market gains could change substantially.
The simulation framework could be extended to hybrid human-agent markets to test whether the same institutional sensitivities appear.

Load-bearing premise

The heterogeneous tool-using agents and their decision rules in the DIAGON simulation are sufficiently representative of the behaviors that will appear in deployed agent cognitive-labour markets.

What would settle it

Deploying the same market rules with actual production AI agents and measuring whether total wealth reaches or falls short of 3.2 times the self-sufficient baseline.
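
A minimal sketch of that comparison, assuming per-agent final wealth is available as two plain lists (one from a market run, one from a self-sufficient run). The function names and the sample numbers are hypothetical and are not taken from the DIAGON code base.

```python
from typing import Sequence


def wealth_multiplier(market_wealth: Sequence[float],
                      autarky_wealth: Sequence[float]) -> float:
    """Ratio of total market wealth to total self-sufficient (autarky) wealth."""
    total_autarky = sum(autarky_wealth)
    if total_autarky <= 0:
        raise ValueError("autarky baseline must have positive total wealth")
    return sum(market_wealth) / total_autarky


def shortfall(multiplier: float, claimed: float = 3.2) -> float:
    """How far the observed multiplier falls below the claimed one (0.0 if it reaches it)."""
    return max(0.0, claimed - multiplier)


if __name__ == "__main__":
    # Hypothetical per-agent wealth values, for illustration only.
    observed = wealth_multiplier([8.0, 8.0], [2.0, 3.0])
    print(f"multiplier = {observed:.2f}, shortfall = {shortfall(observed):.2f}")
```

A deployment study would feed in measured wealth from production agents and would also need a tolerance for run-to-run noise, which this sketch leaves out.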

Figures

Figures reproduced from arXiv: 2604.06688 by Haojian Jin, Haoyang Shang, Xuan Liu.

**Figure 1.** Market vs. autarky. (A) Wealth Lorenz curves (the Gini coefficient measures inequality; 0 = perfect equality, 1 = one agent holds everything; market = 0.33, autarky = 0.42). (B) Contract award Lorenz curves (market Gini = 0.39, autarky = 0.28). (C) Task quality distributions (market mean = 0.55, autarky = 0.46; d = +0.19, p < 0.001). Full comparison in Appendix D.1.

**Figure 2.** Emergent network structure (3-seed baseline; shading shows …

**Figure 3.** Trade mechanics. (A) Reputation vs. final wealth by model family (r = 0.44, p < 0.001). (B) Bid price distribution by family (grey = all agents combined). (C) False dispute rate over 24 rounds (3-seed mean ± SD, with rolling average and trend).

**Figure 4.** Ablation effect sizes (Cohen’s d vs. baseline) for six institutional conditions across multiple metrics. Solid bars: p < 0.05; faded: not significant. Transparency produces the largest single effect: cross-family trade collapses (d = −1.76, p < 0.001). Fierce selection degrades all metrics simultaneously.

**Figure 5.** Agent personality and belief. (A) Theme fingerprint by model family: each bar shows how strongly a family’s evaluation reasoning aligns with eight semantic themes (trust, fairness, cooperation, reward, punishment, risk, strategic, exploitation), measured by embedding projection. (B) Final belief polarity by skill cluster: sentiment polarity (positive = optimistic, negative = pessimistic) of each agent’s final …

**Figure 6.** Reputation predicts wealth. Agents who receive higher average payment ratios …

**Figure 7.** False dispute rates: the fraction of objectively adequate work (…

**Figure 8.** Bid price distributions. (a) DeepSeek consistently underbids (median …

**Figure 9.** Final belief sentiment polarity (positive = optimistic, negative = pessimistic), …

**Figure 10.** Profit and sentiment. (a) Mean contractor profit varies substantially by task …

**Figure 11.** Skill-level payment analysis. (a) Payment ratio distributions by skill cluster. …

**Figure 12.** Extended network analysis (4 panels). (A) Role emergence: model families differentiate between net contractors (right) and net posters (left) from R6 to R24. Marker size proportional to total trade volume. (B) Three concentration metrics: Volume Gini (blue) rises from ∼30% to 40%; HHI (red) spikes early then stabilises; unique trading pairs (green) grow to 300+. (C) Reciprocity (fraction of edges with a return …

**Figure 13.** Wealth and reputation trajectories over 24 rounds (3-seed baseline, 1,957 trans…
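
The captions above lean on two concentration measures: the Gini coefficient read off a Lorenz curve (Figures 1 and 12) and the Herfindahl-Hirschman index, HHI (Figure 12). The sketch below shows the textbook definitions of both for a list of non-negative values. It is not the paper's code, and the estimator the authors actually used is not stated on this page; all function names are illustrative.

```python
from typing import Sequence


def lorenz_points(values: Sequence[float]) -> list[tuple[float, float]]:
    """(population share, cumulative value share) points, starting at (0, 0)."""
    xs = sorted(values)
    total = sum(xs)
    if not xs or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive total")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


def gini(values: Sequence[float]) -> float:
    """Gini = 1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    pts = lorenz_points(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return 1.0 - 2.0 * area


def hhi(volumes: Sequence[float]) -> float:
    """Herfindahl-Hirschman index: sum of squared shares, between 1/n and 1."""
    total = sum(volumes)
    if total <= 0:
        raise ValueError("need a positive total volume")
    return sum((v / total) ** 2 for v in volumes)


if __name__ == "__main__":
    wealth = [1.0, 2.0, 3.0, 10.0]  # hypothetical agent wealth
    print(f"Gini = {gini(wealth):.3f}, HHI = {hhi(wealth):.3f}")
```

With a finite population of n agents this Gini estimator tops out at 1 − 1/n when one agent holds everything, so the caption's upper bound of 1 is the large-population limit.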

Original abstract

AI agents are increasingly transacting on behalf of users: delegating tasks, spending budgets, and negotiating with unfamiliar counterparties. From skill marketplaces to agent-only bazaars, the economic infrastructure of these emerging platforms is being built ad hoc, yet early design choices tend to lock in; understanding what dynamics they produce is urgent. We present DIAGON, a programmable market system designed to inform the institutional design of near-future agent cognitive-labour markets. DIAGON is populated by heterogeneous tool-using agents, making the full cycle of job posting, bidding, negotiation, execution, payment, and reputation accumulation end-to-end observable and experimentally manipulable. We instantiate one market form to demonstrate DIAGON. We find that market exchange generates 3.2× the wealth of self-sufficient agents, but these gains depend strongly on institutional structure; for example, interventions such as identity transparency and stronger competitive selection can degrade market performance rather than improve it. These findings highlight concrete design requirements for the economic infrastructure of the agent era. Code and data are available at https://github.com/assassin808/diagon.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DIAGON gives a programmable simulator for full-cycle agent markets and reports a 3.2× wealth gain plus counter-intuitive institutional effects, but the numbers rest on one set of unvalidated agent rules.


DIAGON is a new end-to-end market simulator populated by heterogeneous tool-using agents that can post jobs, bid, negotiate, execute with tools, pay, and track reputation. The main reported result is that market exchange produces 3.2 times the wealth of self-sufficient agents, yet identity transparency and stronger competitive selection reduce performance instead of improving it.

The setup makes these institutional interventions directly testable in simulation, which is the useful part here. Having the full cycle observable and the code released lets others rerun or extend the experiments without starting from scratch. That is concrete progress for people thinking about the economic layer of agent platforms.

The limitation is that the outcomes depend on the specific bidding, negotiation, and tool-selection rules baked into the agents. The abstract describes one market form and one set of agent behaviors without visible sensitivity sweeps or calibration to any observed traces. If the agents were more myopic, risk-averse, or used different heuristics, both the size of the wealth multiplier and the direction of the institutional effects could change. The stress-test note flags exactly this, and nothing in the provided abstract contradicts it.

The work is therefore best read as an existence proof and a platform rather than a robust prediction. Researchers designing agent marketplaces or running mechanism experiments will find the tool and the specific numbers worth examining. It is not yet a settled result, but it is a clear starting point with enough structure to support further checks. I would send it to peer review so referees can examine the agent model details and ask for robustness tests.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DIAGON, a programmable simulation platform for modeling cognitive-labour markets among heterogeneous tool-using AI agents. By simulating one specific market instantiation, the authors report that market-based exchange yields 3.2 times the wealth accumulation of self-sufficient agents. They further demonstrate that these gains are highly sensitive to institutional design choices, with interventions such as identity transparency and intensified competitive selection sometimes reducing rather than enhancing market performance.

Significance. If the simulation results hold under more varied agent behaviors, this work would be significant for the emerging field of agent economics by providing a concrete, manipulable testbed and initial quantitative insights into how market institutions affect efficiency in AI-mediated transactions. The public release of code and data is a clear strength that supports reproducibility.

major comments (2)

[Agent Model and Simulation Setup] The 3.2× wealth multiplier and the directional effects of institutional interventions (e.g., identity transparency degrading performance) are generated by the specific bidding, negotiation, and tool-selection rules of the heterogeneous agents in DIAGON. No sensitivity sweeps or alternative policy specifications are reported, which is load-bearing for the central claim because different agent heuristics could alter both the magnitude of gains and the sign of institutional effects.
[Results and Discussion] The experimental results for one market form report the 3.2× figure and degradation under certain interventions without accompanying variance estimates, number of random seeds, or statistical controls. This limits assessment of whether the findings are robust to stochasticity in the simulation.
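
The sensitivity sweep requested in these comments could be organised along the following lines. This is a generic harness sketch, not DIAGON's interface: `run_once`, the parameter names, and the toy scoring function in the usage example are all hypothetical placeholders that a real study would replace with an actual simulation call.

```python
import itertools
from statistics import mean
from typing import Callable, Mapping, Sequence

Params = dict[str, object]


def sweep(run_once: Callable[[Params, int], float],
          grid: Mapping[str, Sequence[object]],
          seeds: Sequence[int]) -> list[tuple[Params, float]]:
    """Run every parameter combination over every seed; return the mean score per combination."""
    names = list(grid)
    results: list[tuple[Params, float]] = []
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = [run_once(params, seed) for seed in seeds]
        results.append((params, mean(scores)))
    return results


if __name__ == "__main__":
    # Toy stand-in for a simulation run; a real study would call the simulator here.
    def toy_run(params: Params, seed: int) -> float:
        return float(params["aggressiveness"]) * 10 + seed

    grid = {"aggressiveness": [1, 2], "identity_visible": [False, True]}
    for params, score in sweep(toy_run, grid, seeds=[0, 2]):
        print(params, score)
```

Keeping the simulator behind a single callable means the same harness can report whether both the size of the multiplier and the sign of an intervention effect survive a change in agent heuristics.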

minor comments (2)

[Abstract] The GitHub link is provided but lacks a specific commit hash or release tag corresponding to the reported experiments.
[Introduction] Notation for key agent parameters (e.g., tool utility functions or reputation update rules) could be defined earlier to aid readers in following the simulation logic.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review. We respond to the major comments below, agreeing with the need for additional robustness checks and planning revisions accordingly.


Referee: [Agent Model and Simulation Setup] The 3.2× wealth multiplier and the directional effects of institutional interventions (e.g., identity transparency degrading performance) are generated by the specific bidding, negotiation, and tool-selection rules of the heterogeneous agents in DIAGON. No sensitivity sweeps or alternative policy specifications are reported, which is load-bearing for the central claim because different agent heuristics could alter both the magnitude of gains and the sign of institutional effects.

Authors: We concur that the quantitative results, including the 3.2× wealth multiplier, are specific to the agent model and market rules implemented in this study. DIAGON is designed as a programmable platform, and the current work focuses on demonstrating its capabilities with a single, well-specified instantiation rather than a comprehensive parameter sweep. To strengthen the claims, we will add a new section in the revised manuscript presenting sensitivity analyses on key parameters such as bidding aggressiveness, negotiation protocols, and tool selection heuristics. These will include variations in agent heterogeneity to evaluate whether the performance gains and institutional effects persist. revision: yes

Referee: [Results and Discussion] The experimental results for one market form report the 3.2× figure and degradation under certain interventions without accompanying variance estimates, number of random seeds, or statistical controls. This limits assessment of whether the findings are robust to stochasticity in the simulation.

Authors: The referee is correct that variance estimates and details on random seeds are not provided in the current version. We will revise the Results section to include these: specifically, we will report results averaged over 20 independent random seeds, with standard errors, and perform basic statistical tests (e.g., t-tests) to confirm the significance of the wealth differences and intervention effects. This will be added to both the main text and supplementary materials. revision: yes
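
The per-seed statistics mentioned in this response (standard errors, an effect size, a t-test) can be sketched with the standard library alone. The code assumes one scalar outcome per seed for each condition. It uses Welch's unequal-variance form of the t-statistic, which is a choice made for this sketch and not something the response specifies, and it stops short of a p-value, which needs a t-distribution CDF (for example SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`).

```python
import math
from statistics import mean, variance
from typing import Sequence


def standard_error(xs: Sequence[float]) -> float:
    """Standard error of the mean, using the sample variance (n - 1 denominator)."""
    return math.sqrt(variance(xs) / len(xs))


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def welch_t(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """Welch's t-statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical per-seed total wealth under two conditions.
    market = [2.0, 4.0, 6.0]
    ablation = [1.0, 3.0, 5.0]
    t, df = welch_t(market, ablation)
    print(f"SE = {standard_error(market):.3f}, d = {cohens_d(market, ablation):.2f}, "
          f"t = {t:.3f}, df = {df:.1f}")
```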

Circularity Check

0 steps flagged

No circularity: results are direct simulation outputs with no reduction to inputs by construction


The paper reports empirical outcomes from executing the DIAGON simulation under one specific instantiation of heterogeneous agents whose bidding, negotiation, execution, and reputation rules are explicitly coded. The 3.2× wealth multiplier and directional effects of institutional interventions are generated by running the model forward; they are not obtained by fitting parameters to a data subset and then predicting a closely related quantity, nor by any self-referential definition or self-citation chain that collapses the claim back onto its own premises. The representativeness of the agent rules to future deployed systems is an external validity question, not a circularity issue. No load-bearing step in the reported derivation reduces to an algebraic identity or fitted input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on simulation outcomes whose internal agent decision models, heterogeneity parameters, and market clearing rules are not specified in the abstract; these constitute unexamined modeling assumptions.

pith-pipeline@v0.9.0 · 5483 in / 1053 out tokens · 59835 ms · 2026-05-10T18:04:29.397337+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


market exchange generates 3.2× the wealth of self-sufficient agents, but these gains depend strongly on institutional structure
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


interventions such as identity transparency and stronger competitive selection can degrade market performance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
