arxiv: 2604.18602 · v2 · submitted 2026-04-09 · 💱 q-fin.TR · cs.CE

Machine Spirits: Speculation and Adaptation of LLM Agents in Asset Markets

Cars Hommes, Fabio Caccioli, Marco Pangallo, Maxime Saxena, R. Maria del Rio-Chanona

Pith reviewed 2026-05-10 17:03 UTC · model grok-4.3

keywords LLM agents, financial markets, speculation, market volatility, heterogeneous agents, AI adaptation, bubbles, rational expectations

The pith

LLMs used as traders in asset markets can generate speculative bubbles and market instability through adaptation in heterogeneous groups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models conform to rational expectations when trading in a simulated asset market or instead display speculative tendencies. Across 15 different models, behaviors range from stable coordination on fundamental prices to human-like bubble formation. In mixed populations, advanced models adapt their forecasts to the actions of others, allowing them to profit more but also driving up overall volatility and bubbles even when bubble-prone agents are a minority. These patterns suggest that introducing AI agents changes market ecology in ways that can produce endogenous instability.

Core claim

Heterogeneous groups of LLM agents in a simulated financial market generate variable outcomes across runs, with individual adaptation enabling exploitation of simpler agents and higher returns but also contributing to price bubbles and amplified volatility, contrary to expectations that more capable models would stabilize trading.

What carries the argument

The adaptation of LLM agents' forecasting strategies to the observed behavior of other agents in mixed populations.

If this is right

Heterogeneous LLM populations produce outcomes that vary substantially across repeated simulations.
Advanced models adapt to exploit less sophisticated ones and achieve higher profits.
This adaptation contributes to increased market volatility.
Bubbles form even with only a minority of naturally bubble-forming agents.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If these patterns hold in real markets, regulators may need to monitor AI trading systems for emergent instability.
Designing LLMs specifically to prioritize market stability over individual profit could reduce risks.
The findings point to potential challenges in scaling AI participation without coordinated rules of engagement.

Load-bearing premise

That the prompting and market simulation accurately represent how LLMs would behave when trading with actual capital at risk.

What would settle it

Observing whether real-world deployments of multiple LLM-based trading agents produce similar patterns of adaptation, exploitation, and volatility spikes in live asset markets.

Figures

Figures reproduced from arXiv: 2604.18602 by Cars Hommes, Fabio Caccioli, Marco Pangallo, Maxime Saxena, R. Maria del Rio-Chanona.

**Figure 1.** LLMs give a wide range of results, from speculative bubbles to rational behaviour. Human results are from Hommes et al. [2008]. A selection of the infinite rational expectations (RE) solutions are shown, with the constants corresponding to the “c” in the RE solution $p_t = p^f + cR^t$. See Appendix A.1 for a derivation of the RE solutions. LLM results are split into experiments in which a bubble was formed an…

**Figure 2.** In this mixed market, rather than one agent dominating and consistent outcomes being produced, the macro dynamics are varied across the 50 experimental repeats. Even the forms of the bubble and non-bubble runs are varied. To characterise this variation, we identify five different behaviours and classify each run (see Appendix B.7 for more details).

**Figure 3.** Gemini exacerbates market movements while GPT-5 Mini dampens them. One representative experimental run per configuration is shown to illustrate the exacerbation and dampening effects seen in the summary statistics. A selection of quotes from each model is shown, demonstrating that Gemini-3-Flash builds a model of other participants’ prediction strategies while GPT-5 Mini averages more conservative approac…

**Figure 4.** Middling capability and aggressive price prediction are good predictors of bubble formation. Composite score refers to the capability index introduced in this paper. Average price is the mean price prediction from the extrapolation aggressiveness test.

**Figure 5.** We find that experiments in which bubbles form have high proportions of non-linear extrapolation methods used and low anchoring to the fundamental value. This suggests that bubble formation is associated with high numbers of aggressive trend-followers and low numbers of fundamentalists. There is one outlier in the bottom right corner which appears to have high proportions of non-linear methods and low refe…

**Figure 5.** Bubble experiments have on average a high proportion of non-linear extrapolation methods used and a low proportion of price predictions anchored to the fundamental value. Proportions of predictions using non-linear extrapolation methods and anchoring to the fundamental value, averaged across experimental runs conditional on whether a bubble formed or not. For example, one point is the mean over Qwen3-32B r…

**Figure 6.** The mean squared errors are 3-4 orders of magnitude larger for models that form…

**Figure 6.** Agents make varied predictions, although they tend to coordinate to an extent on (boundedly rational) strategies during bubbles. Mean common and dispersion errors are plotted on a logarithmic scale for better visibility. Note that the bubble forming LLMs tend to have mean squared errors around 3-4 orders of magnitude bigger than LLMs that do not form bubbles. The dotted line divides models into those that …
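
The family of rational expectations solutions named in the Figure 1 caption, $p_t = p^f + cR^t$, can be sketched numerically. The snippet below is a minimal illustration, not the paper's code: the values of `P_F`, `r`, and the constants `c` are assumptions chosen for readability, and the pricing relation used in the check, $p_t = (p_{t+1} + y)/R$ with $p^f = y/r$, is the standard present-value form assumed here rather than quoted from this page.

```python
import math


def re_price_path(p_f, c, R, periods):
    """Return the path p_t = p_f + c * R**t for t = 0 .. periods - 1."""
    return [p_f + c * R**t for t in range(periods)]


def satisfies_pricing(path, y, R, rel_tol=1e-9):
    """Check p_t == (p_{t+1} + y) / R along a path (assumed pricing relation)."""
    return all(
        math.isclose(path[t], (path[t + 1] + y) / R, rel_tol=rel_tol)
        for t in range(len(path) - 1)
    )


# Illustrative values only (assumptions, not taken from the page above).
r = 0.05          # hypothetical interest rate
R = 1 + r         # gross return, the "R" in the caption
P_F = 60.0        # hypothetical fundamental price p^f
Y = r * P_F       # mean dividend implied by p^f = y / r

# c = 0 gives the fundamental solution; c > 0 gives an exploding bubble path.
paths = {c: re_price_path(P_F, c, R, periods=50) for c in (0.0, 0.5, 2.0)}

for c, path in paths.items():
    ok = satisfies_pricing(path, Y, R)
    print(f"c={c}: p_0={path[0]:.2f}, p_49={path[-1]:.2f}, satisfies pricing: {ok}")
```

Every choice of `c` passes the same pricing check, which is why the caption speaks of infinitely many RE solutions; only `c = 0` stays at the fundamental value.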

Original abstract

As Large Language Models (LLMs) become increasingly integrated into financial systems, understanding their behavioural properties is crucial. Do LLMs conform to the rational expectations paradigm, do they exhibit human-like "animal spirits", or do they instead manifest distinct "machine spirits"? We investigate these questions with a simulated financial market, exploring the behaviour of 15 LLMs spanning a range of sizes, capabilities, and providers. Our results show that LLMs exhibit a spectrum of economic behaviours, from stable coordination on the fundamental value to human-like speculative bubbles. These behaviours are generally inconsistent with the rational expectations hypothesis. We also consider an ecology of heterogeneous agents, a more realistic setting compared to markets with identical LLM agents. These mixed markets can produce outcomes which vary substantially across repeated simulations. Even the most advanced models fail to consistently stabilise the market, with price bubbles sometimes forming despite only a minority of agents naturally forming bubbles. Instead, advanced models in mixed markets adapt their forecasting strategies to the behaviour of other agents. This adaptation can allow them to successfully exploit less sophisticated counterparts and achieve higher profits, but can also contribute to increased market volatility. These findings suggest that the introduction of AI agents into financial markets fundamentally reshapes their ecology. In particular, heterogeneous populations of LLMs can generate endogenous instability, while individual-level adaptation may amplify, rather than mitigate, market volatility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript reports simulation experiments placing 15 LLMs (spanning sizes and providers) into an abstracted asset market. It finds that individual LLMs produce a range of behaviors, from coordination on fundamentals to human-like speculative bubbles, that are generally inconsistent with rational expectations; heterogeneous populations generate variable outcomes across runs, with advanced models adapting to exploit others and sometimes increasing volatility. The central claim is that LLM agents can endogenously destabilize markets and that adaptation may amplify rather than dampen volatility.

Significance. If the reported patterns survive changes in prompting, market microstructure, and capital-at-risk constraints, the work would supply concrete empirical evidence that LLM heterogeneity introduces new instability channels in financial systems. The simulation design is reproducible in principle and avoids parameter fitting, which strengthens the internal validity of the observed adaptation and bubble-formation results.

major comments (3)

[Methods / Experimental Setup] Methods (prompting and market setup): The abstract and setup description provide no explicit prompting templates, temperature settings, context-window usage, or market parameters (initial price, dividend process, number of periods, trading rules). Without these, it is impossible to judge whether the reported bubble formation and adaptation are robust or sensitive to minor implementation choices; this directly affects the load-bearing claim that heterogeneous LLM populations generate endogenous instability.
[Results / Heterogeneous Markets] Results on adaptation and volatility: The finding that individual-level adaptation can amplify market volatility rests on comparisons across repeated simulations with fixed prompts. No statistical controls (e.g., regression of volatility on adaptation metrics, robustness to seed variation, or comparison against non-adaptive baselines) are described, leaving open whether the amplification effect is an artifact of the particular ecology or a general property.
[Discussion / Conclusion] External validity discussion: The extrapolation that LLM agents will 'fundamentally reshape' real financial ecology is unsupported by any mapping exercise showing that the simulated behaviors persist under slippage, position limits, regulatory constraints, or multi-period strategic feedback with actual capital. This gap is load-bearing for the policy-relevant conclusion.
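
The statistical control requested in the second major comment (volatility compared across repeated runs, against a non-adaptive baseline) could take roughly the following shape. This is a hedged sketch only: the price paths are synthetic random walks standing in for real simulation output, and `run_volatility`, `summarize`, `synthetic_run`, and the shock sizes are hypothetical names and values, not anything reported by the paper.

```python
import math
import random
import statistics


def run_volatility(prices):
    """Standard deviation of log returns for one price path."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.stdev(returns)


def summarize(runs):
    """Mean and standard deviation of per-run volatility across runs."""
    vols = [run_volatility(p) for p in runs]
    return statistics.mean(vols), statistics.stdev(vols)


def synthetic_run(rng, shock_sd, periods=50, start=100.0):
    """Placeholder random-walk path; a stand-in for real simulation output."""
    prices = [start]
    for _ in range(periods - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, shock_sd)))
    return prices


rng = random.Random(0)  # fixed seed so the comparison is repeatable
# Synthetic data only: the two shock sizes are arbitrary assumptions.
adaptive = [synthetic_run(rng, 0.05) for _ in range(10)]
baseline = [synthetic_run(rng, 0.02) for _ in range(10)]

for label, runs in (("adaptive", adaptive), ("baseline", baseline)):
    mean_vol, sd_vol = summarize(runs)
    print(f"{label}: mean volatility {mean_vol:.4f}, sd across runs {sd_vol:.4f}")
```

Reporting the across-run spread alongside the mean is what lets a reader judge whether a volatility gap between configurations exceeds run-to-run noise.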

minor comments (2)

[Abstract] The abstract contains an apparent truncation ('vo') and does not summarize the market model or statistical approach.
[Figures and Tables] Figure captions and table legends should explicitly state the number of simulation runs, random seeds, and exact LLM identifiers used for each panel.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important areas for improving transparency, statistical rigor, and discussion of limitations. We address each major comment below and commit to revisions that strengthen the paper without altering its core findings.

Point-by-point responses

Referee: [Methods / Experimental Setup] Methods (prompting and market setup): The abstract and setup description provide no explicit prompting templates, temperature settings, context-window usage, or market parameters (initial price, dividend process, number of periods, trading rules). Without these, it is impossible to judge whether the reported bubble formation and adaptation are robust or sensitive to minor implementation choices; this directly affects the load-bearing claim that heterogeneous LLM populations generate endogenous instability.

Authors: We agree that full methodological details are essential for reproducibility and assessing robustness. The original submission prioritized the behavioral results over exhaustive implementation specifics, which was an oversight. In the revised manuscript, we will add a comprehensive Methods subsection that includes the exact prompting templates for all 15 LLMs, temperature settings (0.7 for the majority of models), context-window management protocols, and complete market parameters: initial price of 100, dividend process (constant fundamental value with stochastic shocks), 50 trading periods, and trading rules (limit-order book with no short sales in the baseline). These additions will directly enable evaluation of sensitivity to prompting and setup choices.

revision: yes

Referee: [Results / Heterogeneous Markets] Results on adaptation and volatility: The finding that individual-level adaptation can amplify market volatility rests on comparisons across repeated simulations with fixed prompts. No statistical controls (e.g., regression of volatility on adaptation metrics, robustness to seed variation, or comparison against non-adaptive baselines) are described, leaving open whether the amplification effect is an artifact of the particular ecology or a general property.

Authors: The manuscript already documents substantial outcome variability through 10 independent repeated simulations per heterogeneous configuration, with figures illustrating divergent price paths and profit outcomes. We acknowledge the lack of formal statistical controls. In revision, we will add mean and standard deviation summaries for volatility metrics across runs, explicit comparisons of volatility under adaptive versus fixed-strategy (non-adaptive) baselines, and checks for robustness to random seeds. These enhancements will clarify that the observed amplification arises from adaptation dynamics rather than being an artifact of the specific agent ecology.

revision: yes

Referee: [Discussion / Conclusion] External validity discussion: The extrapolation that LLM agents will 'fundamentally reshape' real financial ecology is unsupported by any mapping exercise showing that the simulated behaviors persist under slippage, position limits, regulatory constraints, or multi-period strategic feedback with actual capital. This gap is load-bearing for the policy-relevant conclusion.

Authors: The conclusion uses suggestive language ('these findings suggest') to highlight potential instability channels identified in simulation, rather than claiming direct real-world effects. We accept that the external-validity discussion requires expansion. The revised Discussion will explicitly address the absence of slippage, position limits, regulatory constraints, and real-capital feedback loops, framing the results as identifying plausible new risk mechanisms that merit further study in more realistic environments. We will also moderate the abstract and conclusion wording to avoid over-extrapolation while preserving the emphasis on endogenous instability in heterogeneous LLM populations.

revision: yes
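
The setup parameters listed in the first simulated response could be recorded in a single serializable structure so that each run carries its own configuration. The sketch below is an assumption about how that might look: the class and field names are hypothetical, `random_seed` is an added illustrative field, and the values are those stated in the simulated response, not verified against the paper.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MarketConfig:
    """Hypothetical record of one experimental configuration."""

    initial_price: float = 100.0        # as stated in the simulated response
    trading_periods: int = 50           # as stated in the simulated response
    temperature: float = 0.7            # stated for the majority of models
    short_sales_allowed: bool = False   # baseline: no short sales
    trading_rule: str = "limit-order book"
    random_seed: int = 0                # illustrative field, not from the page


config = MarketConfig()

# Serialize next to the run output so the exact setup can be recovered later.
serialized = json.dumps(asdict(config), indent=2, sort_keys=True)
print(serialized)

# Round-trip check: the stored record rebuilds an identical configuration.
restored = MarketConfig(**json.loads(serialized))
print(restored == config)
```

Freezing the dataclass prevents a configuration from being changed after a run starts, which keeps the stored record faithful to what was actually executed.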

Circularity Check

0 steps flagged

No significant circularity: purely empirical LLM simulation study

Full rationale

The paper reports outcomes from running heterogeneous LLM agents in an abstracted asset market simulation. No mathematical derivations, fitted parameters, or first-principles claims appear that reduce reported results to input definitions by construction. Behaviors (bubbles, adaptation, volatility) emerge from prompt-driven agent interactions rather than from any self-referential fitting or renaming of known patterns. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to justify core results. The study is self-contained as an empirical exploration whose outputs are generated rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; the work relies on standard assumptions of agent-based market simulations and LLM behavioral prompting.
