Dissecting AI Trading: Behavioral Finance and Market Bubbles
Pith reviewed 2026-05-10 03:12 UTC · model grok-4.3
The pith
Targeted prompt changes can increase or decrease the size of bubbles created by AI trading agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In an open-call auction populated by autonomous LLM agents, classic behavioral patterns emerge at the individual level and aggregate into equilibrium dynamics that match prior human experimental markets, including the predictive power of excess demand for future prices and the positive relationship between disagreement and trading volume. Scoring the agents' reasoning text with a twenty-mechanism framework shows that targeted prompt interventions can amplify or suppress chosen mechanisms, which in turn alters bubble magnitude.
What carries the argument
The twenty-mechanism scoring framework that reads agents' reasoning text to detect behavioral drivers and enables their causal manipulation through targeted prompt edits.
If this is right
- AI agents produce the same individual biases and aggregate market dynamics observed in human experimental asset markets.
- Excess demand generated by AI agents forecasts subsequent price changes.
- Greater disagreement among AI agents leads to higher trading volume.
- Prompt edits that target specific behavioral mechanisms can raise or lower overall bubble size.
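The excess-demand bullet echoes the Smith et al. (1988) regression of next-period price changes on current excess bids. A minimal sketch of how that test could be run on simulated market logs; all numbers here are invented for illustration, not taken from the paper:

```python
# Illustrative check: does excess demand (bids minus offers) in period t
# predict the price change from period t to t+1? Data below is made up.
bids   = [14, 12, 15,  9,  7,  5,  3,  2]   # buy orders per period
offers = [ 6,  7,  6,  8,  9, 11, 12, 13]   # sell orders per period
prices = [50, 56, 61, 67, 64, 58, 49, 41, 35]  # prices[t] clears period t

excess_demand = [b - o for b, o in zip(bids, offers)]            # B_t - O_t
price_changes = [p1 - p0 for p0, p1 in zip(prices, prices[1:])]  # p_{t+1} - p_t

# OLS slope of (p_{t+1} - p_t) on (B_t - O_t), no external libraries needed
n = len(excess_demand)
mx = sum(excess_demand) / n
my = sum(price_changes) / n
beta = (sum((x - mx) * (y - my) for x, y in zip(excess_demand, price_changes))
        / sum((x - mx) ** 2 for x in excess_demand))
print(f"slope of next-period price change on excess demand: {beta:.3f}")
```

A positive slope on this invented series is consistent with the claimed pattern; the paper's actual estimation is not specified in the abstract.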
Where Pith is reading between the lines
- The same scoring and intervention method could be used to test which mechanisms drive other market patterns beyond bubbles.
- AI trading systems might be tuned at the prompt level to limit contribution to price instability in controlled settings.
- Replication of classic human results with LLM agents opens the possibility of using such agents for rapid, low-cost testing of regulatory interventions.
Load-bearing premise
The twenty-mechanism scoring framework applied to agents' reasoning text accurately identifies and allows causal manipulation of the behavioral mechanisms driving trading decisions.
What would settle it
If the market is rerun with prompt interventions that are designed to raise or lower specific mechanism scores, yet the measured bubble size stays unchanged, the claimed causal link between those mechanisms and bubble magnitude would be falsified.
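Settling it also presumes an agreed measure of bubble size, which the abstract never names. A common choice in this experimental literature is relative absolute deviation (RAD), the mean absolute gap between price and fundamental value normalized by the average fundamental value; the sketch below assumes that measure and uses invented prices:

```python
# Hypothetical bubble-magnitude statistic: relative absolute deviation (RAD).
# The paper may use a different measure; this is one standard option.
prices       = [50, 56, 61, 67, 64, 58, 49, 41, 35]
fundamentals = [54, 48, 42, 36, 30, 24, 18, 12,  6]  # declining FV, Smith-style

avg_fv = sum(fundamentals) / len(fundamentals)
rad = (sum(abs(p - f) for p, f in zip(prices, fundamentals))
       / len(prices) / avg_fv)
print(f"RAD: {rad:.2f}")
```

An intervention that "lowers bubble size" would then mean a statistically reliable drop in RAD (or whichever statistic the paper adopts) across reruns.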
Original abstract
We study how AI agents form expectations and trade in experimental asset markets. Using a simulated open-call auction populated by autonomous Large Language Model (LLM) agents, we document three main findings. First, AI agents exhibit classic behavioral patterns: a pronounced disposition effect and recency-weighted extrapolative beliefs. Second, these individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices and the positive relationship between disagreement and trading volume. Third, by analyzing the agents' reasoning text through a twenty-mechanism scoring framework, we show that targeted prompt interventions causally amplify or suppress specific behavioral mechanisms, significantly altering the magnitude of market bubbles.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper simulates experimental asset markets populated by autonomous LLM agents in an open-call auction setting. It reports three findings: AI agents display classic behavioral biases including a disposition effect and recency-weighted extrapolative expectations; these individual patterns aggregate into market-level dynamics that replicate key results from Smith et al. (1988), such as excess demand predicting future prices and disagreement driving trading volume; and a twenty-mechanism scoring framework applied to agents' reasoning text permits targeted prompt interventions that amplify or suppress specific mechanisms, thereby altering the magnitude of market bubbles.
Significance. If the central claims hold, the work provides a promising bridge between behavioral finance and AI-driven simulations by offering a method to dissect and causally manipulate individual-level mechanisms in aggregate market outcomes. The replication of established experimental patterns lends credibility to the simulation environment, and the intervention results could inform both theoretical models of bubbles and practical approaches to AI trading alignment. The absence of free parameters in the core simulation setup is a strength.
major comments (1)
- [Abstract (third finding) and associated methods description] The third finding, which is load-bearing for the paper's novel contribution, depends entirely on the validity of the twenty-mechanism scoring framework. The abstract states that this framework is used to analyze reasoning text and enable targeted prompt interventions, but provides no information on rubric construction, inter-rater reliability, validation against actual trading outcomes, or tests confirming that interventions affect only the intended mechanism(s) without diffuse effects on others. If mechanisms overlap or if LLM-based scoring introduces systematic biases, the observed changes in bubble magnitude cannot be causally attributed to specific behavioral channels.
minor comments (1)
- [Abstract] The abstract would benefit from brief specification of key simulation parameters (e.g., number of agents, asset supply, dividend process) to allow readers to assess the setup's fidelity to Smith et al. (1988).
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that additional methodological transparency is required for the twenty-mechanism scoring framework to support the causal claims in the third finding, and we will revise the manuscript accordingly.
Point-by-point responses
Referee: [Abstract (third finding) and associated methods description] The third finding, which is load-bearing for the paper's novel contribution, depends entirely on the validity of the twenty-mechanism scoring framework. The abstract states that this framework is used to analyze reasoning text and enable targeted prompt interventions, but provides no information on rubric construction, inter-rater reliability, validation against actual trading outcomes, or tests confirming that interventions affect only the intended mechanism(s) without diffuse effects on others. If mechanisms overlap or if LLM-based scoring introduces systematic biases, the observed changes in bubble magnitude cannot be causally attributed to specific behavioral channels.
Authors: We acknowledge that the abstract is brief by design and does not detail these aspects. The full manuscript describes the framework in the Methods section, but we agree more explicit information is needed. In revision we will expand the Methods to report: (1) rubric construction, derived from the behavioral finance literature on disposition effect and extrapolative expectations; (2) inter-rater reliability statistics obtained from repeated LLM scoring and spot-checked human coding; (3) validation results showing correlations between mechanism scores and observed trading actions (e.g., disposition scores predicting premature sales); and (4) specificity tests, including control interventions and mechanism-ablation runs, confirming that prompt changes affect primarily the targeted channel. These additions will allow readers to evaluate potential overlap or scoring bias directly. The core empirical patterns and intervention effects remain unchanged. revision: yes
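One concrete form the promised inter-rater reliability statistic could take is Cohen's kappa between two independent scoring passes over the same reasoning texts. The labels below reuse the paper's disposition_effect categories, but the pass data is invented for illustration:

```python
# Sketch: Cohen's kappa between two repeated LLM scoring passes.
# The two label sequences below are hypothetical, not from the paper.
from collections import Counter

pass_a = ["holds_losers_sells_winners", "profit_locking_tendency", "no_evidence",
          "holds_losers_sells_winners", "rational_profit_taking", "no_evidence",
          "holds_losers_sells_winners", "profit_locking_tendency"]
pass_b = ["holds_losers_sells_winners", "profit_locking_tendency", "no_evidence",
          "profit_locking_tendency", "rational_profit_taking", "no_evidence",
          "holds_losers_sells_winners", "no_evidence"]

def cohen_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n   # raw agreement rate
    ca, cb = Counter(a), Counter(b)
    labels = set(a) | set(b)
    expected = sum(ca[l] * cb[l] for l in labels) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

kappa = cohen_kappa(pass_a, pass_b)
print(f"Cohen's kappa across scoring passes: {kappa:.2f}")
```

Reporting kappa per mechanism, alongside the spot-checked human coding the authors mention, would let readers judge scoring stability directly.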
Circularity Check
No circularity: empirical simulation results rest on direct agent outputs and external benchmarks
full rationale
The paper reports three findings from LLM-agent simulations in experimental asset markets: individual behavioral patterns, aggregation into classic market dynamics (citing Smith et al. 1988 as the external replication target), and causal effects of prompt interventions on a twenty-mechanism scoring of reasoning text. No equations, fitted parameters, or self-referential definitions appear in the provided abstract or description. The scoring framework is applied to observed text to enable interventions; the resulting bubble-magnitude shifts are measured outcomes, not inputs redefined as predictions. No self-citation chain, ansatz smuggling, or renaming of known results is indicated. The work is a self-contained empirical simulation tested against observable agent behavior and prior experimental benchmarks, with no reduction of claims to quantities defined by the framework itself.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large language models can be prompted to produce trading decisions and reasoning text that exhibit measurable behavioral biases such as the disposition effect and extrapolative expectations.
invented entities (1)
- twenty-mechanism scoring framework (no independent evidence)
Reference graph
Excerpts the review leans on (from the paper's appendix: agent protocol and scoring rubric)
- [1] Read Phase: At the start of period t, the agent's prompt includes the exact text it generated in the memory_update field during period t−1.
- [2] Reflection Task. Practice Reflection: After the 3 initial practice periods (Periods -2, -1, 0), the agents are prompted with a special "Reflection Task". They are asked: "You have completed 3 practice rounds. Review your trading performance. Did you buy above the fundamental value? Did you sell below it? Write a comprehensive set of rules for yourself in your INSIGHTS.tx...
- [3] "Invalid JSON format. You have lost your turn for this period. Please output strictly valid JSON." Write Phase: The output generated in update_plans_txt and update_insights_txt overwrites the previous files and is carried forward to period t+1. A.5 Rule Enforcement and Budget Constraints: To prevent the LLMs from taking impossible actions, the simulation engine enforces strict budget constraints before passing orders to the market-clearing algorith...
- [4] Select best-fitting label
- [5] Cite exact evidence (use [] if none)
- [6] Provide confidence (0-1) and numeric_score
- [7] Add brief notes (≤25 words) only if needed. === MECHANISMS ===
- [8] rational_speculative_bubble. Labels: aware_of_resale_logic | ignores_resale_logic | unclear. Score: 1.0=explicit resale expectations, 0.5=hinted, 0=none. Definition: Agent expects to resell asset at higher price to future buyers (greater fool theory).
- [9] synchronization_risk. Labels: synchronization_risk_acknowledged | rides_bubble | no_coordination_reference. Score: 1.0=explicit timing concerns, 0.5=implied, 0=none. Definition: Agent delays action due to coordination problem: uncertain when others will exit.
- [10] asymmetric_information. Labels: claims_private_info | acknowledges_info_disadvantage | no_info_asymmetry_mention. Score: 1.0=claims advantage, 0.5=acknowledges disadvantage, 0=none. Definition: Agent believes they possess superior information relative to other market participants.
- [11] extrapolation_vs_anchor. Labels: pure_extrapolation | recognizes_overvaluation | fundamental_anchor | unobserved. Score: 1.0=fundamental anchor, 0.5=recognizes overvaluation, 0=pure extrapolation. Definition: Agent forecasts by extrapolating past trends vs. anchoring to fundamental value.
- [12] diagnostic_expectations. Labels: overweights_recent_signals | balanced_weighting | underweights_recent | unobserved. Score: 1.0=heavy overweighting, 0.5=moderate, 0=balanced. Definition: Agent overweights recent salient signals, exhibiting overreaction.
- [13] wavering_behavior. Labels: flip_flopping | consistent_bullish | consistent_bearish | value_focused | unobserved. Score: 1.0=switches between growth/value, 0.5=shows tension, 0=consistent. Definition: Agent alternates between growth/momentum signals (greed) and value signals (fear).
- [14] disposition_effect. Labels: holds_losers_sells_winners | profit_locking_tendency | loss_averse_holding | rational_profit_taking | no_evidence. Score: 1.0=clear disposition pattern, 0.5=profit-locking, 0=rational. Definition: Tendency to sell winners too early and hold losers too long.
- [15] momentum_vs_newswatcher. Labels: momentum | newswatcher | hybrid | unobserved. Score: 1.0=momentum, 0.5=hybrid, 0=newswatcher. Definition: Agent follows price trends (momentum) vs. monitors fundamental news (newswatcher).
- [16] feedback_trading. Labels: pure_trend_following | contrarian | fundamental_based | unobserved. Score: 1.0=pure trend without fundamentals, 0.5=partial, 0=fundamental-based. Definition: Trading based purely on past price changes without fundamental justification.
- [17] overconfidence. Labels: overconfident | well_calibrated | underconfident | unobserved. Score: 1.0=excessive certainty, 0.5=moderate, 0=well-calibrated. Definition: Excessive certainty about one's own judgments, predictions, or trading abilities.
- [18] self_attribution_bias. Labels: attributes_wins_to_skill | balanced_attribution | attributes_losses_externally | unobserved. Score: 1.0=asymmetric attribution, 0.5=partial, 0=balanced. Definition: Agent attributes successes to own skill but blames failures on external factors.
- [19] herding_contagion. Labels: explicit_herding | fear_missing_out | contrarian | independent | unobserved. Score: 1.0=explicit herding or FOMO, 0.5=implicit, 0=independent. Definition: Agent follows crowd behavior or exhibits fear of missing out (FOMO).
- [20] disagreement_heterogeneous_beliefs. Labels: acknowledges_disagreement | assumes_consensus | unobserved. Score: 1.0=recognizes disagreement, 0.5=implicit, 0=assumes consensus. Definition: Agent recognizes that market participants hold different views about fundamental value.
- [21] representativeness_heuristic. Labels: pattern_matching_past_bubbles | historical_analogy | no_historical_reference | unobserved. Score: 1.0=explicit past bubble match, 0.5=general analogy, 0=none. Definition: Agent matches current situation to past patterns or bubbles.
- [22] new_era_thinking. Labels: this_time_different | paradigm_shift_claim | acknowledges_similarity | unobserved. Score: 1.0=claims new paradigm, 0.5=hints uniqueness, 0=acknowledges patterns. Definition: Agent believes "this time is different": current situation is structurally unique.
- [23] availability_bias. Labels: overweights_salient_events | balanced_memory | unobserved. Score: 1.0=focuses on salient events, 0.5=partial, 0=balanced. Definition: Agent overweights easily recalled vivid events while ignoring base rates.
- [24] limited_arbitrage_awareness. Labels: acknowledges_arbitrage_limits | assumes_unlimited_arbitrage | unobserved. Score: 1.0=explicit limits mention, 0.5=implicit, 0=assumes no limits. Definition: Agent recognizes that arbitrage has limits (fundamental risk, capital constraints).
- [25] loss_aversion. Labels: loss_averse | risk_neutral | risk_seeking | unobserved. Score: 1.0=clear asymmetric sensitivity, 0.5=moderate, 0=symmetric. Definition: Agent shows asymmetric sensitivity to losses vs gains (losses loom larger).
- [26] narrative_tone. Labels: amplifying | cautionary | neutral. Score: 0-1 scaled by emotive language intensity. Definition: Agent interprets and propagates narrative tone: amplifying (exuberant) vs cautionary (fearful) vs neutral.
- [27] statistical_testing. Labels: formal_test | heuristic_threshold | no_test. Score: 1.0=formal test, 0.5=heuristic, 0=none. Definition: Agent references formal tests or heuristic thresholds for bubble detection. === OUTPUT FORMAT === Return valid JSON: { "mechanism_assessments": [ { "mechanism_id": "disposition_effect", "mechanism_category": "trading_biases"...
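The truncated output format above can still be checked mechanically. Below is a minimal validator for one assessment record, with field names taken from the excerpt and an invented example record; note that most mechanisms use discrete 0 / 0.5 / 1.0 score levels while narrative_tone is scaled continuously, so this sketch covers only the discrete case:

```python
# Minimal validator for the scorer's JSON output format excerpted above.
# Field names follow the excerpt; the example record itself is invented.
import json

VALID_SCORES = {0.0, 0.5, 1.0}  # discrete levels used by most mechanisms

raw = '''
{
  "mechanism_assessments": [
    {
      "mechanism_id": "disposition_effect",
      "mechanism_category": "trading_biases",
      "label": "holds_losers_sells_winners",
      "evidence": ["I will hold this position despite the loss"],
      "confidence": 0.8,
      "numeric_score": 1.0
    }
  ]
}
'''

data = json.loads(raw)
for item in data["mechanism_assessments"]:
    assert item["numeric_score"] in VALID_SCORES, "score must be 0, 0.5, or 1.0"
    assert 0.0 <= item["confidence"] <= 1.0, "confidence must be in [0, 1]"
print(f"validated {len(data['mechanism_assessments'])} assessment(s)")
```

A check like this, run on every scorer response, would catch the malformed-JSON failure mode the protocol penalizes with a lost turn.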