ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination
Pith reviewed 2026-05-21 20:20 UTC · model grok-4.3
The pith
The ATLAS framework enables LLM trading agents to improve performance over time by dynamically optimizing prompts with stochastic market feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Within ATLAS, the central trading agent works in an order-aware action space, so its outputs are executable market orders. It applies Adaptive-OPRO to fold real-time stochastic feedback into its prompt, and performance increases over time as a result. Across regime-specific equity studies and multiple LLM families, this approach outperforms fixed prompts, while reflection-based feedback yields no systematic gains.
What carries the argument
Adaptive-OPRO, a prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback from trading outcomes.
If this is right
- The trading agent generates outputs that map directly to executable market orders rather than abstract signals.
- Multiple agents synthesize market data, news, and corporate fundamentals into coherent trading decisions.
- Performance improves measurably as the agent continues to trade and receive feedback.
- These advantages appear across different market regimes and several large language model families.
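The first point, that agent outputs map directly to executable orders, can be illustrated with a small validation step. This is a minimal sketch, not the authors' implementation: the action and order-type vocabularies follow the order schema in the paper's prompt template, while `Order`, `validate_order`, and the specific price rules are hypothetical and assumed here for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Vocabularies taken from the paper's prompt template.
ACTIONS = {"BUY", "SELL", "SHORT", "SHORT_COVER"}
ORDER_TYPES = {"MARKET", "LIMIT", "STOP"}


@dataclass(frozen=True)
class Order:
    """Hypothetical container for one order emitted by the trading agent."""
    action: str
    order_type: str
    quantity: int
    price: Optional[float] = None
    explanation: str = ""


def validate_order(order: Order) -> list[str]:
    """Return a list of problems; an empty list means the order passes.

    The price rules below are assumptions, not rules stated in the paper.
    """
    problems = []
    if order.action not in ACTIONS:
        problems.append(f"unknown action: {order.action}")
    if order.order_type not in ORDER_TYPES:
        problems.append(f"unknown order type: {order.order_type}")
    if order.quantity <= 0:
        problems.append("quantity must be a positive integer")
    if order.order_type == "MARKET" and order.price is not None:
        problems.append("MARKET orders must not carry a price")
    if order.order_type in {"LIMIT", "STOP"} and (order.price is None or order.price <= 0):
        problems.append("LIMIT and STOP orders need a positive price")
    return problems
```

An output that fails such a check would be rejected before reaching the simulator, which is one way an abstract signal such as "HOLD" differs from an executable order.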
Where Pith is reading between the lines
- The same feedback-driven prompt updates could be tested in other sequential decision settings that feature delayed and noisy rewards.
- Combining Adaptive-OPRO with additional coordination rules among agents might increase stability during sudden market shifts.
- Live-market deployment would reveal whether the measured gains survive transaction costs and execution delays not present in the controlled studies.
Load-bearing premise
The premise is that delayed, noisy market feedback can be incorporated into prompt updates stably, producing measurable performance gains without introducing instability or overfitting to specific regimes.
What would settle it
A new set of regime-specific equity trading tests in which Adaptive-OPRO produces no consistent outperformance relative to fixed prompts across additional LLM families would show the central claim does not hold.
Original abstract
Large language models show promise for financial decision-making, yet deploying them as autonomous trading agents raises fundamental challenges: how to adapt instructions when rewards arrive late and obscured by market noise, how to synthesize heterogeneous information streams into coherent decisions, and how to bridge the gap between model outputs and executable market actions. We present ATLAS (Adaptive Trading with LLM AgentS), a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. Within ATLAS, the central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent can incorporate feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time. Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents ATLAS, a multi-agent framework for deploying large language models as trading agents. It integrates structured data from markets, news, and corporate fundamentals; employs an order-aware action space to produce executable market orders; and introduces Adaptive-OPRO, a dynamic prompt-optimization technique that incorporates real-time stochastic feedback to adapt instructions during trading. The central empirical claim is that Adaptive-OPRO yields consistently increasing performance and outperforms fixed prompts across regime-specific equity studies and multiple LLM families, whereas reflection-based feedback does not deliver systematic gains.
Significance. If the reported performance improvements prove robust under proper controls, the work would offer a practical contribution to LLM-driven quantitative trading by tackling adaptation to delayed, noisy market rewards. The order-aware action space and multi-agent coordination address deployment gaps between model outputs and executable trades. The finding that reflection-based methods fail systematically is a useful negative result for the field.
major comments (3)
- [Abstract] Abstract: The performance claims for Adaptive-OPRO are stated without any quantitative metrics, error bars, dataset descriptions, number of trials, or ablation results, so the central claim that it 'consistently outperforms fixed prompts' cannot be evaluated from the text.
- [§3–4] §3–4: The description of Adaptive-OPRO does not specify regularization, variance-reduction steps, or anti-overfitting mechanisms for incorporating late, market-noise-obscured feedback into prompt updates. Without these, measured gains in non-stationary equity regimes may reflect transient regime fitting rather than stable adaptation.
- [§5] §5 (regime-specific studies): The claim that reflection-based feedback fails systematically is presented as supporting evidence for Adaptive-OPRO, yet the manuscript does not detail how regimes are identified, how performance is aggregated across them, or whether the same noise issues affect both methods equally.
minor comments (2)
- [§3] Clarify the precise definition of 'real-time' feedback given the acknowledged latency of market rewards.
- [Figures/Tables] Add confidence intervals or standard errors to any performance curves or tables comparing Adaptive-OPRO against baselines.
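The second minor comment can be made concrete with a short calculation. The sketch below computes a mean and an approximate confidence interval over per-run results; `mean_and_ci` and its `critical_value` parameter are hypothetical names, and the default of 1.96 assumes a normal approximation, which is optimistic when only a handful of runs is available (a Student-t critical value would be the more careful choice there).

```python
import math
import statistics


def mean_and_ci(values, critical_value=1.96):
    """Return (mean, lower, upper) for a list of per-run metric values.

    Uses the sample standard deviation and the standard error of the mean.
    The critical value is an assumption supplied by the caller.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = critical_value * sem
    return mean, mean - half_width, mean + half_width


# Illustrative numbers only, not results from the paper.
mean, lower, upper = mean_and_ci([0.10, 0.12, 0.14])
```

Reporting the interval alongside each curve or table entry lets a reader judge whether a gap between two methods exceeds run-to-run variation.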
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review of our manuscript on ATLAS. We address each of the major comments in detail below, indicating where revisions will be made to enhance the clarity, rigor, and completeness of the paper.
Point-by-point responses
- Referee: [Abstract] Abstract: The performance claims for Adaptive-OPRO are stated without any quantitative metrics, error bars, dataset descriptions, number of trials, or ablation results, so the central claim that it 'consistently outperforms fixed prompts' cannot be evaluated from the text.
  Authors: We agree that the abstract would benefit from including key quantitative results to support the performance claims. In the revised version, we will incorporate specific metrics such as the average outperformance in returns and Sharpe ratio, along with the number of trials and a high-level description of the datasets used in the regime-specific studies. This will make the central claims more evaluable directly from the abstract.
  Revision: yes
- Referee: [§3–4] §3–4: The description of Adaptive-OPRO does not specify regularization, variance-reduction steps, or anti-overfitting mechanisms for incorporating late, market-noise-obscured feedback into prompt updates. Without these, measured gains in non-stationary equity regimes may reflect transient regime fitting rather than stable adaptation.
  Authors: This is a valid concern regarding the robustness of Adaptive-OPRO. The current manuscript describes the core stochastic feedback loop but does not explicitly outline regularization or anti-overfitting procedures. We will revise Sections 3 and 4 to include a detailed explanation of the variance reduction achieved through multi-episode stochastic sampling and introduce a regularization term in the prompt optimization objective to mitigate overfitting to noisy market signals. We will also add ablation experiments demonstrating the impact of these mechanisms on performance stability across regimes.
  Revision: yes
- Referee: [§5] §5 (regime-specific studies): The claim that reflection-based feedback fails systematically is presented as supporting evidence for Adaptive-OPRO, yet the manuscript does not detail how regimes are identified, how performance is aggregated across them, or whether the same noise issues affect both methods equally.
  Authors: We appreciate this point on the need for greater transparency in the experimental setup. Section 5 currently presents the results but can be expanded for clarity. In the revision, we will add a subsection detailing the regime identification process (based on statistical properties of the time series), the aggregation method for performance metrics across regimes, and a comparative analysis confirming that both feedback methods encounter equivalent market noise levels, with only Adaptive-OPRO showing systematic adaptation gains. This will strengthen the interpretation of the negative result for reflection-based methods.
  Revision: yes
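The first response promises returns and Sharpe ratio as headline metrics. For reference, both can be computed as in the sketch below. The function names, the zero risk-free rate, and the 252-period annualization factor are assumptions for illustration; the review text does not state which conventions the paper uses.

```python
import math
import statistics


def roi(equity_curve):
    """Return on investment over a window, as a fraction (0.10 means +10%)."""
    return equity_curve[-1] / equity_curve[0] - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free is a per-period rate; periods_per_year=252 assumes daily bars.
    Returns NaN when the returns have zero sample standard deviation.
    """
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        return float("nan")
    return statistics.fmean(excess) / spread * math.sqrt(periods_per_year)
```

Because the Sharpe ratio divides by the sample standard deviation, it is unstable over short windows, which is one reason the referee's request for the number of trials matters.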
Circularity Check
No circularity detected; empirical framework is self-contained
Full rationale
The paper describes an empirical multi-agent trading framework (ATLAS) and a prompt-optimization method (Adaptive-OPRO) that incorporates real-time stochastic feedback. No mathematical derivation chain, self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described claims. Performance comparisons (Adaptive-OPRO vs. fixed prompts) are presented as outcomes of regime-specific equity studies across LLM families, without equations or definitions that reduce the central result to its own inputs by construction. The work is therefore treated as self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Adaptive-OPRO updates the prompt of the Central Trading Agent using realized outcomes... windowed scoring... s = clip[0,100](50 + 250·ROI)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  The agent can incorporate feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback
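The windowed scoring rule quoted in the first entry, s = clip[0,100](50 + 250·ROI), can be written out directly. In this sketch `prompt_score` and `window_roi` are hypothetical names, and ROI is assumed to be a fraction (0.10 for +10%) rather than a percentage; under that reading a flat window scores 50 and the score saturates at ±20% ROI.

```python
def window_roi(start_value: float, end_value: float) -> float:
    """ROI over one scoring window, as a fraction (assumed convention)."""
    return end_value / start_value - 1.0


def prompt_score(roi: float) -> float:
    """Map a window ROI to a score in [0, 100] via clip(50 + 250 * ROI)."""
    return max(0.0, min(100.0, 50.0 + 250.0 * roi))


# A flat window sits at the midpoint; large moves are clipped.
flat = prompt_score(0.0)       # 50.0
capped = prompt_score(1.0)     # 100.0
floored = prompt_score(-1.0)   # 0.0
```

The clipping bounds the influence of any single extreme window on the prompt optimizer, at the cost of not distinguishing a +20% window from a +50% one.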
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents
  SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage poin...
- Signal or Noise in Multi-Agent LLM-based Stock Recommendations?
  A multi-agent LLM equity system produces statistically significant outperformance on S&P 500 stocks, with strong-buy portfolios returning +2.18% monthly versus +1.15% for the equal-weight benchmark over 19 months.