Moira parameterizes hierarchical RL policies for pair trading with LLMs and adapts them via prompt updates based on trajectory and episode feedback, outperforming baselines on real market data.
hub
Can chatgpt forecast stock price movements? return pre- dictability and large language models
17 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Constrained LLM agents discover cryptocurrency factors that produce a portfolio with 44.55% annualized return and Sharpe ratio of 1.55 in pure out-of-sample 2024-2026 testing after trading costs.
Adding a Bayesian source memory for market-feedback adaptive retrieval to a frozen LLM improves macro-F1 from 0.438 to 0.471 and portfolio Sharpe from 0.52 to 0.84 in point-in-time financial event-impact prediction.
SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage points across equity sectors.
Frozen LLM checkpoints serve as time capsules of public text and generate outlook scores that forecast equity returns and analyst actions beyond contemporaneous valuations.
A multi-agent LLM equity system produces statistically significant outperformance on S&P 500 stocks, with strong-buy portfolios returning +2.18% monthly versus +1.15% for the equal-weight benchmark over 19 months.
PolySwarm aggregates predictions from 50 LLM personas for Polymarket trading using Bayesian combination and divergence metrics, outperforming single models in calibration while adding latency arbitrage via CEX price models.
Supervised fine-tuning with LoRA on rational benchmark forecasts corrects extrapolation bias out-of-sample in LLM predictions for controlled experiments and cross-sectional stock returns.
A survey proposing a taxonomy of Injective, Bridging, and Internal Alignment paradigms to evolve TSA into user-driven Time Series Question Answering with LLMs.
Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.
SSAI maps news into four factors (sentiment, risk, confidence, volatility) for trading, but factor portfolios, ridge models, and RL agents show no reliable edge over baselines after coverage controls and costs.
LLM features optimized for high information coefficient with returns do not reliably improve PPO trading policies under distribution shifts, where price-only or macro baselines remain more robust.
Three independent LLMs exhibit correlated forecasting errors on 568 binary questions but human predictions show no activation of this shared bias.
This review synthesizes LLM uses in stock forecasting and catalogs key practical pitfalls from a hedge-fund viewpoint.
CausalGAN + SAC RL pipeline generates synthetic bond yield data; fine-tuned Qwen2.5-7B LLM produces trading signals, with reported MAE 0.103, 60% profit rate, and LLM score 3.37/5.
A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.
citing papers explorer
-
Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading
Moira parameterizes hierarchical RL policies for pair trading with LLMs and adapts them via prompt updates based on trajectory and episode feedback, outperforming baselines on real market data.
-
From Hypotheses to Factors: Constrained LLM Agents in Cryptocurrency Markets
Constrained LLM agents discover cryptocurrency factors that produce a portfolio with 44.55% annualized return and Sharpe ratio of 1.55 in pure out-of-sample 2024-2026 testing after trading costs.
-
Point-in-Time Financial RAG with Frozen LLMs and Market-Feedback Adaptive Retrieval
Adding a Bayesian source memory for market-feedback adaptive retrieval to a frozen LLM improves macro-F1 from 0.438 to 0.471 and portfolio Sharpe from 0.52 to 0.84 in point-in-time financial event-impact prediction.
-
SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents
SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage points across equity sectors.
-
ChatGPT as a Time Capsule: The Limits of Price Discovery
Frozen LLM checkpoints serve as time capsules of public text and generate outlook scores that forecast equity returns and analyst actions beyond contemporaneous valuations.
-
Signal or Noise in Multi-Agent LLM-based Stock Recommendations?
A multi-agent LLM equity system produces statistically significant outperformance on S&P 500 stocks, with strong-buy portfolios returning +2.18% monthly versus +1.15% for the equal-weight benchmark over 19 months.
-
PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage
PolySwarm aggregates predictions from 50 LLM personas for Polymarket trading using Bayesian combination and divergence metrics, outperforming single models in calibration while adding latency arbitrage via CEX price models.
-
Debiasing LLMs by Fine-tuning
Supervised fine-tuning with LoRA on rational benchmark forecasts corrects extrapolation bias out-of-sample in LLM predictions for controlled experiments and cross-sectional stock returns.
-
From Time Series Analysis to Question Answering: A Survey in the LLM Era
A survey proposing a taxonomy of Injective, Bridging, and Internal Alignment paradigms to evolve TSA into user-driven Time Series Question Answering with LLMs.
-
The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence
Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.
-
Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics
SSAI maps news into four factors (sentiment, risk, confidence, volatility) for trading, but factor portfolios, ridge models, and RL agents show no reliable edge over baselines after coverage controls and costs.
-
When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies
LLM features optimized for high information coefficient with returns do not reliably improve PPO trading policies under distribution shifts, where price-only or macro baselines remain more robust.
-
The Oracle's Fingerprint: Correlated AI Forecasting Errors and the Limits of Bias Transmission
Three independent LLMs exhibit correlated forecasting errors on 568 binary questions but human predictions show no activation of this shared bias.
-
A Review of Large Language Models for Stock Price Forecasting from a Hedge-Fund Perspective
This review synthesizes LLM uses in stock forecasting and catalogs key practical pitfalls from a hedge-fund viewpoint.
-
Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation
CausalGAN + SAC RL pipeline generates synthetic bond yield data; fine-tuned Qwen2.5-7B LLM produces trading signals, with reported MAE 0.103, 60% profit rate, and LLM score 3.37/5.
-
LLM-Powered AI Agent Systems and Their Applications in Industry
A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.
-
Bridging Language Models and Financial Analysis
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.