{"total":16,"items":[{"citing_arxiv_id":"2605.16895","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence","primary_cat":"cs.CE","submitted_at":"2026-05-16T09:14:35+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Reported alpha from end-to-end LLM trading agents does not constitute deployment evidence until it passes structural tests for temporal integrity, frictions, robustness, calibration, execution, and disaggregation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06822","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents","primary_cat":"cs.LG","submitted_at":"2026-05-07T18:23:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage points across equity sectors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06730","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics","primary_cat":"cs.LG","submitted_at":"2026-05-07T11:37:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SSAI maps news into four factors (sentiment, risk, confidence, volatility) for trading, but factor portfolios, ridge models, and RL agents show no reliable edge over baselines after coverage controls and costs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01954","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading","primary_cat":"cs.AI","submitted_at":"2026-05-03T16:37:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Moira parameterizes hierarchical RL policies for pair trading with LLMs and adapts them via prompt updates based on trajectory and episode feedback, outperforming baselines on real market data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26747","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Hypotheses to Factors: Constrained LLM Agents in Cryptocurrency Markets","primary_cat":"q-fin.PM","submitted_at":"2026-04-29T14:46:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Constrained LLM agents discover cryptocurrency factors that produce a portfolio with 44.55% annualized return and Sharpe ratio of 1.55 in pure out-of-sample 2024-2026 testing after trading costs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21433","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ChatGPT as a Time Capsule: The Limits of Price Discovery","primary_cat":"q-fin.GN","submitted_at":"2026-04-23T08:49:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Frozen LLM checkpoints serve as time capsules of public text and generate outlook scores that forecast equity returns and analyst actions beyond contemporaneous valuations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17327","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Signal or Noise in Multi-Agent LLM-based Stock Recommendations?","primary_cat":"q-fin.PM","submitted_at":"2026-04-19T08:43:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A multi-agent LLM equity system produces statistically significant outperformance on S&P 500 stocks, with strong-buy portfolios returning +2.18% monthly versus +1.15% for the equal-weight benchmark over 19 months.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10996","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies","primary_cat":"cs.CL","submitted_at":"2026-04-13T04:53:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM features optimized for high information coefficient with returns do not reliably improve PPO trading policies under distribution shifts, where price-only or macro baselines remain more robust.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05211","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Review of Large Language Models for Stock Price Forecasting from a Hedge-Fund Perspective","primary_cat":"q-fin.PR","submitted_at":"2026-04-10T17:36:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"This review synthesizes LLM uses in stock forecasting and catalogs key practical pitfalls from a hedge-fund viewpoint.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"infer investors' attitudes and evaluations toward a company or a financial market. It has been an active research topic for years [7], [8], [70]. Recently, with the emerging and innovation of LLMs, researchers have been actively developing techniques to use LLMs to extract sentiment-related information from news articles, financial reports, or social media posts [43], [9], [11], [13], [14], [15], [16], [17], [18], [19], [22], [35], [36], [12]. The sentiment-related information can be used as features for downstream machine learning models in stock price prediction or portfolio construction, or used through zero- or few-shot prompting for stock movement prediction. Moreover, it can be used as supervision to fine-tune pretrained LLMs for task-specific market forecasts."},{"citing_arxiv_id":"2605.00844","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Oracle's Fingerprint: Correlated AI Forecasting Errors and the Limits of Bias Transmission","primary_cat":"cs.CY","submitted_at":"2026-04-07T19:45:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Three independent LLMs exhibit correlated forecasting errors on 568 binary questions but human predictions show no activation of this shared bias.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03888","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage","primary_cat":"cs.AI","submitted_at":"2026-04-04T22:51:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PolySwarm aggregates predictions from 50 LLM personas for Polymarket trading using Bayesian combination and divergence metrics, outperforming single models in calibration while adding latency arbitrage via CEX price models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"purpose BERT on financial sentiment benchmarks, establishing domain-specific fine-tuning as best practice for financial NLP. Subsequent work has extended this approach to larger architectures: GPT-based models applied to financial sentiment have demonstrated significant improvements on benchmarks including Financial PhraseBank, FiQA, and earnings call sentiment classification [19]. Lopez-Lira and Tang [19] published an influential study demonstrating that ChatGPT-generated sentiment scores for news headlines exhibit predictive power for next-day stock returns that is substantially superior to lexicon-based approaches, survives standard risk-factor controls, and displays statistically significant alpha. This finding has catalyzed a wave of research"},{"citing_arxiv_id":"2604.02921","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Debiasing LLMs by Fine-tuning","primary_cat":"q-fin.GN","submitted_at":"2026-04-03T09:37:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Supervised fine-tuning with LoRA on rational benchmark forecasts corrects extrapolation bias out-of-sample in LLM predictions for controlled experiments and cross-sectional stock returns.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.11512","ref_index":75,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Time Series Analysis to Question Answering: A Survey in the LLM Era","primary_cat":"cs.LG","submitted_at":"2025-06-13T07:13:05+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A survey proposing a taxonomy of Injective, Bridging, and Internal Alignment paradigms to evolve TSA into user-driven Time Series Question Answering with LLMs.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"itransformer: Inverted transformers are effective for time series forecasting. In ICLR, 2024. [73] Y. Liu, G. Qin, X. Huang, et al. Autotimes: Autoregressive time series forecasters via large language models. arXiv preprint, abs/2402.02370, 2024. [74] Z. Liu and R. Jia. Llm4fts: Enhancing large language models for financial time series prediction. arXiv preprint, abs/2505.02880, 2025. [75] A. Lopez-Lira and Y. Tang. Can chatgpt forecast stock price movements? return predictability and large language models. arXiv preprint, abs/2304.07619, 2023. [76] Q. Ma, Z. Liu, Z. Zheng, Z. Huang, S. Zhu, Z. Yu, and J. T. Kwok. A survey on time-series pre-trained models. TKDE, 36(12):7536-7555, 2024. [77] Y. Nie, N. H. Nguyen, P. Sinthong, and J."},{"citing_arxiv_id":"2505.16120","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM-Powered AI Agent Systems and Their Applications in Industry","primary_cat":"cs.AI","submitted_at":"2025-05-22T01:52:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2503.22693","ref_index":59,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bridging Language Models and Financial Analysis","primary_cat":"q-fin.ST","submitted_at":"2025-03-14T01:35:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.17011","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation","primary_cat":"q-fin.CP","submitted_at":"2025-02-24T09:46:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"CausalGAN + SAC RL pipeline generates synthetic bond yield data; fine-tuned Qwen2.5-7B LLM produces trading signals, with reported MAE 0.103, 60% profit rate, and LLM score 3.37/5.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}