Can LLM-based financial investing strategies outperform the market in long run?

Weixian Waylon Li, Hyeonjun Kim, Mihai Cucuringu, Tiejun Ma · 2025 · q-fin.TR · arXiv 2505.07078

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open full Pith review browse 4 citing papers arXiv PDF

abstract

Large Language Models (LLMs) have recently been leveraged for asset pricing tasks and stock trading applications, enabling AI agents to generate investment decisions from unstructured financial data. However, most evaluations of LLM timing-based investing strategies are conducted on narrow timeframes and limited stock universes, overstating effectiveness due to survivorship and data-snooping biases. We critically assess their generalizability and robustness by proposing FINSABER, a backtesting framework evaluating timing-based strategies across longer periods and a larger universe of symbols. Systematic backtests over two decades and 100+ symbols reveal that previously reported LLM advantages deteriorate significantly under broader cross-section and over a longer-term evaluation. Our market regime analysis further demonstrates that LLM strategies are overly conservative in bull markets, underperforming passive benchmarks, and overly aggressive in bear markets, incurring heavy losses. These findings highlight the need to develop LLM strategies that are able to prioritise trend detection and regime-aware risk controls over mere scaling of framework complexity.

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

CLQT: A Closed-Loop, Cost-Aware, Strategy-Consistent Benchmark for Diagnostic Evaluation of LLM Portfolio-Management Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

CLQT is a new closed-loop, cost-aware benchmark that diagnoses LLM trading agent capabilities through strategy-consistent metrics and hash-verifiable trails rather than outcome rankings.

SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics

cs.SE · 2026-04-06 · unverdicted · novelty 6.0

SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.

Fund2Persona: A Framework for Building and Refining Financial Advisor Personas from Fund Disclosure Data

cs.CL · 2026-06-29 · unverdicted · novelty 5.0

Fund2Persona grounds and refines financial advisor personas in fund disclosures, holdings transitions, and manager commentary, showing improved performance on reconstruction and alignment tasks over generic baselines.

SoK: Security of Autonomous LLM Agents in Agentic Commerce

cs.CR · 2026-04-15 · unverdicted · novelty 5.0

The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.

citing papers explorer

Showing 0 of 0 citing papers after filters.

No citing papers match the current filters.

Can LLM-based financial investing strategies outperform the market in long run?

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer