pith. machine review for the scientific record.

arxiv: 2605.05211 · v1 · submitted 2026-04-10 · 💱 q-fin.PR · cs.AI · cs.LG · q-fin.ST

Recognition: unknown

A Review of Large Language Models for Stock Price Forecasting from a Hedge-Fund Perspective

Olivia Zhang, Zhilin Zhang

Pith reviewed 2026-05-10 17:01 UTC · model grok-4.3

classification 💱 q-fin.PR · cs.AI · cs.LG · q-fin.ST
keywords large language models · stock price forecasting · hedge funds · sentiment analysis · financial text analysis · multi-agent systems · data leakage · market predictability

The pith

Large language models show promise for stock price forecasting through text analysis and sequence modeling but require careful handling of practical pitfalls for hedge fund deployment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review gathers recent work on applying large language models to predict stock prices, covering sentiment extraction from news and social media, analysis of reports and earnings calls, tokenization of price series, and multi-agent trading setups. It draws special attention to practical issues that studies often downplay, including unstable sentiment signals, flawed choices of datasets and time horizons, weak evaluation metrics, risks of data leakage, effects from illiquid stocks, and fundamental limits on price predictability. Written from a hedge fund viewpoint, the synthesis aims to help researchers and managers build and test LLM systems that hold up under real trading conditions and market frictions.

Core claim

The review synthesizes applications of large language models in stock price forecasting: sentiment extraction from financial news and social media, analysis of financial reports and earnings-call transcripts, tokenization or symbolization of stock price series, and construction of multi-agent trading systems. Alongside these, it highlights practical pitfalls such as fragility in sentiment analysis, dataset and horizon design, performance evaluation metrics, data leakage, illiquidity premia, and limits of stock price predictability.
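
The "tokenizing or symbolizing" step can be pictured with a toy sketch in the spirit of SAX-style symbolic representations of time series; the breakpoints, segment count, and alphabet below are illustrative choices, not anything the paper prescribes:

```python
import numpy as np

# N(0,1) quartile breakpoints for a 4-letter alphabet (illustrative choice)
GAUSS_BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])

def symbolize(prices, n_segments=4, alphabet="abcd"):
    """Toy SAX-style symbolization of a price series:
    log returns -> z-normalize -> piecewise segment means
    -> map each mean to a letter via Gaussian breakpoints."""
    returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    z = (returns - returns.mean()) / (returns.std() + 1e-12)
    means = np.array([seg.mean() for seg in np.array_split(z, n_segments)])
    idx = np.searchsorted(GAUSS_BREAKPOINTS, means)
    return "".join(alphabet[i] for i in idx)

word = symbolize([100, 101, 99, 102, 104, 103, 105, 108, 107])
```

A discrete "word" of this kind is one way a price path can be handed to a token-based model; the surveyed papers use their own schemes.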

What carries the argument

A hedge-fund-oriented synthesis that pairs descriptions of LLM applications in finance with an explicit list of deployment pitfalls to stress-test robustness under market conditions.

If this is right

  • Academic studies of LLMs in finance should incorporate explicit checks for data leakage and realistic out-of-sample horizons.
  • Hedge funds testing LLM systems must adjust performance metrics for illiquidity premia and market impact.
  • Multi-agent LLM trading setups require validation against documented limits on price predictability.
  • Sentiment-based signals from LLMs need additional robustness testing before live use.
  • Evaluation of LLM forecasts should prioritize metrics that reflect real trading frictions over standard accuracy scores.
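
The leakage and out-of-sample points in the list above can be made concrete with a minimal walk-forward split, assuming a simple embargo-gap scheme; the function and parameters are an illustrative sketch, not the review's protocol:

```python
import numpy as np

def walk_forward_splits(n_samples, train_size, test_size, embargo=0):
    """Yield (train_idx, test_idx) pairs in which every test window
    lies strictly after its training window, with an optional embargo
    gap to limit leakage from labels that overlap in time."""
    start = 0
    while start + train_size + embargo + test_size <= n_samples:
        train = np.arange(start, start + train_size)
        test_begin = start + train_size + embargo
        test = np.arange(test_begin, test_begin + test_size)
        yield train, test
        start += test_size  # roll the whole window forward

splits = list(walk_forward_splits(100, train_size=50, test_size=10, embargo=5))
```

Shuffled or overlapping splits, by contrast, let post-event text leak into training and inflate reported accuracy.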

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the pitfalls prove as load-bearing as described, many existing LLM finance papers may overstate practical performance.
  • The review's structure could be adapted to examine LLM applications in adjacent areas such as options pricing or risk management.
  • Practitioners might begin with simpler statistical models and add LLMs only after demonstrating clear gains that survive the identified hurdles.
  • Over-reliance on LLM outputs without addressing leakage and predictability limits could increase model risk in portfolio construction.

Load-bearing premise

The review assumes the cited studies are representative of current LLM use in stock forecasting and that the listed pitfalls are the main obstacles for hedge fund success.

What would settle it

A controlled live-trading experiment or independent replication that applies LLM methods without correcting for the listed pitfalls, yet still produces consistent positive returns net of costs, would undermine the claim that these issues are central barriers.
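
"Net of costs" can itself be operationalized with a minimal sketch, assuming a flat proportional cost charged per unit of turnover; the function name and cost model are hypothetical, not taken from the paper:

```python
import numpy as np

def net_sharpe(gross_returns, positions, cost_bps=10.0, periods_per_year=252):
    """Annualized Sharpe ratio after charging a flat proportional
    cost on every change in position. Deliberately simple: a real
    evaluation also needs spread, market impact, and borrow fees."""
    gross = np.asarray(gross_returns, dtype=float)
    pos = np.asarray(positions, dtype=float)
    turnover = np.abs(np.diff(pos, prepend=0.0))  # includes the entry trade
    net = pos * gross - turnover * cost_bps / 1e4
    if net.std() == 0:
        return 0.0
    return net.mean() / net.std() * np.sqrt(periods_per_year)
```

Even this crude deduction is often enough to erase paper profits from a high-turnover sentiment signal, which is the review's point about friction-aware metrics.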

read the original abstract

Large language models (LLMs) are increasingly deployed in quantitative finance for stock price forecasting. This review synthesizes recent applications of LLMs in this domain, including extracting sentiment from financial news and social media, analyzing financial reports and earnings-call transcripts, tokenizing or symbolizing stock price series, and constructing multi-agent trading systems. Particular attention is paid to practical pitfalls that are often understated in the literature, such as fragility in sentiment analysis, dataset and horizon design, performance evaluation metrics, data leakage, illiquidity premia, and limits of stock price predictability. Organized from a hedge-fund perspective, the review is intended to guide both academic researchers and hedge fund managers in integrating LLMs into real-world trading pipelines and in stress-testing their robustness under realistic market frictions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript is a narrative literature review synthesizing applications of large language models (LLMs) for stock price forecasting. It covers sentiment extraction from news and social media, analysis of financial reports and earnings-call transcripts, tokenization/symbolization of price time series, and multi-agent trading systems. Particular emphasis is placed on practical pitfalls relevant to hedge-fund use, including fragility of sentiment signals, dataset/horizon choices, evaluation metrics, data leakage, illiquidity effects, and fundamental limits to predictability. The work is positioned as guidance for both researchers and practitioners integrating LLMs into trading pipelines while accounting for market frictions.

Significance. If the synthesis accurately reflects the cited literature and the enumerated pitfalls are illustrated with concrete examples, the review could usefully bridge academic LLM-finance work and real-world hedge-fund constraints. The hedge-fund framing and focus on often-overlooked robustness issues add practical value beyond purely technical surveys. However, the absence of a documented search protocol limits the ability to judge completeness or balance, which tempers the strength of its guidance claim.

major comments (1)
  1. [Introduction] Introduction (or equivalent opening section): The review states that it 'synthesizes recent applications' and pays 'particular attention' to listed pitfalls, yet provides no search protocol, database(s), date range, or inclusion/exclusion criteria. This is load-bearing for the central claim because the representativeness of the selected papers and the assertion that the pitfalls are 'often understated' cannot be evaluated without such information.
minor comments (3)
  1. [Abstract] Abstract and § on applications: The description of 'tokenizing or symbolizing stock price series' would benefit from one or two concrete examples drawn from the cited works to make the technique accessible to readers outside the immediate subfield.
  2. [Pitfalls discussion] Pitfalls section: While the pitfalls are enumerated, adding a short table that maps each pitfall to one or two representative papers (with brief illustration) would improve clarity and allow readers to trace the claims directly to the literature.
  3. [References] References: Verify that all cited works are correctly formatted and that preprints are explicitly labeled as such; this is especially important in a fast-moving area where version control matters for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and recommendation of minor revision. The single major comment identifies a valid opportunity to improve transparency in our narrative review, and we will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Introduction] Introduction (or equivalent opening section): The review states that it 'synthesizes recent applications' and pays 'particular attention' to listed pitfalls, yet provides no search protocol, database(s), date range, or inclusion/exclusion criteria. This is load-bearing for the central claim because the representativeness of the selected papers and the assertion that the pitfalls are 'often understated' cannot be evaluated without such information.

    Authors: We agree that documenting the literature selection process would strengthen the review's credibility and allow readers to better assess its scope and balance. Although the manuscript is explicitly positioned as a narrative synthesis (not a systematic review requiring a full PRISMA protocol), the absence of any description of how papers were chosen does limit evaluation of representativeness. In the revised version, we will add a concise subsection titled 'Scope and Literature Selection' immediately following the opening paragraph of the Introduction. This subsection will specify: (i) primary sources consulted (arXiv, SSRN, Google Scholar, and selected finance journals such as the Journal of Financial Economics and Quantitative Finance); (ii) approximate temporal focus (primarily 2020–early 2024 to capture post-Transformer LLM developments); (iii) core search terms (combinations of 'large language model', 'LLM', 'stock price forecasting', 'sentiment analysis', 'earnings call transcripts', and 'multi-agent trading'); and (iv) inclusion criteria (empirical or practical demonstrations of LLM use in forecasting or trading, with emphasis on works addressing real-world frictions). We will also note that selection prioritized papers offering concrete examples of pitfalls over purely methodological contributions. This addition will provide the requested context without converting the paper into a systematic review or altering its hedge-fund-oriented narrative tone. revision: yes

Circularity Check

0 steps flagged

No significant circularity; literature synthesis with no internal derivations

full rationale

This manuscript is explicitly a review paper that synthesizes applications and pitfalls from the existing literature on LLMs for stock price forecasting. It contains no original equations, derivations, predictions, fitted parameters, or self-referential claims that reduce to the paper's own inputs by construction. All substantive assertions rest on external citations rather than internal re-derivation or ansatz smuggling. Per the hard rules, a self-contained synthesis against external benchmarks receives score 0 with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a review, the paper does not introduce new free parameters, axioms, or invented entities; its claims rest on the representativeness of the surveyed literature and the validity of the practical pitfalls identified therein.

pith-pipeline@v0.9.0 · 5432 in / 1122 out tokens · 45158 ms · 2026-05-10T17:01:11.824820+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

73 extracted references · 25 canonical work pages
