pith. machine review for the scientific record.

arxiv: 2605.05211 · v1 · submitted 2026-04-10 · 💱 q-fin.PR · cs.AI · cs.LG · q-fin.ST

Recognition: unknown

A Review of Large Language Models for Stock Price Forecasting from a Hedge-Fund Perspective

Olivia Zhang, Zhilin Zhang

Pith reviewed 2026-05-10 17:01 UTC · model grok-4.3

classification 💱 q-fin.PR · cs.AI · cs.LG · q-fin.ST
keywords large language models · stock price forecasting · hedge funds · sentiment analysis · financial text analysis · multi-agent systems · data leakage · market predictability

The pith

Large language models show promise for stock price forecasting through text analysis and sequence modeling but require careful handling of practical pitfalls for hedge fund deployment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review gathers recent work on applying large language models to predict stock prices, covering sentiment extraction from news and social media, analysis of reports and earnings calls, tokenization of price series, and multi-agent trading setups. It draws special attention to practical issues that studies often downplay, including unstable sentiment signals, flawed choices of datasets and time horizons, weak evaluation metrics, risks of data leakage, effects from illiquid stocks, and fundamental limits on price predictability. Written from a hedge fund viewpoint, the synthesis aims to help researchers and managers build and test LLM systems that hold up under real trading conditions and market frictions.

Core claim

The review synthesizes applications of large language models in stock price forecasting: sentiment extraction from financial news and social media, analysis of financial reports and earnings-call transcripts, tokenization or symbolization of stock price series, and construction of multi-agent trading systems. Alongside these, it highlights practical pitfalls such as fragility in sentiment analysis, dataset and horizon design, performance evaluation metrics, data leakage, illiquidity premia, and limits of stock price predictability.
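
The "tokenizing or symbolizing" step can be pictured with a toy sketch in the spirit of SAX-style symbolic representations of time series; the breakpoints, segment count, and alphabet below are illustrative choices, not anything the paper prescribes:

```python
import numpy as np

# N(0,1) quartile breakpoints for a 4-letter alphabet (illustrative choice)
GAUSS_BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])

def symbolize(prices, n_segments=4, alphabet="abcd"):
    """Toy SAX-style symbolization of a price series:
    log returns -> z-normalize -> piecewise segment means
    -> map each mean to a letter via Gaussian breakpoints."""
    returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    z = (returns - returns.mean()) / (returns.std() + 1e-12)
    means = np.array([seg.mean() for seg in np.array_split(z, n_segments)])
    idx = np.searchsorted(GAUSS_BREAKPOINTS, means)
    return "".join(alphabet[i] for i in idx)

word = symbolize([100, 101, 99, 102, 104, 103, 105, 108, 107])
```

A discrete "word" of this kind is one way a price path can be handed to a token-based model; the surveyed papers use their own schemes.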

What carries the argument

A hedge-fund-oriented synthesis that pairs descriptions of LLM applications in finance with an explicit list of deployment pitfalls to stress-test robustness under market conditions.

If this is right

  • Academic studies of LLMs in finance should incorporate explicit checks for data leakage and realistic out-of-sample horizons.
  • Hedge funds testing LLM systems must adjust performance metrics for illiquidity premia and market impact.
  • Multi-agent LLM trading setups require validation against documented limits on price predictability.
  • Sentiment-based signals from LLMs need additional robustness testing before live use.
  • Evaluation of LLM forecasts should prioritize metrics that reflect real trading frictions over standard accuracy scores.
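
The leakage and out-of-sample points in the list above can be made concrete with a minimal walk-forward split, assuming a simple embargo-gap scheme; the function and parameters are an illustrative sketch, not the review's protocol:

```python
import numpy as np

def walk_forward_splits(n_samples, train_size, test_size, embargo=0):
    """Yield (train_idx, test_idx) pairs in which every test window
    lies strictly after its training window, with an optional embargo
    gap to limit leakage from labels that overlap in time."""
    start = 0
    while start + train_size + embargo + test_size <= n_samples:
        train = np.arange(start, start + train_size)
        test_begin = start + train_size + embargo
        test = np.arange(test_begin, test_begin + test_size)
        yield train, test
        start += test_size  # roll the whole window forward

splits = list(walk_forward_splits(100, train_size=50, test_size=10, embargo=5))
```

Shuffled or overlapping splits, by contrast, let post-event text leak into training and inflate reported accuracy.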

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the pitfalls prove as load-bearing as described, many existing LLM finance papers may overstate practical performance.
  • The review's structure could be adapted to examine LLM applications in adjacent areas such as options pricing or risk management.
  • Practitioners might begin with simpler statistical models and add LLMs only after demonstrating clear gains that survive the identified hurdles.
  • Over-reliance on LLM outputs without addressing leakage and predictability limits could increase model risk in portfolio construction.

Load-bearing premise

The review assumes the cited studies are representative of current LLM use in stock forecasting and that the listed pitfalls are the main obstacles for hedge fund success.

What would settle it

A controlled live-trading experiment or independent replication that applies LLM methods without correcting for the listed pitfalls, yet still produces consistent positive returns net of costs, would undermine the claim that these issues are central barriers.
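
"Net of costs" can itself be operationalized with a minimal sketch, assuming a flat proportional cost charged per unit of turnover; the function name and cost model are hypothetical, not taken from the paper:

```python
import numpy as np

def net_sharpe(gross_returns, positions, cost_bps=10.0, periods_per_year=252):
    """Annualized Sharpe ratio after charging a flat proportional
    cost on every change in position. Deliberately simple: a real
    evaluation also needs spread, market impact, and borrow fees."""
    gross = np.asarray(gross_returns, dtype=float)
    pos = np.asarray(positions, dtype=float)
    turnover = np.abs(np.diff(pos, prepend=0.0))  # includes the entry trade
    net = pos * gross - turnover * cost_bps / 1e4
    if net.std() == 0:
        return 0.0
    return net.mean() / net.std() * np.sqrt(periods_per_year)
```

Even this crude deduction is often enough to erase paper profits from a high-turnover sentiment signal, which is the review's point about friction-aware metrics.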

read the original abstract

Large language models (LLMs) are increasingly deployed in quantitative finance for stock price forecasting. This review synthesizes recent applications of LLMs in this domain, including extracting sentiment from financial news and social media, analyzing financial reports and earnings-call transcripts, tokenizing or symbolizing stock price series, and constructing multi-agent trading systems. Particular attention is paid to practical pitfalls that are often understated in the literature, such as fragility in sentiment analysis, dataset and horizon design, performance evaluation metrics, data leakage, illiquidity premia, and limits of stock price predictability. Organized from a hedge-fund perspective, the review is intended to guide both academic researchers and hedge fund managers in integrating LLMs into real-world trading pipelines and in stress-testing their robustness under realistic market frictions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript is a narrative literature review synthesizing applications of large language models (LLMs) for stock price forecasting. It covers sentiment extraction from news and social media, analysis of financial reports and earnings-call transcripts, tokenization/symbolization of price time series, and multi-agent trading systems. Particular emphasis is placed on practical pitfalls relevant to hedge-fund use, including fragility of sentiment signals, dataset/horizon choices, evaluation metrics, data leakage, illiquidity effects, and fundamental limits to predictability. The work is positioned as guidance for both researchers and practitioners integrating LLMs into trading pipelines while accounting for market frictions.

Significance. If the synthesis accurately reflects the cited literature and the enumerated pitfalls are illustrated with concrete examples, the review could usefully bridge academic LLM-finance work and real-world hedge-fund constraints. The hedge-fund framing and focus on often-overlooked robustness issues add practical value beyond purely technical surveys. However, the absence of a documented search protocol limits the ability to judge completeness or balance, which tempers the strength of its guidance claim.

major comments (1)
  1. [Introduction] Introduction (or equivalent opening section): The review states that it 'synthesizes recent applications' and pays 'particular attention' to listed pitfalls, yet provides no search protocol, database(s), date range, or inclusion/exclusion criteria. This is load-bearing for the central claim because the representativeness of the selected papers and the assertion that the pitfalls are 'often understated' cannot be evaluated without such information.
minor comments (3)
  1. [Abstract] Abstract and § on applications: The description of 'tokenizing or symbolizing stock price series' would benefit from one or two concrete examples drawn from the cited works to make the technique accessible to readers outside the immediate subfield.
  2. [Pitfalls discussion] Pitfalls section: While the pitfalls are enumerated, adding a short table that maps each pitfall to one or two representative papers (with brief illustration) would improve clarity and allow readers to trace the claims directly to the literature.
  3. [References] References: Verify that all cited works are correctly formatted and that preprints are explicitly labeled as such; this is especially important in a fast-moving area where version control matters for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and recommendation of minor revision. The single major comment identifies a valid opportunity to improve transparency in our narrative review, and we will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Introduction] Introduction (or equivalent opening section): The review states that it 'synthesizes recent applications' and pays 'particular attention' to listed pitfalls, yet provides no search protocol, database(s), date range, or inclusion/exclusion criteria. This is load-bearing for the central claim because the representativeness of the selected papers and the assertion that the pitfalls are 'often understated' cannot be evaluated without such information.

    Authors: We agree that documenting the literature selection process would strengthen the review's credibility and allow readers to better assess its scope and balance. Although the manuscript is explicitly positioned as a narrative synthesis (not a systematic review requiring a full PRISMA protocol), the absence of any description of how papers were chosen does limit evaluation of representativeness. In the revised version, we will add a concise subsection titled 'Scope and Literature Selection' immediately following the opening paragraph of the Introduction. This subsection will specify: (i) primary sources consulted (arXiv, SSRN, Google Scholar, and selected finance journals such as the Journal of Financial Economics and Quantitative Finance); (ii) approximate temporal focus (primarily 2020–early 2024 to capture post-Transformer LLM developments); (iii) core search terms (combinations of 'large language model', 'LLM', 'stock price forecasting', 'sentiment analysis', 'earnings call transcripts', and 'multi-agent trading'); and (iv) inclusion criteria (empirical or practical demonstrations of LLM use in forecasting or trading, with emphasis on works addressing real-world frictions). We will also note that selection prioritized papers offering concrete examples of pitfalls over purely methodological contributions. This addition will provide the requested context without converting the paper into a systematic review or altering its hedge-fund-oriented narrative tone. revision: yes

Circularity Check

0 steps flagged

No significant circularity; literature synthesis with no internal derivations

full rationale

This manuscript is explicitly a review paper that synthesizes applications and pitfalls from the existing literature on LLMs for stock price forecasting. It contains no original equations, derivations, predictions, fitted parameters, or self-referential claims that reduce to the paper's own inputs by construction. All substantive assertions rest on external citations rather than internal re-derivation or ansatz smuggling. Per the hard rules, a self-contained synthesis against external benchmarks receives score 0 with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a review, the paper does not introduce new free parameters, axioms, or invented entities; its claims rest on the representativeness of the surveyed literature and the validity of the practical pitfalls identified therein.

pith-pipeline@v0.9.0 · 5432 in / 1122 out tokens · 45158 ms · 2026-05-10T17:01:11.824820+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

73 extracted references · 25 canonical work pages
