Cross-Stock Predictability via LLM-Augmented Semantic Networks
Pith reviewed 2026-05-10 01:14 UTC · model grok-4.3
The pith
LLM filtering of edges in 10-K semantic networks raises the Sharpe ratio of cross-stock mean-reversion strategies from 0.742 to 0.820.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper reports that a two-stage process improves a network-based mean-reversion strategy. The process first forms a candidate graph from 10-K embeddings and then uses an LLM to filter edges according to their economic content. When the filtered network is used to weight and combine mean-reversion signals, the resulting long-short portfolio has a Sharpe ratio of 0.820 and a maximum drawdown of -7.85 percent, compared with 0.742 and -10.47 percent for the unfiltered baseline, on S&P 500 stocks between 2011 and 2019.
What carries the argument
Two-stage LLM-augmented semantic network: sparse candidate graph from embedding similarities followed by LLM classification of economic relations, then relation-aware and distance-based weighting of aggregated signals.
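The two stages described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the top-k cosine-similarity rule, the function names `candidate_edges` and `filter_edges`, and the `classify` callable (a stand-in for the LLM call, assumed to return a relation label or "none") are all assumptions introduced here.

```python
import numpy as np

def candidate_edges(embeddings, k):
    """Stage 1 (assumed rule): link each firm to its k most similar peers."""
    emb = np.asarray(embeddings, dtype=float)
    # Normalise rows so that the dot product equals cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-links
    edges = set()
    for i in range(sim.shape[0]):
        # Indices of the k largest similarities in row i.
        for j in np.argsort(sim[i])[-k:]:
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges

def filter_edges(edges, classify):
    """Stage 2: keep edges whose label is an economic relation.

    `classify` is a hypothetical stand-in for the LLM; it takes two firm
    indices and returns a relation label, or "none" for a spurious link.
    """
    return {e for e in edges if classify(*e) != "none"}

# Toy example: four firms with two-dimensional "embeddings".
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
candidates = candidate_edges(emb, k=1)
kept = filter_edges(candidates, lambda i, j: "supplier" if (i, j) == (0, 1) else "none")
```

In practice the classifier would receive excerpts of the two firms' filings rather than indices; the index-based signature only keeps the sketch self-contained.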
If this is right
- LLM edge filtering reduces noise from non-economic links and thereby strengthens the aggregated trading signals.
- Relation-aware weighting of mean-reversion pairs produces more accurate stock-level forecasts than uniform aggregation.
- The refined graph structure can be reused for other text-based predictability tasks beyond mean reversion.
- Performance gains appear in both higher returns per unit of risk and lower tail losses during the sample period.
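One way to picture relation-aware and distance-based weighting is the sketch below. The paper's actual weighting scheme is not given on this page, so everything here is an assumption: the pair signal is taken to be the return gap between two linked stocks, the weight is a per-relation constant multiplied by an exponential decay in embedding distance, and the names `stock_signals`, `relation_weight`, `distance` and `decay` are hypothetical.

```python
import numpy as np

def stock_signals(returns, edges, relation_weight, distance, decay=1.0):
    """Aggregate pair-level return gaps into one signal per stock.

    returns: recent return of each stock.
    edges: dict mapping (i, j) to a relation label.
    relation_weight: dict mapping label to a non-negative weight (assumed).
    distance: dict mapping (i, j) to an embedding distance (assumed).
    """
    r = np.asarray(returns, dtype=float)
    num = np.zeros(len(r))
    den = np.zeros(len(r))
    for (i, j), label in edges.items():
        w = relation_weight[label] * np.exp(-decay * distance[(i, j)])
        gap = r[j] - r[i]  # positive if stock i lagged its peer j
        num[i] += w * gap  # mean reversion: i is expected to catch up
        num[j] -= w * gap
        den[i] += w
        den[j] += w
    # Stocks without any neighbour get a neutral signal of zero.
    return np.divide(num, den, out=np.zeros(len(r)), where=den > 0)

signals = stock_signals(
    returns=[0.02, -0.01, 0.0],
    edges={(0, 1): "competitor"},
    relation_weight={"competitor": 1.0},
    distance={(0, 1): 0.0},
)
```

Uniform aggregation corresponds to setting every relation weight to 1 and `decay` to 0, which makes the comparison in the second bullet easy to run.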
Where Pith is reading between the lines
- The approach could support periodic network updates as new 10-K filings become available without rebuilding the entire graph.
- Similar LLM filtering might be tested on other disclosure types such as earnings transcripts to check for broader applicability.
- The method offers a way to combine embedding-based scaling with targeted human-like judgment on link validity.
- In this reading, embeddings supply the scalable first pass over all firm pairs, while the LLM supplies the judgment on the much smaller set of candidate links.
Load-bearing premise
The large language model correctly and consistently identifies which textual similarities reflect real economic connections rather than introducing bias or inconsistency.
What would settle it
A replication in which the LLM-filtered network produces no improvement or a deterioration in Sharpe ratio and drawdown when tested on the same 2011-2019 S&P 500 sample with varied prompts or models.
Original abstract
Text-based financial networks are increasingly used to study cross-stock return predictability. A common approach constructs links from similarities in firms' disclosure embeddings, but such networks often contain spurious edges because textual proximity does not necessarily imply economic connection. We propose a two-stage framework that first builds a sparse candidate graph from 10-K embeddings and then uses a large language model to classify and filter candidate edges according to their economic relations. The refined graph is used to aggregate pair-level mean-reversion signals into stock-level trading signals with relation-aware and distance-based weights. In a backtest on S&P 500 constituents from 2011 to 2019, LLM-based edge filtering improves the long-short Sharpe ratio from 0.742 to 0.820 and reduces maximum drawdown from −10.47% to −7.85%. These results suggest that LLM-based reasoning can improve the economic fidelity of text-derived financial networks and strengthen cross-stock predictability.
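For readers less familiar with the two headline metrics, the sketch below shows one common way to compute them from a series of periodic portfolio returns. The return frequency, the annualisation factor and the treatment of the risk-free rate used in the paper are not stated here, so `periods_per_year=252` and the zero risk-free rate are assumptions.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualised mean return divided by annualised volatility.

    Assumes the returns are already in excess of the risk-free rate.
    """
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded wealth path (negative)."""
    r = np.asarray(returns, dtype=float)
    # Start the wealth path at 1 so a loss in the first period is counted.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(wealth)
    return (wealth / running_peak - 1.0).min()

dd = max_drawdown([0.10, -0.20, 0.05])  # peak 1.10, trough 0.88
sr = sharpe_ratio([0.01, 0.03], periods_per_year=1)
```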
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-stage framework for text-based financial networks: first constructing a sparse candidate graph from similarities in 10-K filing embeddings, then using a large language model to classify and filter candidate edges according to their economic relations. The refined graph aggregates pair-level mean-reversion signals into stock-level trading signals via relation-aware and distance-based weights. Backtests on S&P 500 constituents (2011–2019) report that LLM filtering raises the long-short Sharpe ratio from 0.742 to 0.820 and lowers maximum drawdown from −10.47% to −7.85%.
Significance. If the reported gains can be shown to arise from the LLM’s detection of economically meaningful relations rather than from any reduction in graph density, the work would provide a concrete method for improving the fidelity of embedding-derived networks in quantitative finance. The approach is timely given the rapid adoption of LLMs for semantic reasoning, and the empirical numbers are presented clearly enough to invite replication.
major comments (2)
- [Empirical results] Empirical results section: the headline comparison (Sharpe 0.742 vs. 0.820) pits the LLM-filtered graph only against the unfiltered sparse candidate graph. No ablation is reported that holds edge density fixed while changing the selection rule (e.g., random subsampling or non-LLM heuristics). Without this control, the performance lift cannot be attributed specifically to LLM economic classification rather than to the mechanical effect of pruning noisy pairs.
- [Methodology] Methodology section: the central assumption that the LLM accurately and consistently classifies edges as economically meaningful is not supported by any reported validation (human evaluation of classification accuracy, inter-annotator agreement, or tests for prompt sensitivity and hallucination). This validation is load-bearing for the claim that the refined graph improves economic fidelity.
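The density-matched control requested in the first major comment could look roughly like the sketch below. It is an illustration only: `backtest` is a hypothetical callable that maps an edge set to a performance metric such as the Sharpe ratio, and the function names and the number of draws are assumptions.

```python
import random

def density_matched_sample(candidate_edges, n_keep, seed):
    """Draw n_keep edges uniformly at random from the candidate graph."""
    rng = random.Random(seed)
    # Sort first so the draw is reproducible regardless of set ordering.
    return set(rng.sample(sorted(candidate_edges), n_keep))

def random_baseline(candidate_edges, llm_edges, backtest, n_draws=100):
    """Metric distribution over random graphs with the LLM graph's edge count."""
    n_keep = len(llm_edges)
    return [
        backtest(density_matched_sample(candidate_edges, n_keep, seed))
        for seed in range(n_draws)
    ]

candidates = {(0, 1), (0, 2), (1, 2), (2, 3)}
llm_kept = {(0, 1), (2, 3)}
# Placeholder metric: the edge count stands in for a real backtest result.
baseline_metrics = random_baseline(candidates, llm_kept, backtest=len, n_draws=5)
```

Comparing the LLM-filtered metric against the spread of `baseline_metrics` would indicate whether the reported gain exceeds what pruning alone delivers.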
minor comments (2)
- The abstract and results report concrete performance numbers but supply no information on the statistical significance of the Sharpe improvement or on robustness to alternative backtest windows.
- Details on the exact LLM prompt template, temperature setting, and any post-processing of LLM outputs are absent, making the procedure difficult to reproduce.
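One simple way to attach uncertainty to the Sharpe improvement mentioned in the first minor comment is a paired bootstrap, sketched below under stated assumptions. It resamples dates independently, which ignores autocorrelation in returns; a block bootstrap would be a more careful choice. The function names are hypothetical and the Sharpe ratio is left unannualised.

```python
import numpy as np

def sharpe(r):
    """Unannualised Sharpe ratio of a return array."""
    return r.mean() / r.std(ddof=1)

def bootstrap_sharpe_diff(filtered, baseline, n_boot=2000, seed=0):
    """Point estimate and 95% percentile interval for the Sharpe difference."""
    a = np.asarray(filtered, dtype=float)
    b = np.asarray(baseline, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(a)
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        # Use the same resampled dates for both strategies (paired design).
        idx = rng.integers(0, n, size=n)
        diffs[k] = sharpe(a[idx]) - sharpe(b[idx])
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return sharpe(a) - sharpe(b), lo, hi

# Synthetic returns for illustration only; not data from the paper.
base = 0.01 * np.sin(np.arange(200))
point, lo, hi = bootstrap_sharpe_diff(base + 0.001, base, n_boot=200, seed=1)
```

An interval that excludes zero would suggest the improvement is not an artifact of sampling noise, subject to the independence caveat above.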
Simulated Author's Rebuttal
We thank the referee for the detailed and insightful comments on our manuscript. We agree with the concerns raised regarding the need for additional controls in the empirical analysis and validation of the LLM component. Below we provide point-by-point responses and indicate the revisions we plan to implement.
- Referee: Empirical results section: the headline comparison (Sharpe 0.742 vs. 0.820) pits the LLM-filtered graph only against the unfiltered sparse candidate graph. No ablation is reported that holds edge density fixed while changing the selection rule (e.g., random subsampling or non-LLM heuristics). Without this control, the performance lift cannot be attributed specifically to LLM economic classification rather than to the mechanical effect of pruning noisy pairs.
Authors: We acknowledge that the current comparison does not isolate the LLM's contribution from the effect of reduced graph density. In the revised version, we will include an ablation study that holds the number of edges constant by randomly subsampling the candidate graph to the same density as the LLM-filtered graph. We will report the Sharpe ratio and maximum drawdown for this random baseline alongside the LLM-filtered results. This will provide evidence on whether the performance gains stem specifically from the economic classification by the LLM. We will also explore a non-LLM heuristic, such as keeping the top-k most similar pairs, if it fits within the page limits.
Revision: yes
- Referee: Methodology section: the central assumption that the LLM accurately and consistently classifies edges as economically meaningful is not supported by any reported validation (human evaluation of classification accuracy, inter-annotator agreement, or tests for prompt sensitivity and hallucination). This validation is load-bearing for the claim that the refined graph improves economic fidelity.
Authors: The referee is correct that we have not provided direct validation of the LLM's edge classifications in the submitted manuscript. To address this, we will add a section reporting a human evaluation study. Specifically, we will sample a subset of candidate edges, have them classified by the LLM, and then have domain experts (e.g., finance researchers) independently label whether the relation is economically meaningful. We will report accuracy, precision, recall, and inter-annotator agreement (Cohen's kappa). Additionally, we will test the sensitivity of results to different prompts and report any variations observed. These additions will substantiate the reliability of the filtering step.
Revision: yes
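The validation statistics named in the second response can be computed as in the sketch below. The labels are made up for illustration (1 means "economically meaningful", 0 means "not"), and the function names are hypothetical; the formulas are the standard definitions of accuracy, precision, recall and Cohen's kappa.

```python
def binary_metrics(predicted, truth):
    """Accuracy, precision and recall for 0/1 labels.

    Assumes at least one predicted positive and one true positive,
    otherwise precision or recall is undefined.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

def cohen_kappa(a, b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(a)
    p_obs = sum(1 for x, y in zip(a, b) if x == y) / n
    labels = set(a) | set(b)
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

# Illustrative labels only; not results from the paper.
llm      = [1, 1, 1, 0, 0, 0, 1, 0]
expert_a = [1, 1, 0, 0, 0, 1, 1, 0]
expert_b = [1, 1, 0, 0, 0, 0, 1, 0]
scores = binary_metrics(llm, expert_a)
kappa = cohen_kappa(expert_a, expert_b)
```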
Circularity Check
No circularity in empirical backtest framework
The paper describes a two-stage empirical procedure: construct a candidate graph from 10-K embeddings, apply LLM classification to filter edges, then aggregate mean-reversion signals into trading signals for a historical backtest on S&P 500 data 2011-2019. No equations, derivations, or first-principles claims are present that could reduce to self-definition, fitted parameters renamed as predictions, or self-citation chains. The reported Sharpe and drawdown figures are direct outputs of applying the procedure to the fixed historical returns; the LLM step operates on textual input independent of the performance metric. This is a standard empirical evaluation with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can reliably identify economically meaningful relations from pairs of firm disclosures without systematic error or prompt dependence.
Forward citations
Cited by 1 Pith paper
- From Hypotheses to Factors: Constrained LLM Agents in Cryptocurrency Markets
Constrained LLM agents discover cryptocurrency factors that produce a portfolio with 44.55% annualized return and Sharpe ratio of 1.55 in pure out-of-sample 2024-2026 testing after trading costs.