Do Better Volatility Forecasts Lead to Better Portfolios? Evidence from Graph Neural Networks
Pith reviewed 2026-05-21 07:19 UTC · model grok-4.3
The pith
The model with the lowest volatility forecast error, the model with the highest stock-ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using weekly realized volatility for 465 S&P 500 equities from 2015-2025, the study compares Heterogeneous Autoregressive and Long Short-Term Memory baselines against GraphSAGE models built on rolling correlation, sector, and Granger-causal graphs, with and without macro regime features. The empirical finding is that the model with the lowest forecast MSE, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models. Forecast accuracy, ranking quality, and portfolio performance are related but not interchangeable objectives. Graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.
What carries the argument
GraphSAGE models on rolling correlation, sector, and Granger-causal graphs that encode cross-sectional dependencies among equities, evaluated separately on forecast MSE, ranking accuracy, and downstream portfolio Sharpe ratio.
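As a rough illustration of this machinery, the sketch below builds an adjacency matrix from one trailing window of returns and applies a single GraphSAGE-style mean-aggregation layer in NumPy. The window length, the correlation threshold, the random weight matrices, and the function names `correlation_graph` and `sage_mean_layer` are hypothetical choices made for illustration; the paper's actual graph construction and architecture are not described on this page.

```python
import numpy as np

def correlation_graph(returns, threshold=0.5):
    """Adjacency from a (T x N) window of returns: edge if |corr| >= threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    corr = np.corrcoef(returns, rowvar=False)        # N x N correlation matrix
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                       # no self-loops
    return adj

def sage_mean_layer(h, adj, w_self, w_neigh):
    """One GraphSAGE-style layer: own features plus mean of neighbour features, then ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = adj @ h / np.maximum(deg, 1.0)      # isolated nodes get a zero vector
    return np.maximum(h @ w_self + neigh_mean @ w_neigh, 0.0)

# Toy data: 52 weekly returns for 5 assets, with assets 0 and 1 made to co-move.
rng = np.random.default_rng(0)
returns = rng.normal(size=(52, 5))
returns[:, 1] = returns[:, 0] + 0.1 * rng.normal(size=52)

adj = correlation_graph(returns)
features = rng.normal(size=(5, 3))                   # 3 hypothetical input features per asset
w_self = rng.normal(size=(3, 4))
w_neigh = rng.normal(size=(3, 4))
out = sage_mean_layer(features, adj, w_self, w_neigh)  # shape (5, 4)
```

In a rolling setup the adjacency would be recomputed for each window, so the neighbourhood each asset aggregates over changes through time; sector and Granger-causal graphs would replace `correlation_graph` with a different edge rule.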
If this is right
- Forecast accuracy alone is not a reliable guide for selecting volatility models when the end goal is portfolio construction.
- Graph-based volatility models deliver benefits only when the portfolio rule is structured to use the relational information they provide.
- Model selection or training must be aligned with the specific downstream objective rather than with a single accuracy metric.
- Standard time-series baselines can outperform graph models on some metrics even when graph models capture additional structure.
Where Pith is reading between the lines
- Training or fine-tuning volatility models directly against a portfolio performance objective may close the gap between forecast quality and realized returns.
- Similar objective mismatches could appear in other financial applications where predictions feed into allocation or hedging decisions.
- Alternative portfolio rules that more explicitly incorporate pairwise or group dependencies could make graph models useful across a wider range of settings.
Load-bearing premise
The chosen portfolio construction rule is capable of exploiting the cross-sectional dependencies encoded in the graph-based volatility models.
What would settle it
If the single model that minimizes forecast MSE also achieves both the highest cross-sectional ranking accuracy and the highest portfolio Sharpe ratio across the test period, the claim that the three objectives are not interchangeable would be falsified.
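The falsification condition can be sketched as a small check: score each model on the three objectives, pick the winner for each, and see whether all three winners coincide. The metric definitions below (mean cross-sectional Spearman correlation for ranking accuracy, annualized Sharpe with a zero risk-free rate and 52 periods per year) are assumptions, since the paper's exact definitions are not given here, and the entries in `toy_scores` are placeholder numbers, not results from the paper.

```python
import numpy as np

def mse(forecast, realized):
    return float(np.mean((forecast - realized) ** 2))

def rank_accuracy(forecast, realized):
    """Mean cross-sectional Spearman correlation; rows are periods, columns are stocks.

    Assumes no ties within a row.
    """
    def ranks(x):
        return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)
    rf, rr = ranks(forecast), ranks(realized)
    rf -= rf.mean(axis=1, keepdims=True)
    rr -= rr.mean(axis=1, keepdims=True)
    rho = (rf * rr).sum(axis=1) / np.sqrt((rf ** 2).sum(axis=1) * (rr ** 2).sum(axis=1))
    return float(rho.mean())

def sharpe(portfolio_returns, periods_per_year=52):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(portfolio_returns, dtype=float)
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

def best_models(scores):
    """scores maps model name -> {'mse', 'rank', 'sharpe'}; returns the winner per objective."""
    return {
        "mse": min(scores, key=lambda m: scores[m]["mse"]),
        "rank": max(scores, key=lambda m: scores[m]["rank"]),
        "sharpe": max(scores, key=lambda m: scores[m]["sharpe"]),
    }

def claim_falsified(scores):
    """True when a single model wins all three objectives."""
    return len(set(best_models(scores).values())) == 1

# Placeholder scores for three hypothetical models (not taken from the paper).
toy_scores = {
    "model_a": {"mse": 0.10, "rank": 0.30, "sharpe": 0.50},
    "model_b": {"mse": 0.12, "rank": 0.40, "sharpe": 0.60},
    "model_c": {"mse": 0.15, "rank": 0.35, "sharpe": 0.90},
}
winners = best_models(toy_scores)
```

A stricter version of the check would also ask whether the gaps between winners are statistically distinguishable, since a different winner per column can arise from noise alone.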
Original abstract
This paper tests whether graph neural networks improve realized volatility forecasts and whether those forecasts improve portfolio performance. Using weekly realized volatility for 465 S&P 500 equities from 2015-2025, Heterogeneous Autoregressive and Long Short-Term Memory baselines are compared against GraphSAGE models built on rolling correlation, sector, and Granger-causal graphs, with and without macro regime features. The empirical finding is that the model with the lowest forecast MSE, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models. Forecast accuracy, ranking quality, and portfolio performance are related but not interchangeable objectives. Graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares Heterogeneous Autoregressive (HAR), LSTM, and GraphSAGE volatility models (built on rolling-correlation, sector, and Granger-causal graphs, with and without macro regime features) on weekly realized volatility for 465 S&P 500 equities (2015-2025). It reports that the model minimizing forecast MSE, the model maximizing cross-sectional ranking accuracy, and the model maximizing portfolio Sharpe ratio are three distinct models, concluding that forecast accuracy, ranking quality, and portfolio performance are related but non-interchangeable objectives, with graph models adding value only when the portfolio rule exploits the cross-sectional dependencies they encode.
Significance. If the empirical distinctions hold after verification, the work usefully demonstrates that statistical forecast superiority does not automatically translate to economic outperformance in portfolio settings. The direct held-out comparison across three distinct objectives provides concrete evidence against assuming interchangeability, and the emphasis on portfolio-rule alignment offers a practical takeaway for volatility modeling in quantitative portfolio management.
Major comments (2)
- Section 4 and the portfolio results table: the claim that graph models add value only when the portfolio rule exploits encoded cross-sectional structure requires explicit confirmation that the construction (mean-variance or volatility-scaled) actually incorporates off-diagonal covariance terms from the graph-derived matrices. An ablation of full versus diagonal-only covariance is needed; without it, Sharpe differences could stem from marginal univariate forecast gains rather than graph structure exploitation.
- Methods section: details on train/validation/test splits, the precise portfolio optimization rule (including any constraints or scaling), and statistical tests for Sharpe ratio differences are required to substantiate the central claim that the three best models differ across objectives.
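The full-versus-diagonal ablation requested in the first major comment could look roughly like the sketch below. It uses a global minimum-variance rule as a stand-in, because the paper's exact portfolio rule is not specified here; the volatilities, the correlation matrix, and the function names are hypothetical.

```python
import numpy as np

def covariance_from_vols(vols, corr):
    """Sigma = D R D, with D the diagonal matrix of forecast volatilities."""
    d = np.diag(vols)
    return d @ corr @ d

def min_variance_weights(cov):
    """Fully invested minimum-variance weights: Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

# Hypothetical forecasts: assets 0 and 1 are highly correlated, asset 2 is independent.
vols = np.array([0.20, 0.20, 0.30])
corr = np.array([[1.0, 0.8, 0.0],
                 [0.8, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])

cov_full = covariance_from_vols(vols, corr)
cov_diag = np.diag(np.diag(cov_full))      # ablation: drop all off-diagonal terms

w_full = min_variance_weights(cov_full)
w_diag = min_variance_weights(cov_diag)

# Evaluate both weight vectors under the full covariance for a like-for-like comparison.
var_full = float(w_full @ cov_full @ w_full)
var_diag = float(w_diag @ cov_full @ w_diag)
```

If the two weight vectors, and the resulting Sharpe ratios, barely differ on real data, the gains would be attributable to the univariate forecasts rather than to the cross-sectional structure.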
Minor comments (2)
- Abstract: briefly specify the exact portfolio construction rule used to generate the Sharpe ratios.
- Ensure consistent reporting of all model variants (with/without macro features) across MSE, ranking accuracy, and Sharpe metrics in the results tables.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects of our methodology and helps clarify the interpretation of our results. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
- Referee: Section 4 and the portfolio results table: the claim that graph models add value only when the portfolio rule exploits encoded cross-sectional structure requires explicit confirmation that the construction (mean-variance or volatility-scaled) actually incorporates off-diagonal covariance terms from the graph-derived matrices. An ablation of full versus diagonal-only covariance is needed; without it, Sharpe differences could stem from marginal univariate forecast gains rather than graph structure exploitation.
  Authors: We agree that this clarification is essential. Our portfolio construction uses a volatility-scaled mean-variance optimizer that incorporates the full covariance matrix (including off-diagonal terms) derived from the graph-based volatility forecasts to capture cross-sectional dependencies. To directly address the concern and rule out univariate effects, we will add an ablation study in the revised manuscript comparing Sharpe ratios from the full covariance matrix versus a diagonal-only version. This will provide explicit evidence that performance gains arise from exploiting the graph-encoded structure.
  Revision: yes
- Referee: Methods section: details on train/validation/test splits, the precise portfolio optimization rule (including any constraints or scaling), and statistical tests for Sharpe ratio differences are required to substantiate the central claim that the three best models differ across objectives.
  Authors: We acknowledge that these details are necessary for full reproducibility and to support our central claims. In the revised Methods section, we will explicitly describe the rolling train/validation/test splits (including specific date ranges and window sizes), provide the precise formulation of the portfolio optimization rule with all constraints and scaling procedures, and report statistical tests (e.g., bootstrap or Ledoit-Wolf tests) for Sharpe ratio differences. These additions will substantiate that the optimal models are distinct across the three objectives.
  Revision: yes
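One simple form the promised bootstrap test could take is a paired percentile bootstrap on the Sharpe ratio difference, sketched below. This is an assumption about the general approach, not the authors' procedure: it resamples periods independently and so ignores serial correlation, which the studentized block bootstrap of Ledoit and Wolf is designed to handle. The function names and default settings are hypothetical.

```python
import numpy as np

def sharpe(r):
    """Per-period Sharpe ratio with a zero risk-free rate (not annualized)."""
    return r.mean() / r.std(ddof=1)

def bootstrap_sharpe_diff(ret_a, ret_b, n_boot=2000, seed=0):
    """Point estimate and 95% percentile interval for Sharpe(a) - Sharpe(b).

    Periods are resampled jointly so the cross-strategy correlation is preserved.
    """
    ret_a = np.asarray(ret_a, dtype=float)
    ret_b = np.asarray(ret_b, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(ret_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)      # same resampled weeks for both strategies
        diffs[i] = sharpe(ret_a[idx]) - sharpe(ret_b[idx])
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return sharpe(ret_a) - sharpe(ret_b), lo, hi

# Simulated weekly returns for two hypothetical strategies.
rng = np.random.default_rng(1)
a = rng.normal(0.002, 0.02, size=200)
b = rng.normal(0.001, 0.02, size=200)
point, lo, hi = bootstrap_sharpe_diff(a, b, n_boot=500)
```

An interval that contains zero would mean the two strategies' Sharpe ratios cannot be distinguished at that level, which bears directly on whether the "three different models" ranking is robust.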
Circularity Check
No significant circularity in empirical model comparison
The paper is a direct empirical comparison of volatility forecasting models (HAR, LSTM, GraphSAGE variants) on held-out weekly realized volatility data for 465 equities. Central findings rest on out-of-sample metrics (MSE, ranking accuracy, Sharpe ratio) rather than any derivation chain. No equations, first-principles results, or self-citations are invoked to force predictions or uniqueness; the observation that optimal models differ across objectives follows immediately from the tabulated performance numbers without reduction to inputs by construction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Rolling correlation, sector, and Granger-causal graphs capture meaningful cross-sectional dependencies for volatility forecasting.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: The empirical finding is that the model with the lowest forecast MSE, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models.
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: Graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.