Do Better Volatility Forecasts Lead to Better Portfolios? Evidence from Graph Neural Networks

Rylan Wade

arxiv: 2605.19278 · v2 · pith:63SV2GQCnew · submitted 2026-05-19 · 💱 q-fin.PM · cs.LG

Pith reviewed 2026-05-21 07:19 UTC · model grok-4.3

keywords volatility forecasting · graph neural networks · portfolio performance · realized volatility · Sharpe ratio · cross-sectional ranking · S&P 500 equities · GraphSAGE

The pith

The model with the lowest volatility forecast error, the highest stock ranking accuracy, and the highest portfolio Sharpe ratio are three different models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether graph neural networks improve realized volatility forecasts for stocks and whether those improved forecasts lead to better portfolio results. It compares standard autoregressive and recurrent baselines to GraphSAGE models built on correlation, sector, and causal graphs using weekly data for 465 S&P 500 stocks. The central result is that forecast accuracy, cross-sectional ranking quality, and actual portfolio performance are related but distinct objectives, so the model that wins on one metric does not win on the others. Graph-based models improve outcomes only when the portfolio rule can use the relational structure they encode. Readers care because many investment processes select or size positions using volatility predictions, yet the findings indicate that chasing lower prediction errors may not improve realized returns.

Core claim

Using weekly realized volatility for 465 S&P 500 equities from 2015-2025, the study compares Heterogeneous Autoregressive and Long Short-Term Memory baselines against GraphSAGE models built on rolling correlation, sector, and Granger-causal graphs, with and without macro regime features. The empirical finding is that the model with the lowest forecast MSE, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models. Forecast accuracy, ranking quality, and portfolio performance are related but not interchangeable objectives. Graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.

What carries the argument

GraphSAGE models on rolling correlation, sector, and Granger-causal graphs that encode cross-sectional dependencies among equities, evaluated separately on forecast MSE, ranking accuracy, and downstream portfolio Sharpe ratio.
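As a rough illustration of this machinery, the sketch below builds a thresholded correlation graph from one window of returns and applies a single GraphSAGE-style mean-aggregation layer. The threshold, window length, feature sizes, random weights, and the helper names `correlation_graph` and `sage_mean_layer` are assumptions made for illustration; the paper's actual graph construction and architecture are not described on this page.

```python
import numpy as np

def correlation_graph(returns: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Adjacency matrix from a (T x N) return window.

    Two stocks are linked when |correlation| >= threshold (assumed rule).
    """
    corr = np.corrcoef(returns, rowvar=False)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops; the layer handles self features
    return adj

def sage_mean_layer(h, adj, w_self, w_neigh):
    """One GraphSAGE-style layer: self features plus mean of neighbour features."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ h) / np.maximum(deg, 1.0)  # isolated nodes get zeros
    return np.maximum(h @ w_self + neigh_mean @ w_neigh, 0.0)  # ReLU

# Synthetic data only: 52 weekly returns for 5 hypothetical stocks.
rng = np.random.default_rng(0)
window = rng.normal(size=(52, 5))
window[:, 1] = window[:, 0] + 0.1 * rng.normal(size=52)  # stocks 0 and 1 co-move

adj = correlation_graph(window, threshold=0.5)
features = rng.normal(size=(5, 3))  # e.g. lagged volatility features (assumed)
out = sage_mean_layer(features, adj,
                      rng.normal(size=(3, 4)), rng.normal(size=(3, 4)))
```

In a rolling setup the adjacency would be recomputed for each window, so the neighbourhood each stock aggregates over changes through time.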

If this is right

Forecast accuracy alone is not a reliable guide for selecting volatility models when the end goal is portfolio construction.
Graph-based volatility models deliver benefits only when the portfolio rule is structured to use the relational information they provide.
Model selection or training must be aligned with the specific downstream objective rather than with a single accuracy metric.
Standard time-series baselines can outperform graph models on some metrics even when graph models capture additional structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Training or fine-tuning volatility models directly against a portfolio performance objective may close the gap between forecast quality and realized returns.
Similar objective mismatches could appear in other financial applications where predictions feed into allocation or hedging decisions.
Alternative portfolio rules that more explicitly incorporate pairwise or group dependencies could make graph models useful across a wider range of settings.

Load-bearing premise

The chosen portfolio construction rule is capable of exploiting the cross-sectional dependencies encoded in the graph-based volatility models.

What would settle it

If the single model that minimizes forecast MSE also achieves both the highest cross-sectional ranking accuracy and the highest portfolio Sharpe ratio across the test period, the claim that the three objectives are not interchangeable would be falsified.
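The check described above can be sketched as follows. The metric definitions are assumptions: ranking accuracy is taken here as the average per-week Spearman correlation across stocks, and the Sharpe ratio is annualized with 52 periods per year; the paper may define both differently. The model names and numbers in `placeholder_scores` are placeholders, not results from the paper.

```python
import numpy as np

def mse(forecast, realized):
    """Mean squared forecast error over all weeks and stocks."""
    return float(np.mean((forecast - realized) ** 2))

def mean_rank_corr(forecast, realized):
    """Average per-week Spearman correlation across stocks (ties not handled)."""
    def ranks(x):
        return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)
    rf, rr = ranks(forecast), ranks(realized)
    rf -= rf.mean(axis=1, keepdims=True)
    rr -= rr.mean(axis=1, keepdims=True)
    denom = np.sqrt((rf ** 2).sum(axis=1) * (rr ** 2).sum(axis=1))
    return float(((rf * rr).sum(axis=1) / denom).mean())

def sharpe(returns, periods_per_year=52):
    """Annualized Sharpe ratio of a return series (zero risk-free rate assumed)."""
    return float(np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1))

def winners(scores):
    """Best model per objective: lowest MSE, highest rank score, highest Sharpe."""
    return {
        "mse": min(scores, key=lambda m: scores[m]["mse"]),
        "rank": max(scores, key=lambda m: scores[m]["rank"]),
        "sharpe": max(scores, key=lambda m: scores[m]["sharpe"]),
    }

def objectives_coincide(scores):
    """True if one model wins all three objectives, which would falsify the claim."""
    return len(set(winners(scores).values())) == 1

# Placeholder numbers for three hypothetical models, NOT results from the paper.
placeholder_scores = {
    "model_a": {"mse": 0.10, "rank": 0.40, "sharpe": 0.9},
    "model_b": {"mse": 0.12, "rank": 0.55, "sharpe": 1.0},
    "model_c": {"mse": 0.15, "rank": 0.50, "sharpe": 1.3},
}
print(winners(placeholder_scores), objectives_coincide(placeholder_scores))
```

A point estimate of each winner is only a first pass; whether the winners differ by more than sampling noise is a separate question.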

Original abstract

This paper tests whether graph neural networks improve realized volatility forecasts and whether those forecasts improve portfolio performance. Using weekly realized volatility for 465 S&P 500 equities from 2015-2025, Heterogeneous Autoregressive and Long Short-Term Memory baselines are compared against GraphSAGE models built on rolling correlation, sector, and Granger-causal graphs, with and without macro regime features. The empirical finding is that the model with the lowest forecast MSE, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models. Forecast accuracy, ranking quality, and portfolio performance are related but not interchangeable objectives. Graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.
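For readers unfamiliar with the Heterogeneous Autoregressive baseline, the following is a minimal sketch of a HAR-type regression: next-period realized volatility is regressed on averages of past realized volatility over several horizons. The horizons `(1, 4, 12)` weeks, the ordinary least squares fit, and the function names are assumptions chosen for illustration; the lag structure the paper uses on weekly data is not stated on this page.

```python
import numpy as np

def har_design(rv, horizons=(1, 4, 12)):
    """Build the HAR design matrix and one-step-ahead targets.

    Each row holds an intercept and the trailing mean of rv over each horizon.
    """
    max_h = max(horizons)
    rows, targets = [], []
    for t in range(max_h - 1, len(rv) - 1):
        rows.append([1.0] + [rv[t - h + 1 : t + 1].mean() for h in horizons])
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)

def fit_har(rv, horizons=(1, 4, 12)):
    """Ordinary least squares fit of the HAR coefficients."""
    X, y = har_design(rv, horizons)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict_next(rv, beta, horizons=(1, 4, 12)):
    """Forecast the next value from the most recent observations."""
    feats = [1.0] + [rv[-h:].mean() for h in horizons]
    return float(np.dot(beta, feats))

# Synthetic persistent series standing in for one stock's weekly realized volatility.
rng = np.random.default_rng(0)
rv = np.empty(200)
rv[0] = 0.2
for t in range(1, 200):
    rv[t] = 0.04 + 0.8 * rv[t - 1] + 0.01 * rng.normal()

beta = fit_har(rv)
forecast = predict_next(rv, beta)
```

The model is fit per stock and uses no cross-sectional information, which is what makes it a natural reference point for the graph models.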

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The main result is that lowest-MSE, highest-ranking, and highest-Sharpe volatility models are three different ones, so forecast accuracy does not automatically translate to portfolio gains.

The punchline is that the model with the lowest forecast MSE is not the same as the one with the best cross-sectional ranking or the highest portfolio Sharpe ratio. Graph-based models only show an advantage when the portfolio rule can actually use the cross-sectional dependencies they encode from the graphs. This is the central empirical claim and the one a colleague should keep in mind when thinking about volatility model selection for trading desks.

The paper runs a direct comparison on weekly realized volatility for 465 S&P 500 stocks from 2015 to 2025. It pits Heterogeneous Autoregressive and LSTM baselines against GraphSAGE models built on rolling-correlation, sector, and Granger-causal graphs, with and without macro regime inputs. The result that these three objectives diverge is useful because it challenges the common assumption that better statistical forecasts will produce better economic outcomes.

What the work does well is move the evaluation past pure forecast error and test the downstream portfolio impact with multiple graph constructions. That setup makes the non-interchangeability of the objectives concrete rather than abstract.

The soft spot is around the portfolio construction rule. The claim that graph models add value specifically through cross-sectional structure rests on the rule incorporating off-diagonal covariance terms derived from the graphs. If the portfolios are effectively univariate volatility scaling or equal weighting, the Sharpe differences could come from marginal forecast improvements instead. An explicit ablation comparing full versus diagonal covariance would pin this down; without it the attribution to the graph structure stays a bit loose.

This paper is for quantitative portfolio managers and researchers who care about how volatility model choices affect actual position sizing and risk. A reader who has run forecast competitions and then wondered why the winner did not improve live performance will find the distinction relevant. The empirical grounding and the clear engagement with the gap between statistical and economic objectives are enough to justify sending it for peer review rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper compares Heterogeneous Autoregressive (HAR), LSTM, and GraphSAGE volatility models (built on rolling-correlation, sector, and Granger-causal graphs, with and without macro regime features) on weekly realized volatility for 465 S&P 500 equities (2015-2025). It reports that the model minimizing forecast MSE, the model maximizing cross-sectional ranking accuracy, and the model maximizing portfolio Sharpe ratio are three distinct models, concluding that forecast accuracy, ranking quality, and portfolio performance are related but non-interchangeable objectives, with graph models adding value only when the portfolio rule exploits the cross-sectional dependencies they encode.

Significance. If the empirical distinctions hold after verification, the work usefully demonstrates that statistical forecast superiority does not automatically translate to economic outperformance in portfolio settings. The direct held-out comparison across three distinct objectives provides concrete evidence against assuming interchangeability, and the emphasis on portfolio-rule alignment offers a practical takeaway for volatility modeling in quantitative portfolio management.

major comments (2)

Section 4 and the portfolio results table: the claim that graph models add value only when the portfolio rule exploits encoded cross-sectional structure requires explicit confirmation that the construction (mean-variance or volatility-scaled) actually incorporates off-diagonal covariance terms from the graph-derived matrices. An ablation of full versus diagonal-only covariance is needed; without it, Sharpe differences could stem from marginal univariate forecast gains rather than graph structure exploitation.
Methods section: details on train/validation/test splits, the precise portfolio optimization rule (including any constraints or scaling), and statistical tests for Sharpe ratio differences are required to substantiate the central claim that the three best models differ across objectives.
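The ablation requested in the first major comment could look roughly like the sketch below, which compares portfolio weights from a full covariance matrix against weights from its diagonal only. A plain minimum-variance rule is swapped in here for simplicity; it is not the paper's optimizer, and the function names and the two-asset inputs are illustrative assumptions.

```python
import numpy as np

def min_variance_weights(cov):
    """Fully invested minimum-variance weights: w proportional to inv(cov) @ 1."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

def ablation_weights(vol_forecast, corr):
    """Weights from the full covariance and from its diagonal-only version.

    vol_forecast: forecast volatilities per asset.
    corr: correlation matrix supplying the off-diagonal structure.
    """
    d = np.diag(vol_forecast)
    full_cov = d @ corr @ d                 # uses off-diagonal terms
    diag_cov = np.diag(vol_forecast ** 2)   # ignores cross-sectional structure
    return min_variance_weights(full_cov), min_variance_weights(diag_cov)

# Two hypothetical assets with forecast vols of 10% and 20% and correlation 0.5.
vols = np.array([0.1, 0.2])
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
w_full, w_diag = ablation_weights(vols, corr)
print(w_full, w_diag)
```

If Sharpe ratios computed from the two weight sets are indistinguishable, the gains are coming from the marginal volatility forecasts rather than from the cross-sectional structure.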

minor comments (2)

Abstract: briefly specify the exact portfolio construction rule used to generate the Sharpe ratios.
Ensure consistent reporting of all model variants (with/without macro features) across MSE, ranking accuracy, and Sharpe metrics in the results tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects of our methodology and helps clarify the interpretation of our results. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Referee: Section 4 and the portfolio results table: the claim that graph models add value only when the portfolio rule exploits encoded cross-sectional structure requires explicit confirmation that the construction (mean-variance or volatility-scaled) actually incorporates off-diagonal covariance terms from the graph-derived matrices. An ablation of full versus diagonal-only covariance is needed; without it, Sharpe differences could stem from marginal univariate forecast gains rather than graph structure exploitation.

Authors: We agree that this clarification is essential. Our portfolio construction uses a volatility-scaled mean-variance optimizer that incorporates the full covariance matrix (including off-diagonal terms) derived from the graph-based volatility forecasts to capture cross-sectional dependencies. To directly address the concern and rule out univariate effects, we will add an ablation study in the revised manuscript comparing Sharpe ratios from the full covariance matrix versus a diagonal-only version. This will provide explicit evidence that performance gains arise from exploiting the graph-encoded structure.

revision: yes

Referee: Methods section: details on train/validation/test splits, the precise portfolio optimization rule (including any constraints or scaling), and statistical tests for Sharpe ratio differences are required to substantiate the central claim that the three best models differ across objectives.

Authors: We acknowledge that these details are necessary for full reproducibility and to support our central claims. In the revised Methods section, we will explicitly describe the rolling train/validation/test splits (including specific date ranges and window sizes), provide the precise formulation of the portfolio optimization rule with all constraints and scaling procedures, and report statistical tests (e.g., bootstrap or Ledoit-Wolf tests) for Sharpe ratio differences. These additions will substantiate that the optimal models are distinct across the three objectives.

revision: yes
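One simple form such a test could take is a paired bootstrap on the difference in Sharpe ratios, sketched below. This version resamples weeks independently, which ignores autocorrelation; a block bootstrap or the studentized Ledoit-Wolf procedure would be the more careful choice. The function names, the number of resamples, and the synthetic return series are assumptions for illustration only.

```python
import numpy as np

def sharpe(r):
    """Per-period Sharpe ratio (zero risk-free rate assumed)."""
    return r.mean() / r.std(ddof=1)

def bootstrap_sharpe_diff(r_a, r_b, n_boot=2000, seed=0):
    """Paired i.i.d. bootstrap of Sharpe(r_a) - Sharpe(r_b).

    The same resampled weeks are used for both strategies so that their
    cross-correlation is preserved. Returns the observed difference and a
    95% percentile interval.
    """
    rng = np.random.default_rng(seed)
    n = len(r_a)
    observed = float(sharpe(r_a) - sharpe(r_b))
    idx = rng.integers(0, n, size=(n_boot, n))
    diffs = np.array([sharpe(r_a[i]) - sharpe(r_b[i]) for i in idx])
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return observed, (float(lo), float(hi))

# Synthetic weekly returns for two hypothetical strategies.
rng = np.random.default_rng(1)
base = rng.normal(0.0, 0.02, size=300)
strategy_b = base
strategy_a = base + 0.02  # same risk, higher mean by construction

observed, (lo, hi) = bootstrap_sharpe_diff(strategy_a, strategy_b)
print(observed, lo, hi)
```

An interval that excludes zero suggests the two Sharpe ratios differ by more than resampling noise; an interval that straddles zero means the ranking of the two models on Sharpe is not well supported.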

Circularity Check

0 steps flagged

No significant circularity in empirical model comparison

The paper is a direct empirical comparison of volatility forecasting models (HAR, LSTM, GraphSAGE variants) on held-out weekly realized volatility data for 465 equities. Central findings rest on out-of-sample metrics (MSE, ranking accuracy, Sharpe ratio) rather than any derivation chain. No equations, first-principles results, or self-citations are invoked to force predictions or uniqueness; the observation that optimal models differ across objectives follows immediately from the tabulated performance numbers without reduction to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard domain assumptions about the informativeness of correlation, sector, and Granger-causal graphs for volatility without introducing new free parameters or invented entities.

axioms (1)

domain assumption Rolling correlation, sector, and Granger-causal graphs capture meaningful cross-sectional dependencies for volatility forecasting.
These graphs are used to construct the GraphSAGE models whose value is conditioned on portfolio rules exploiting the encoded structure.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: The empirical finding is that the model with the lowest forecast MSE, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: Graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
