pith. machine review for the scientific record.

arxiv: 2604.10402 · v3 · submitted 2026-04-12 · 💱 q-fin.ST · q-fin.RM

Recognition: unknown

Risk-Sensitive Specialist Routing for Volatility Forecasting

Tenghan Zhong


Pith reviewed 2026-05-10 16:46 UTC · model grok-4.3

classification 💱 q-fin.ST q-fin.RM
keywords volatility forecasting · specialist routing · market regimes · risk-sensitive evaluation · ETF · forecast combination · walk-forward validation

The pith

A routing system that switches volatility forecasters by detected market state reduces high-volatility errors by 24 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that routes ETF volatility forecasts to different specialist models depending on whether the market is calm or stressed. It relies on online risk-sensitive evaluation to assess model performance in real time and applies state-dependent gating to select the right specialist at each step. Standard methods assume one model or a fixed combination works across all conditions, yet the data show the strongest forecaster changes sharply with volatility levels. If the routing succeeds, forecasters gain accuracy precisely when errors matter most, without needing to retrain models from scratch for every regime shift. The walk-forward results on six ETFs quantify the gains against a rolling-best baseline.

Core claim

In a daily panel of six ETFs under rolling walk-forward evaluation, the best-performing volatility forecaster is regime-dependent rather than stable across market states. The risk-sensitive specialist routing framework, which combines online risk-sensitive evaluation with state-dependent gating, reduces high-volatility forecast loss by about 24 percent and underprediction loss by about 22 percent relative to the rolling-best baseline.

What carries the argument

State-dependent gating driven by online risk-sensitive evaluation, which selects and combines forecasting specialists according to real-time market conditions.
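The gating loop this describes can be sketched in code. The scoring rule below (QLIKE plus an underprediction penalty, exponentially discounted within each regime) is an assumption for illustration, since the paper's risk-sensitive objective is only described in prose; the `alpha` and `lam` parameters are hypothetical.

```python
import numpy as np

def qlike(rv, f):
    """QLIKE loss of a variance forecast f against realized variance rv."""
    r = rv / f
    return r - np.log(r) - 1.0

def route(forecasts, realized, regimes, alpha=2.0, lam=0.94):
    """Route each step to one specialist via regime-conditional,
    exponentially discounted risk-sensitive scores.

    forecasts: (T, K) variance forecasts from K specialists.
    realized:  length-T realized variances.
    regimes:   length-T integer regime labels known at the forecast origin.
    """
    T, K = forecasts.shape
    n_regimes = int(regimes.max()) + 1
    score = np.zeros((n_regimes, K))       # running score per regime/model
    chosen = np.zeros(T, dtype=int)
    for t in range(T):
        s = regimes[t]
        chosen[t] = int(np.argmin(score[s]))     # select before seeing day t
        loss = qlike(realized[t], forecasts[t])  # per-specialist loss
        under = np.maximum(realized[t] - forecasts[t], 0.0)  # shortfall
        score[s] = lam * score[s] + (1 - lam) * (loss + alpha * under)
    return chosen
```

Note the selection at step t happens before the day-t loss updates the score, so the router itself introduces no look-ahead.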

If this is right

  • Volatility forecasts become more reliable during stress periods by avoiding reliance on a single underperforming model.
  • Risk-management systems can incorporate adaptive model selection to lower tail forecast errors.
  • Forecast combination methods in finance should move from static or rolling weights toward real-time regime detection.
  • Walk-forward testing confirms that dynamic routing beats always using the model that performed best in the most recent window.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same routing logic could apply to other regime-sensitive tasks such as liquidity or return forecasting.
  • Pairing the gating signal with additional macro indicators might further stabilize state identification.
  • Gains may compound if specialists themselves are allowed to adapt their parameters within each detected regime.

Load-bearing premise

Market states can be reliably distinguished in real time so that the gating mechanism selects the appropriate specialist without introducing selection bias or overfitting.
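A minimal sketch of what satisfying this premise requires: the day-t label is built only from returns through day t-1, and the calm/stressed cut is a trailing quantile of volatilities seen so far, so neither the label nor the threshold touches future or same-day data. The window and quantile values are illustrative, not the paper's.

```python
import numpy as np

def causal_regimes(returns, window=21, quantile=0.8):
    """Label day t calm (0) or stressed (1) using only returns up to t-1."""
    T = len(returns)
    regimes = np.zeros(T, dtype=int)
    vols = []
    for t in range(window, T):
        v = np.std(returns[t - window:t], ddof=1)   # excludes day t itself
        vols.append(v)
        thresh = np.quantile(vols, quantile)        # trailing threshold only
        regimes[t] = int(v > thresh)
    return regimes
```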

What would settle it

Applying the routing framework to a fresh panel of assets or a later time window and finding no reduction in high-volatility loss relative to the rolling-best baseline would falsify the performance claim.
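That replication would rerun a comparison like the following: a rolling-best baseline and the stressed-day loss reduction measured against it. The 63-day window (roughly one trading quarter) and the loss layout are assumptions.

```python
import numpy as np

def rolling_best(losses, window=63):
    """Rolling-best baseline: at each step use the specialist with the
    lowest mean loss over the previous `window` days. losses is (T, K)."""
    T, K = losses.shape
    chosen = np.zeros(T, dtype=int)
    for t in range(1, T):
        lo = max(0, t - window)
        chosen[t] = int(np.argmin(losses[lo:t].mean(axis=0)))
    return chosen

def highvol_reduction(losses, routed, baseline, stressed):
    """Percent reduction in mean loss on stressed days, routed vs baseline.
    No reduction on a fresh panel is the falsification described above."""
    idx = np.arange(losses.shape[0])
    r = losses[idx, routed][stressed].mean()
    b = losses[idx, baseline][stressed].mean()
    return 100.0 * (b - r) / b
```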

Figures

Figures reproduced from arXiv: 2604.10402 by Tenghan Zhong.

Figure 1. Risk-sensitive specialist routing architecture.
Figure 2. The strongest forecaster is regime-dependent. For each asset and regime, we identify the model that most often attains the lowest loss. The heatmap then reports, for each regime, how many of the six assets are won by each model. Here, All denotes the full evaluation sample. In the low-volatility regime, GRU is the winning model for all six assets. In the high-volatility regime, the strongest met…
Figure 3. Asset-level QLIKE comparison for the five non-TLT ETFs.
Figure 4. TLT robustness comparison on a log scale.
Figure 5. Cross-asset median QLIKE differences by regime.
read the original abstract

Volatility forecasting becomes challenging when market conditions shift and model performance varies across market states. Motivated by this instability, we develop a risk-sensitive specialist routing framework for ETF volatility forecasting. The framework uses online risk-sensitive evaluation and state-dependent gating to combine different forecasting specialists across calm and stressed market states. Using a daily panel of six ETFs under a rolling walk-forward design, we find that the strongest forecaster is regime-dependent rather than stable across all states. Relative to the rolling-best baseline, the proposed routing framework reduces high-volatility forecast loss by about 24% and underprediction loss by about 22%. These results suggest that specialist routing provides a practical forecasting architecture that adapts to changing market conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a risk-sensitive specialist routing framework for volatility forecasting on ETFs. It combines multiple forecasting specialists via online risk-sensitive evaluation and state-dependent gating that switches between calm and stressed market regimes. Using a daily rolling walk-forward evaluation on six ETFs, the authors report that the routing approach reduces high-volatility forecast loss by approximately 24% and underprediction loss by approximately 22% relative to a rolling-best baseline, arguing that specialist performance is regime-dependent rather than stable.

Significance. If the routing mechanism can be shown to operate strictly online without look-ahead or selection bias, the framework offers a practical, adaptive architecture for volatility forecasting that exploits the documented instability of individual models across market states. The empirical gains, while modest in absolute terms, address a persistent challenge in financial time series where no single model dominates universally. The work would be strengthened by reproducible code or parameter-free derivations, but currently rests on empirical demonstration.

major comments (3)
  1. [Abstract and evaluation design] The abstract and evaluation description report 24% and 22% loss reductions from a rolling walk-forward design, yet provide no explicit statement on whether state labels or the gating classifier are computed exclusively from information available at the forecast origin (t-1 or earlier). If any contemporaneous volatility signal enters the state detection step, the reported improvements could be inflated by implicit look-ahead bias.
  2. [Methods and parameter specification] The risk sensitivity parameter and gating threshold (or state classifier parameters) are listed as free parameters, but the manuscript does not describe how these are tuned inside each rolling window or whether an inner hold-out is used to prevent overfitting the gating rule to the specific regime sequence observed in the training folds.
  3. [Results and robustness] No error bars, bootstrap intervals, or Diebold-Mariano-style tests are mentioned for the headline percentage improvements. Without these, it is impossible to assess whether the 24% and 22% reductions are statistically distinguishable from zero or from the rolling-best baseline under the small panel of six ETFs.
minor comments (2)
  1. [Abstract] The abstract refers to 'high-volatility forecast loss' and 'underprediction loss' without defining the exact loss functions or the volatility threshold used to label states.
  2. [Framework description] Notation for the specialists, gating function, and risk-sensitive objective should be introduced with explicit equations rather than descriptive prose alone.
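The first minor comment can be made concrete. The QLIKE loss is standard in the realized-volatility literature; the underprediction loss below is one plausible (assumed) form, since the paper's abstract does not define it.

```python
import math

def qlike(rv, f):
    """QLIKE loss for a variance forecast f against realized variance rv:
    rv/f - log(rv/f) - 1, zero iff f == rv, and robust to noisy volatility
    proxies in the sense of Patton (2011)."""
    r = rv / f
    return r - math.log(r) - 1.0

def underprediction_loss(rv, f):
    """Assumed 'underprediction loss': squared shortfall when the forecast
    undershoots realized variance. Illustrative only; the paper's exact
    definition is not stated in the abstract."""
    return max(rv - f, 0.0) ** 2
```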

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our paper. We address each of the major comments in detail below, clarifying our methodology and outlining the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and evaluation design] The abstract and evaluation description report 24% and 22% loss reductions from a rolling walk-forward design, yet provide no explicit statement on whether state labels or the gating classifier are computed exclusively from information available at the forecast origin (t-1 or earlier). If any contemporaneous volatility signal enters the state detection step, the reported improvements could be inflated by implicit look-ahead bias.

    Authors: We thank the referee for highlighting this potential issue. Our state-dependent gating is designed to be strictly online: the regime classification relies solely on volatility estimates derived from historical returns available at the forecast origin (up to t-1). The risk-sensitive evaluation of specialists is also performed causally. We will add a clear statement in the abstract and a dedicated paragraph in the evaluation design section to explicitly confirm the absence of look-ahead bias. revision: yes

  2. Referee: [Methods and parameter specification] The risk sensitivity parameter and gating threshold (or state classifier parameters) are listed as free parameters, but the manuscript does not describe how these are tuned inside each rolling window or whether an inner hold-out is used to prevent overfitting the gating rule to the specific regime sequence observed in the training folds.

    Authors: This is a valid point regarding the transparency of our experimental setup. In the current implementation, the risk sensitivity parameter is held fixed across windows at a value determined from initial experiments, while the gating threshold is selected by optimizing performance on an inner validation fold within each rolling window. We will revise the methods section to provide a full description of this tuning process, including the specific parameter values used and the inner hold-out procedure to mitigate overfitting concerns. revision: yes

  3. Referee: [Results and robustness] No error bars, bootstrap intervals, or Diebold-Mariano-style tests are mentioned for the headline percentage improvements. Without these, it is impossible to assess whether the 24% and 22% reductions are statistically distinguishable from zero or from the rolling-best baseline under the small panel of six ETFs.

    Authors: We acknowledge the importance of statistical validation for the reported gains. We will incorporate bootstrap resampling to provide confidence intervals around the loss reduction percentages and apply Diebold-Mariano tests to compare the routing framework against the rolling-best baseline. These additions will be included in the results section of the revised manuscript. revision: yes
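The promised statistical machinery is standard and compact; a minimal sketch under the usual asymptotics (the iid bootstrap here is a simplification, and a block bootstrap would better respect serial dependence in daily losses):

```python
import math
import numpy as np

def dm_test(loss_a, loss_b, h=1):
    """Diebold-Mariano test on the loss differential d_t = loss_a - loss_b,
    with a Newey-West long-run variance using h-1 lags. Returns the DM
    statistic and its two-sided asymptotic-normal p-value."""
    d = np.asarray(loss_a, dtype=float) - np.asarray(loss_b, dtype=float)
    T = len(d)
    dc = d - d.mean()
    lrv = (dc ** 2).mean()                      # lag-0 autocovariance
    for k in range(1, h):
        lrv += 2 * (dc[k:] * dc[:-k]).mean()    # Newey-West terms
    stat = d.mean() / math.sqrt(lrv / T)
    p = math.erfc(abs(stat) / math.sqrt(2))     # 2 * (1 - Phi(|stat|))
    return stat, p

def bootstrap_ci(d, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for the mean loss differential."""
    rng = np.random.default_rng(seed)
    means = [rng.choice(d, size=len(d), replace=True).mean()
             for _ in range(n_boot)]
    return np.quantile(means, [0.025, 0.975])
```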

Circularity Check

0 steps flagged

No significant circularity; results rest on independent walk-forward evaluation.

full rationale

The paper introduces a risk-sensitive specialist routing framework with online evaluation and state-dependent gating for ETF volatility forecasting. Its central claims are empirical performance improvements (24% high-vol loss reduction, 22% underprediction loss reduction) obtained via rolling walk-forward validation against a rolling-best baseline on six ETFs. No load-bearing step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the framework description and results do not equate predictions to inputs via renaming or tautology. The evaluation design is external to the model's fitted values and therefore self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Framework rests on standard time-series stationarity assumptions within regimes and the existence of distinguishable market states; no new entities postulated. Free parameters such as risk-sensitivity weights and gating thresholds are likely present but unspecified in the abstract.

free parameters (2)
  • risk sensitivity parameter
    Controls how much the online evaluator penalizes downside forecast errors; value not reported in abstract.
  • gating threshold or state classifier parameters
    Determines calm versus stressed regime assignment; likely tuned or learned but details absent.
axioms (1)
  • domain assumption Market regimes exist and can be identified from observable data in real time.
    Invoked by the state-dependent gating mechanism described in the abstract.
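The gating threshold listed above as a free parameter could, per the simulated rebuttal, be selected on an inner hold-out inside each rolling window. A hypothetical sketch; the two-specialist mapping and the hold-out fraction are assumptions.

```python
import numpy as np

def tune_gating_threshold(vol_signal, specialist_losses, candidates,
                          val_frac=0.2):
    """Pick the calm/stressed threshold on an inner hold-out.

    The last `val_frac` of the window is held out; each candidate threshold
    defines the rule 'specialist 0 below, specialist 1 at or above', which
    is scored on the hold-out, and the lowest-loss candidate wins.
    """
    T = len(vol_signal)
    split = int(T * (1 - val_frac))
    rows = np.arange(split, T)
    best, best_loss = candidates[0], np.inf
    for c in candidates:
        choice = (vol_signal[split:] >= c).astype(int)   # 0 = calm model
        loss = specialist_losses[rows, choice].mean()
        if loss < best_loss:
            best, best_loss = c, loss
    return best
```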

pith-pipeline@v0.9.0 · 5401 in / 1181 out tokens · 46111 ms · 2026-05-10T16:46:49.519336+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

23 extracted references

  1. R. F. Engle, "Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation," Econometrica, vol. 50, no. 4, pp. 987–1007, 1982.
  2. T. G. Andersen, T. Bollerslev, F. X. Diebold, and P. Labys, "Modeling and forecasting realized volatility," Econometrica, vol. 71, no. 2, pp. 579–625, 2003.
  3. E. S. Gunnarsson, H. R. Isern, A. Kaloudis, M. Risstad, B. Vigdel, and S. Westgaard, "Prediction of realized volatility and implied volatility indices using AI and machine learning: A review," International Review of Financial Analysis, vol. 93, p. 103221, 2024.
  4. J. Marcucci, "Forecasting stock market volatility with regime-switching GARCH models," Studies in Nonlinear Dynamics & Econometrics, vol. 9, no. 4, pp. 1–55, 2005.
  5. F. Zhu, X. Luo, and X. Jin, "Predicting the volatility of the iShares China large-cap ETF: What is the role of the SSE 50 ETF?" Pacific-Basin Finance Journal, vol. 57, p. 101192, 2019.
  6. D. Maki, "Forecasting downside and upside realized volatility: The role of asymmetric information," The Journal of Economic Asymmetries, vol. 29, p. e00357, 2024.
  7. Y. Ding, D. Kambouroudis, and D. G. McMillan, "Forecasting realised volatility using regime-switching models," International Review of Economics & Finance, vol. 101, p. 104171, 2025.
  8. R. T. Baillie, T. Bollerslev, and H. O. Mikkelsen, "Fractionally integrated generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, vol. 74, no. 1, pp. 3–30, 1996.
  9. F. Corsi, "A simple approximate long-memory model of realized volatility," Journal of Financial Econometrics, vol. 7, no. 2, pp. 174–196, 2009.
  10. K. Christensen, M. V. Siggaard, and B. Veliyev, "A machine learning approach to volatility forecasting," Journal of Financial Econometrics, vol. 21, no. 5, pp. 1680–1727, 2023.
  11. H. Zhu, L. Bai, L. He, and Z. Liu, "Forecasting realized volatility with machine learning: Panel data perspective," Journal of Empirical Finance, vol. 73, pp. 251–271, 2023.
  12. F. Moreno-Pino and S. Zohren, "DeepVol: Volatility forecasting from high-frequency data," Quantitative Finance, vol. 24, no. 10, pp. 1575–1598, 2024.
  13. R. R. Branco, A. Rubesam, and M. Zevallos, "Forecasting realized volatility: Does anything beat linear models?" Journal of Empirical Finance, vol. 78, p. 101524, 2024.
  14. C. Liu and J. M. Maheu, "Forecasting realized volatility: A Bayesian model-averaging approach," Journal of Applied Econometrics, vol. 24, no. 5, pp. 709–733, 2009.
  15. A. E. Raftery, M. Kárný, and P. Ettler, "Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill," Technometrics, vol. 52, no. 1, pp. 52–66, 2010.
  16. X. Wang, K. Shrestha, and Q. Sun, "Forecasting realised volatility: A Markov switching approach with time-varying transition probabilities," Accounting & Finance, vol. 59, no. S2, pp. 1947–1975, 2019.
  17. A. Hassanniakalager, P. L. Baker, and E. Platanakis, "A false discovery rate approach to optimal volatility forecasting model selection," International Journal of Forecasting, vol. 40, no. 3, pp. 881–902, 2024.
  18. R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive mixtures of local experts," Neural Computation, vol. 3, no. 1, pp. 79–87, 1991.
  19. M. I. Jordan and R. A. Jacobs, "Hierarchical mixtures of experts and the EM algorithm," Neural Computation, vol. 6, no. 2, pp. 181–214, 1994.
  20. M. B. Garman and M. J. Klass, "On the estimation of security price volatilities from historical data," Journal of Business, vol. 53, no. 1, pp. 67–78, 1980.
  21. A. J. Patton, "Volatility forecast comparison using imperfect volatility proxies," Journal of Econometrics, vol. 160, no. 1, pp. 246–256, 2011.
  22. F. X. Diebold and R. S. Mariano, "Comparing predictive accuracy," Journal of Business & Economic Statistics, vol. 13, no. 3, pp. 253–263, 1995.
  23. W. K. Newey and K. D. West, "A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix," Econometrica, vol. 55, no. 3, pp. 703–708, 1987.