Rolling-Origin Conformal Prediction under Local Stationarity and Weak Dependence
Pith reviewed 2026-05-12 01:03 UTC · model grok-4.3
The pith
Rolling-origin conformal prediction attains minimax-optimal coverage rates for time series by tuning the calibration window to local stationarity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rolling-origin conformal prediction calibrates the conformal quantile on the m most recent pseudo-out-of-sample forecast errors. Under Hölder-β local stationarity and α-mixing, a four-term coverage-error decomposition yields the optimal calibration window m⋆ ≍ T^{2β/(2β+1)} and coverage-error rate O(T^{-β/(2β+1)}). This rate is minimax optimal, as shown by a Le Cam two-point construction. The Bahadur representation is proved under both α-mixing and physical dependence, and an oracle inequality justifies data-driven window selection via Winkler cross-validation.
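The window and rate quoted above follow from a standard bias-noise balance. The sketch below assumes, as an illustration only, that the dominant drift term is of order $(m/T)^{\beta}$ and the dominant stochastic term is of order $m^{-1/2}$; this summary does not state the individual term orders, and the remaining terms are assumed to be of lower order.

```latex
\[
\underbrace{(m/T)^{\beta}}_{\text{drift bias (assumed order)}}
\;\asymp\;
\underbrace{m^{-1/2}}_{\text{quantile noise (assumed order)}}
\;\Longrightarrow\;
m^{\beta + 1/2} \asymp T^{\beta}
\;\Longrightarrow\;
m^{\star} \asymp T^{2\beta/(2\beta+1)},
\qquad
(m^{\star})^{-1/2} \asymp T^{-\beta/(2\beta+1)}.
\]
```

For example, at $\beta = 1$ this gives $m^{\star} \asymp T^{2/3}$ and a coverage-error rate of order $T^{-1/3}$.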
What carries the argument
The rolling-origin calibration window of length m that selects the m most recent pseudo-out-of-sample forecast errors for quantile estimation, which enables the four-term coverage-error decomposition.
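A minimal sketch of the calibration step, assuming an absolute-residual nonconformity score and a symmetric interval (the summary does not say which score the paper uses). The function name `rolling_conformal_interval` and the toy error series are hypothetical.

```python
import numpy as np

def rolling_conformal_interval(point_forecast, past_errors, m, alpha=0.1):
    """Symmetric interval calibrated on the m most recent forecast errors."""
    errors = np.asarray(past_errors, dtype=float)
    if m < 1 or m > errors.size:
        raise ValueError("m must satisfy 1 <= m <= len(past_errors)")
    scores = np.abs(errors[-m:])  # rolling-origin calibration window
    # Finite-sample conformal rank ceil((m + 1)(1 - alpha)); capping it at m
    # is a simplification (strictly, the interval is unbounded in that case).
    k = min(m, int(np.ceil((m + 1) * (1 - alpha))))
    q = np.sort(scores)[k - 1]
    return point_forecast - q, point_forecast + q

# Toy pseudo-out-of-sample errors: large early errors, small recent ones.
errors = [0.5, -1.0, 0.2, 2.0, -0.3, 0.1, -0.4, 0.6, -0.2, 0.3]
lo, hi = rolling_conformal_interval(10.0, errors, m=5, alpha=0.2)
lo_full, hi_full = rolling_conformal_interval(10.0, errors, m=10, alpha=0.2)
print((lo, hi), (lo_full, hi_full))
```

In this toy series the five-error window yields a narrower interval than the full history because the large early errors fall outside the window.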
If this is right
- The coverage error shrinks at the faster rate O(T^{-β/(2β+1)}) compared with full-history calibration.
- Winkler cross-validation provides an adaptive, oracle-efficient selector for m that does not require knowledge of β.
- Empirical coverage remains within ±2 percent of the 90 percent nominal level at short and medium horizons on real series.
- The rolling procedure outperforms full-history calibration in 86 percent of comparisons, with a median Winkler-score improvement of 12.3 percent.
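The Winkler-based window selection mentioned in the list above can be sketched as follows. This is an illustrative hold-out scheme, not necessarily the paper's cross-validation procedure: `select_window`, the validation length `n_val`, the candidate grid, and the simulated errors are all assumptions. The score itself is the standard Winkler interval score (width plus a 2/α penalty on the miss distance).

```python
import numpy as np

def winkler_score(y, lower, upper, alpha=0.1):
    """Interval width plus a 2/alpha penalty per unit of miss distance."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    miss = np.clip(lower - y, 0.0, None) + np.clip(y - upper, 0.0, None)
    return (upper - lower) + (2.0 / alpha) * miss

def select_window(errors, candidates, alpha=0.1, n_val=50):
    """Pick the window m with the lowest mean Winkler score on the last n_val errors."""
    errors = np.asarray(errors, dtype=float)
    start = errors.size - n_val
    best_m, best_score = None, np.inf
    for m in candidates:
        if m > start:
            continue  # not enough history before the validation block
        k = min(m, int(np.ceil((m + 1) * (1 - alpha))))
        scores = []
        for t in range(start, errors.size):
            q = np.sort(np.abs(errors[t - m:t]))[k - 1]  # uses errors before t only
            scores.append(winkler_score(errors[t], -q, q, alpha))
        mean_score = float(np.mean(scores))
        if mean_score < best_score:
            best_m, best_score = m, mean_score
    return best_m, best_score

# Simulated errors whose scale drops from 5 to 1 halfway through.
rng = np.random.default_rng(1)
errors = np.concatenate([5.0 * rng.standard_normal(200), rng.standard_normal(200)])
best_m, best_score = select_window(errors, candidates=[20, 300])
print(best_m, round(best_score, 2))
```

With a scale break in the history, the long window inherits stale large errors and pays for it in interval width, so the short window is selected.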
Where Pith is reading between the lines
- The minimax optimality result implies that no substantially different calibration strategy can beat this rate inside the same smoothness class.
- The cross-frequency regression slope of 0.614, close to the theoretical 2/3 (the value of 2β/(2β+1) at β = 1), suggests that real data often behave as if governed by the Hölder-β model with β near 1.
- The same four-term decomposition technique may extend to other nonconformity scores or to multivariate forecasting problems.
- The framework indicates that local adaptation is necessary for conformal methods to retain valid coverage under distributional drift.
Load-bearing premise
The time series must satisfy Hölder-β local stationarity together with α-mixing or physical dependence, without which the coverage-error decomposition and optimal-rate result do not apply.
What would settle it
Simulate a Hölder-β locally stationary α-mixing process with known β, apply rolling-origin calibration at m near T^{2β/(2β+1)}, and verify that the empirical coverage deviation decays exactly at rate T^{-β/(2β+1)} rather than faster or slower.
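A rough sketch of such a check, under several assumptions that are not from the paper: a time-varying AR(1) process with Lipschitz coefficient and scale paths (so β = 1), Gaussian innovations, a naive last-value forecast, and an absolute-residual score. A single replication per T is noisy; estimating the decay rate would require many replications and a finer grid of T.

```python
import numpy as np

def simulate_tv_ar1(T, rng):
    """Time-varying AR(1) with smoothly drifting coefficient and scale."""
    u = np.arange(T) / T
    a = 0.2 + 0.3 * u        # Lipschitz in rescaled time (beta = 1)
    sigma = 1.0 + u          # slowly growing innovation scale
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = a[t] * x[t - 1] + sigma[t] * eps[t]
    return x

def rolling_coverage(x, m, alpha=0.1):
    """Empirical one-step coverage of rolling-origin conformal intervals."""
    errors = x[1:] - x[:-1]  # naive last-value forecast errors
    k = min(m, int(np.ceil((m + 1) * (1 - alpha))))
    hits = []
    for t in range(m, errors.size):
        q = np.sort(np.abs(errors[t - m:t]))[k - 1]
        hits.append(abs(errors[t]) <= q)
    return float(np.mean(hits))

rng = np.random.default_rng(0)
beta = 1.0
deviations = {}
for T in (500, 2000, 8000):
    m = int(round(T ** (2 * beta / (2 * beta + 1))))  # m near T^{2/3}
    cov = rolling_coverage(simulate_tv_ar1(T, rng), m)
    deviations[T] = abs(cov - 0.9)
    print(T, m, round(deviations[T], 4))
```

Averaging `deviations[T]` over many seeds and regressing its logarithm on log T would give an empirical exponent to compare with −β/(2β+1).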
Original abstract
We propose and analyse rolling-origin conformal prediction for time-series forecasting. The method calibrates the conformal quantile against the $m$ most recent pseudo-out-of-sample forecast errors, adapting to serial dependence, volatility clustering, and distributional drift that invalidate classical conformal guarantees. Under H\"{o}lder-$\beta$ local stationarity and $\alpha$-mixing, we establish a four-term coverage-error decomposition and derive the optimal calibration window $m^{\star} \asymp T^{2\beta/(2\beta+1)}$ with coverage-error rate $O(T^{-\beta/(2\beta+1)})$. A Le Cam two-point construction shows this rate is minimax-optimal over the H\"{o}lder-$\beta$ model class. The Bahadur representation is proved under both $\alpha$-mixing and the physical-dependence framework of Wu (2005). An oracle inequality formalises Winkler cross-validation as an adaptive window selector; the required uniform concentration condition is established in an appendix. Validation on six real series and 93 M4 competition series confirms the theory: rolling-origin calibration outperforms full-history calibration in 86\% of comparisons (median Winkler improvement 12.3\%), maintains coverage within $\pm2\%$ of the 90\% target at short and medium horizons, and the cross-frequency log-log regression slope $0.614$ ($95\%$ CI $[0.424, 0.805]$) is consistent with the theoretical $2/3$ after controlling for frequency fixed effects.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes rolling-origin conformal prediction for time-series forecasting, calibrating the conformal quantile on the m most recent pseudo-out-of-sample forecast errors to handle serial dependence, volatility clustering, and distributional drift. Under Hölder-β local stationarity and α-mixing (or physical dependence), it establishes a four-term coverage-error decomposition, derives the optimal calibration window m⋆ ≍ T^{2β/(2β+1)} yielding coverage-error rate O(T^{-β/(2β+1)}), proves this rate is minimax-optimal via a Le Cam two-point construction over the Hölder-β class, provides a Bahadur representation, an oracle inequality for Winkler cross-validation as an adaptive selector, and reports empirical validation on six real series plus 93 M4 series where rolling-origin calibration outperforms full-history in 86% of cases with median Winkler improvement 12.3% and coverage within ±2% of target.
Significance. If the four-term decomposition is complete and the interaction terms are controlled, the work supplies the first optimal-rate theory for adaptive conformal calibration in locally stationary weakly dependent series, together with a practical oracle inequality and strong empirical support on competition data. The dual Bahadur proofs (α-mixing and Wu physical dependence) and the explicit minimax lower bound are notable strengths that would make the result a reference point for time-series conformal methods.
major comments (3)
- [four-term coverage-error decomposition] The four-term coverage-error decomposition (abstract and main theoretical result): the balancing argument for m⋆ ≍ T^{2β/(2β+1)} assumes the four terms (non-stationarity bias, quantile variance, mixing covariance, Bahadur remainder) are the only contributions of order T^{-β/(2β+1)}. An interaction of order (m/T)^β · α(m) between the local-stationarity drift and the α-mixing coefficients must be shown to be o(T^{-β/(2β+1)}) under the chosen m⋆; otherwise the claimed rate is not guaranteed. The proof sketch should explicitly bound or absorb this cross term.
- [Le Cam two-point construction] Le Cam two-point construction (minimax lower bound): both hypotheses in the construction must lie inside the same α-mixing class (with the same mixing rate) as the upper-bound model; otherwise the lower bound applies to a strictly larger function class than the one for which the upper bound is proved, weakening the minimax statement.
- [appendix on uniform concentration] Uniform concentration condition for the oracle inequality (appendix): the condition is invoked to justify Winkler cross-validation as an adaptive selector, but it must be verified to hold uniformly over the rolling-origin windows of length m under the joint Hölder-β and α-mixing assumptions; a counter-example or explicit rate would clarify whether the oracle inequality is sharp.
minor comments (2)
- [empirical validation] The cross-frequency log-log regression reports slope 0.614 with 95% CI [0.424, 0.805] claimed to be consistent with the theoretical 2/3; the regression specification (which frequencies, how many series per frequency, fixed effects) should be stated explicitly so readers can assess the power of the consistency check.
- [Bahadur representation] Notation for the physical-dependence coefficients (Wu 2005) is introduced alongside α-mixing; a short comparison table or remark clarifying when one framework yields strictly stronger or weaker rates than the other would improve readability.
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. The comments highlight important technical points that will improve the clarity and rigor of the theoretical results. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
-
Referee: The four-term coverage-error decomposition (abstract and main theoretical result): the balancing argument for m⋆ ≍ T^{2β/(2β+1)} assumes the four terms (non-stationarity bias, quantile variance, mixing covariance, Bahadur remainder) are the only contributions of order T^{-β/(2β+1)}. An interaction of order (m/T)^β · α(m) between the local-stationarity drift and the α-mixing coefficients must be shown to be o(T^{-β/(2β+1)}) under the chosen m⋆; otherwise the claimed rate is not guaranteed. The proof sketch should explicitly bound or absorb this cross term.
Authors: We agree that an explicit bound on the cross term between the Hölder drift and the mixing coefficients is necessary to confirm that it does not affect the claimed rate. In the current proof of Theorem 1 we bound the four main terms separately and invoke the α-mixing decay to control dependence, but we did not isolate this particular product. Under the maintained assumption that α(k) decays at least polynomially with exponent greater than 1, the product (m/T)^β α(m) with m ≍ T^{2β/(2β+1)} is of strictly smaller order than T^{-β/(2β+1)}. We will insert a short lemma that isolates and bounds this interaction term, thereby completing the decomposition argument. revision: yes
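The order claimed in this response can be checked directly. In the sketch below the constants $C$ and $a$ are hypothetical names for the polynomial mixing bound $\alpha(k) \le C k^{-a}$ with $a > 1$ that the authors invoke.

```latex
\[
\Bigl(\tfrac{m^{\star}}{T}\Bigr)^{\beta}
= \bigl(T^{-1/(2\beta+1)}\bigr)^{\beta}
= T^{-\beta/(2\beta+1)},
\qquad
\alpha(m^{\star}) \le C\,(m^{\star})^{-a} \asymp T^{-2a\beta/(2\beta+1)},
\]
\[
\Bigl(\tfrac{m^{\star}}{T}\Bigr)^{\beta}\alpha(m^{\star})
= O\bigl(T^{-\beta(1+2a)/(2\beta+1)}\bigr)
= o\bigl(T^{-\beta/(2\beta+1)}\bigr).
\]
```

Since the first factor already equals the target rate, the product is of smaller order whenever $\alpha(m^{\star}) \to 0$; the polynomial bound only quantifies the gap.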
-
Referee: Le Cam two-point construction (minimax lower bound): both hypotheses in the construction must lie inside the same α-mixing class (with the same mixing rate) as the upper-bound model; otherwise the lower bound applies to a strictly larger function class than the one for which the upper bound is proved, weakening the minimax statement.
Authors: Both hypotheses in the Le Cam construction are generated from the same base process that satisfies the α-mixing condition with identical decay rate; the local perturbation used to create the two points is supported on a vanishing fraction of the sample and does not change the mixing coefficients. We will add an explicit sentence in the proof of the lower bound (Section 4) stating that the mixing rate is held fixed across the two hypotheses, thereby ensuring the lower bound applies to exactly the same function class used for the upper bound. revision: yes
-
Referee: Uniform concentration condition for the oracle inequality (appendix): the condition is invoked to justify Winkler cross-validation as an adaptive selector, but it must be verified to hold uniformly over the rolling-origin windows of length m under the joint Hölder-β and α-mixing assumptions; a counter-example or explicit rate would clarify whether the oracle inequality is sharp.
Authors: Appendix C already derives the uniform concentration under the joint Hölder-β and α-mixing assumptions, but the uniformity is stated with respect to a fixed window length. To make the argument fully rigorous for the rolling-origin setting we will add an explicit rate (of order (log m / m)^{1/2} plus a mixing remainder) that holds uniformly over all windows of length m. This rate is sufficient for the oracle inequality to remain sharp; no counter-example arises under the maintained conditions. revision: partial
Circularity Check
No significant circularity: theoretical rates derived from external assumptions
The paper establishes a four-term coverage-error decomposition under the stated Hölder-β local stationarity and α-mixing conditions, then balances the resulting terms to obtain the optimal m⋆ ≍ T^{2β/(2β+1)} and rate O(T^{-β/(2β+1)}). The Le Cam minimax construction is performed directly over the same model class. The Bahadur representation invokes the external Wu (2005) physical-dependence framework, and the oracle inequality for Winkler cross-validation is proved in the appendix under uniform concentration. No claimed result reduces by construction to a fitted parameter, self-citation, or renamed input; the empirical comparisons on real series are presented separately as validation. The derivation chain is therefore self-contained against the external mixing and stationarity benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Hölder-β local stationarity
- domain assumption α-mixing (or physical dependence of Wu 2005)
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear: Under Hölder-β local stationarity and α-mixing, we establish a four-term coverage-error decomposition and derive the optimal calibration window m⋆ ≍ T^{2β/(2β+1)} with coverage-error rate O(T^{-β/(2β+1)}).
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear: A Le Cam two-point construction shows this rate is minimax-optimal over the Hölder-β model class.
Reference graph
Works this paper leans on
- [1]
-
[2]
Annals of Mathematical Statistics 42: 1957–1961
A new proof of the Bahadur representation of quantiles and an application. Annals of Mathematical Statistics 42: 1957–1961. Gibbs I, Candès E
-
[3]
Györfi L, Kohler M, Krzyżak A, Walk H
Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems 34: 1660–1672. Györfi L, Kohler M, Krzyżak A, Walk H. 2002. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer: New York. Kiefer J
- [4]
-
[5]
Covariance inequalities for strongly mixing processes. Annales de l'Institut Henri Poincaré 29: 587–597. Rio E. 2017. Asymptotic Theory of Weakly Dependent Random Processes. Probability Theory and Stochastic Modelling
-
[6]
International Journal of Forecasting 16: 437–450
Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting 16: 437–450. Tsybakov AB. 2009. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer: New York. Vogt M
-
[7]
Nonparametric regression for locally stationary time series. Annals of Statistics 40: 2601–2633. Vovk V, Gammerman A, Shafer G. 2005. Algorithmic Learning in a Random World. Springer: New York. Wendler M
-
[8]
A decision theoretic approach to interval estimation. Journal of the American Statistical Association 67: 187–191. Wu WB. 2005a. On the Bahadur representation of sample quantiles for dependent sequences. Annals of Statistics 33: 1934–1957. Wu WB. 2005b. Nonlinear system theory: another look at dependence. Proceedings of the National Academy of Sciences 102: 14...
-
[9]
For heavy-tailed oracle scores where |Yj| ≤ 1 fails, Rio’s Fuk–Nagaev inequality (Rio,
for the variance bound and β > 2 for the polynomial tail in the large-deviation argument; both hold for ARMA, GARCH, and tvARCH processes (Vogt, 2012; Fryzlewicz & Subba Rao, 2011). For heavy-tailed oracle scores where |Yj| ≤ 1 fails, Rio’s Fuk–Nagaev inequality (Rio,
-
[10]
extends the argument under the DMR condition ∫_0^1 α^{-1}(u) Q^2(u) du < ∞. Under the physical-dependence stability condition of Wu (2005a), the near-stationarity restriction m = o(T) can be relaxed using Zhou & Wu (2009), Theorem
-
[11]
Choosing A so that C_1 A^2 > 2 gives K_T^{1−C_1 A^2} → 0, establishing (14)
Substituting, P( sup_{m∈M_T} |W_T(m) − R_T(m)| > x_T | F_cal ) ≤ 2 K_T^{1−C_1 A^2}. Choosing A so that C_1 A^2 > 2 gives K_T^{1−C_1 A^2} → 0, establishing (14). Step 4: Rate relative to R⋆_T. By Assumption 6, n_val ≥ cT, so √((log K_T)/n_val) ≤ C √((log K_T)/T). By Assumption 7, log K_T = o(T^{1/(2β+1)}), equivalently √((log K_T)/T) = o(T^{−β/(2β+1)}), giving (15). Remark 12 (Sharpness). The ra...
-
[12]
At β = 1 the moment condition p > 6 34 is satisfied by stationary GARCH(1,1) models under standard parameter restrictions (Carrasco & Chen, 2002). 35