
arxiv: 2606.09473 · v1 · pith:6L5ZGVIQnew · submitted 2026-06-08 · 📊 stat.ML · cs.LG

Report the Floor: A Training-Free Conformal Interval Is a Mandatory Baseline for Probabilistic Time-Series Forecasting

Pith reviewed 2026-06-27 14:44 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords probabilistic forecasting · conformal prediction · time series · baselines · one-step forecasting · Winkler score · calibration · naive forecaster

The pith

A training-free conformal interval based on the last value is a mandatory baseline for one-step probabilistic time-series forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that wrapping a last-value forecast inside a split-conformal quantile of residuals produces an interval that beats value-quantile baselines, the full NPTS family, and the published Conformal Seasonal Pools (CSP) method on a majority of 2,217 real series. This zero-parameter construction requires no training, yet its median relative Winkler score is within 2% of the simpler learned conformal predictors (RCI, quantile regression) and its coverage exceeds that of a trained neural forecaster (DeepNPTS) on the same data; only the adaptive-online and ensemble methods (SPCI, ACI, AgACI) stay ahead. The advantage holds only for one-step online forecasts; at multi-step seasonal horizons the picture reverses and seasonal pools become stronger. The authors therefore conclude that any new learned forecaster must demonstrate improvement over this floor before its gains can be considered established.

Core claim

The central claim is that the ConformalNaive interval, formed by taking the last observed value as the point forecast and adding the finite-sample split-conformal quantile of its residuals, is a far stronger baseline than its near-absence from recent comparisons would suggest, and that any learned probabilistic forecaster claiming improvement must be evaluated against it in the one-step-ahead online regime.

What carries the argument

The ConformalNaive interval: a parameter-free split-conformal wrapper around the naive last-value point forecast that uses the empirical quantile of past residuals to set the interval width.
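
The construction is small enough to sketch in a few lines. The following is a minimal illustration, not the author's code: it assumes a symmetric interval built from absolute one-step residuals, and the function name `conformal_naive_interval` is hypothetical.

```python
import math
import numpy as np

def conformal_naive_interval(y, alpha=0.05):
    """Interval for the next value of y at nominal level 1 - alpha.

    Point forecast: the last observed value.
    Half-width: finite-sample split-conformal quantile of the absolute
    one-step naive residuals |y[t] - y[t-1]| (symmetry is an assumption).
    """
    y = np.asarray(y, dtype=float)
    if y.size < 2:
        raise ValueError("need at least two observations")
    scores = np.sort(np.abs(np.diff(y)))   # absolute naive residuals
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))   # finite-sample rank
    # With too few residuals the rank exceeds n and the interval is unbounded.
    q = scores[k - 1] if k <= n else math.inf
    return y[-1] - q, y[-1] + q
```

With only a handful of residuals the rank can exceed the sample size, in which case the sketch returns an unbounded interval rather than understating the width.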

If this is right

  • Any published gain over naive or NPTS baselines must now also be shown to surpass the conformal naive floor in one-step settings.
  • The trivial floors, ConformalNaive among them, cover the truth 84-85 percent of the time at nominal 95 percent on the six DeepNPTS datasets, against 66 percent for the neural forecaster; both figures fall short of nominal.
  • Adaptive-online and ensemble methods remain ahead by 9-33 percent relative Winkler, confirming that tracking distribution shift still adds value.
  • ConformalNaive+ selects the better of the random-walk and seasonal floors at each horizon while restoring proper coverage.
  • The performance ordering inverts at multi-step seasonal horizons, with the random-walk floor becoming weakest and seasonal pools winning.
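
The two metrics behind these bullets can be computed as follows. This is a sketch using the standard definition of the Winkler (interval) score; the helper names are illustrative, and the paper's exact normalisation for "relative Winkler" is not given on this page.

```python
import numpy as np

def winkler_score(y, lo, hi, alpha=0.05):
    """Winkler (interval) score per observation; lower is better.

    Interval width plus a penalty of 2/alpha times the distance by
    which the observation falls outside the interval.
    """
    y, lo, hi = (np.asarray(a, dtype=float) for a in (y, lo, hi))
    below = np.maximum(lo - y, 0.0)   # miss on the low side
    above = np.maximum(y - hi, 0.0)   # miss on the high side
    return (hi - lo) + (2.0 / alpha) * (below + above)

def empirical_coverage(y, lo, hi):
    """Fraction of observations that fall inside their interval."""
    y, lo, hi = (np.asarray(a, dtype=float) for a in (y, lo, hi))
    return float(np.mean((y >= lo) & (y <= hi)))
```

Because the score charges for width as well as misses, a method cannot improve it simply by widening its intervals until coverage reaches nominal.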

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Many existing comparisons that omitted this floor may need re-running to check whether reported improvements survive the stronger baseline.
  • The horizon-dependent reversal suggests that practical systems could default to the simple floor for short horizons and switch only when longer forecasts are required.
  • The same training-free construction could be tested as a baseline in related tasks such as probabilistic anomaly detection or interval forecasting under concept drift.
  • If the one-step regime is accepted as decisive, then future work should report both one-step and multi-step results to avoid cherry-picking the favorable horizon.
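
A horizon-dependent switch of the kind described in the second bullet could look like the sketch below. The selection rule (pick the floor with the narrower calibrated interval at each horizon) is an assumption made for illustration; this page does not state the rule ConformalNaive+ actually uses, and `select_floor` and `conformal_quantile` are hypothetical names.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile of nonconformity scores."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[k - 1] if 1 <= k <= n else math.inf

def select_floor(y, h, season, alpha=0.05):
    """Return (name, lo, hi) for horizon h, choosing the narrower floor.

    Assumed rule, not taken from the paper. Requires 1 <= h <= season
    and len(y) > season.
    """
    y = np.asarray(y, dtype=float)
    if not 1 <= h <= season or y.size <= season:
        raise ValueError("need 1 <= h <= season < len(y)")
    # Random-walk floor: y[t] is the forecast for y[t + h].
    rw_q = conformal_quantile(np.abs(y[h:] - y[:-h]), alpha)
    # Seasonal floor: the value one season before the target is the forecast.
    sn_q = conformal_quantile(np.abs(y[season:] - y[:-season]), alpha)
    rw_point = y[-1]
    sn_point = y[y.size - season + h - 1]
    if rw_q <= sn_q:
        return "random_walk", rw_point - rw_q, rw_point + rw_q
    return "seasonal", sn_point - sn_q, sn_point + sn_q
```

Choosing between floors on the same residuals used for calibration can cost some coverage, so a careful implementation would check coverage on held-out windows.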

Load-bearing premise

One-step-ahead online forecasting on the chosen public datasets with the Winkler score is the primary and decisive test for whether new probabilistic forecasters have made real progress.

What would settle it

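The mandatory-baseline claim would be weakened if learned forecasters that beat the customary baselines also improved the median relative Winkler score over ConformalNaive on a majority of the 2,217 series from the nine listed sources while maintaining nominal coverage; the floor would then change no published conclusion. A learned forecaster that fails that test would instead support the claim.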

Figures

Figures reproduced from arXiv: 2606.09473 by Valery Manokhin.

Figure 1: ConformalNaive win rate against every comparator, one step ahead, 2…

Figure 2: Mean CRPS rank, multi-step seasonal regime (lower is better). Our three methods in…

Figure 3: Empirical 95% coverage, multi-step seasonal regime (target 0…

Figure 4: Horizon sweep (four hourly datasets, H = 24). Top: CRPS normalized to ConformalNaive at h = 1. The floors cross over at h ≈ 2–3; ConformalNaive+ (gold) tracks the better floor; the dotted line is the per-window oracle envelope. Bottom: coverage. ConformalNaive (red) collapses mid-horizon; ConformalNaive+ and the seasonal floor stay near nominal.
Abstract

Probabilistic forecasters are increasingly learned, yet the baselines they are compared against are often weak or omitted. We show that the simplest possible conformal interval - a last-value point forecast wrapped in a finite-sample split-conformal residual quantile, with no parameters and no training - is a far stronger baseline than its near-total absence from recent learned-forecasting and conformal-time-series comparisons would suggest. In one-step-ahead online forecasting across 2,217 real series from nine public sources (Monash, LOTSA, the LTSF traffic/electricity/weather suites, METR-LA, BOOM, nips/probts), this ConformalNaive interval decisively beats the naive value-quantile baselines, the entire NPTS family (NPTS 73%, SeasonalNPTS 64% of series), and the published Conformal Seasonal Pools (CSP) method (71% of series, bootstrap 95% CI [69,73], paired Wilcoxon p approx 7.6e-135); it is on par with the simpler learned conformal predictors (RCI, quantile regression; median relative Winkler within 2%) and is beaten only by the adaptive-online and ensemble methods (SPCI, ACI, AgACI), which track distribution shift and lead by 9-33% relative Winkler. It is also better calibrated than a trained neural forecaster: on the six datasets that introduced DeepNPTS, the trivial floors cover the truth 84-85% of the time at a nominal 95%, versus DeepNPTS's 66%. At multi-step seasonal horizons the picture inverts: the random-walk floor is the weakest method and the seasonal pool (CSP) wins - a boundary we map. Finally we give ConformalNaive+, a one-line, training-free, horizon-adaptive selector that attains the better of two complementary floors at every horizon with restored coverage. We argue the matching conformal naive floor must be a mandatory baseline whenever a learned probabilistic forecaster claims gains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that ConformalNaive, a training-free split-conformal interval formed by taking a last-value point forecast and wrapping it with finite-sample residual quantiles, is a far stronger baseline than its near-absence in recent literature would suggest. On 2,217 real series from nine public sources in the one-step-ahead online regime, it beats naive value-quantile baselines, the full NPTS family, and the published CSP method (71% of series, paired Wilcoxon p≈7.6e-135), matches simpler learned conformal predictors within 2% median relative Winkler score, and achieves 84-85% coverage versus DeepNPTS's 66% at nominal 95%. Performance inverts at multi-step seasonal horizons (CSP wins); the authors therefore introduce the one-line ConformalNaive+ selector, which attains the better of the two complementary floors at every horizon with restored coverage. They conclude that the matching conformal naive floor must be reported as a mandatory baseline for any learned probabilistic forecaster claiming gains.

Significance. If the empirical comparisons hold, the work would materially raise the standard for baseline reporting in probabilistic time-series forecasting by showing that a completely parameter-free conformal construction can outperform or match a range of published methods on a large, multi-source corpus with paired statistical tests. The scale (2,217 series), the explicit calibration comparison against DeepNPTS, the mapping of the horizon-dependent inversion, and the provision of an adaptive one-line selector are concrete strengths that could influence benchmark design. The absence of any learned parameters or training also makes the baseline immediately reproducible.

major comments (2)
  1. [Abstract / Conclusion] Abstract and concluding section: the assertion that ConformalNaive “must be a mandatory baseline whenever a learned probabilistic forecaster claims gains” is load-bearing on the premise that one-step-ahead results are the decisive regime. The manuscript itself documents that the ranking inverts at multi-step seasonal horizons (CSP wins), yet supplies no additional argument for why the one-step online setting should alone determine field-wide mandatory baselines.
  2. [Methods] Methods / experimental protocol: the precise adaptation of split-conformal to non-exchangeable residuals and the exact data-exclusion / calibration-window rules used in the online one-step regime are not stated with sufficient detail to reproduce the reported coverage numbers (84-85% vs. DeepNPTS 66%). Because these numbers are used to support the calibration advantage, the construction must be fully specified.
minor comments (1)
  1. [Results] The bootstrap 95% CI [69,73] for the 71% win rate against CSP should state the number of bootstrap replicates and whether series-level or aggregate resampling was performed.
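
For concreteness, a series-level percentile bootstrap of a win rate can be sketched as below. The replicate count, seed and function name are illustrative assumptions; the paper's own resampling scheme is exactly what the comment above asks the authors to state.

```python
import numpy as np

def bootstrap_win_rate_ci(wins, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for a win rate, resampling whole series.

    wins: one 0/1 entry per series (1 if the method beat the comparator).
    Returns (win_rate, ci_low, ci_high).
    """
    wins = np.asarray(wins, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap replicate: series drawn with replacement.
    idx = rng.integers(0, wins.size, size=(n_boot, wins.size))
    rates = wins[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(rates, [tail, 1.0 - tail])
    return float(wins.mean()), float(lo), float(hi)
```

Resampling at the series level treats each series as one unit, which matters when series from the same source are far more alike than series from different sources.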

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important points on the scope of our baseline recommendation and the need for greater methodological transparency. We address each major comment below and indicate the revisions we will make.

  1. Referee: [Abstract / Conclusion] Abstract and concluding section: the assertion that ConformalNaive “must be a mandatory baseline whenever a learned probabilistic forecaster claims gains” is load-bearing on the premise that one-step-ahead results are the decisive regime. The manuscript itself documents that the ranking inverts at multi-step seasonal horizons (CSP wins), yet supplies no additional argument for why the one-step online setting should alone determine field-wide mandatory baselines.

    Authors: We agree that the manuscript documents the performance inversion at multi-step seasonal horizons and introduces ConformalNaive+ precisely to handle this. Our recommendation for a mandatory baseline is grounded in the one-step-ahead online regime, which constitutes the primary setting for the large-scale comparisons (2,217 series) and is a standard evaluation regime in much of the probabilistic forecasting literature. To strengthen the manuscript, we will revise the abstract and conclusion to explicitly qualify the mandatory-baseline claim to the one-step online setting while noting that ConformalNaive+ provides an adaptive, training-free alternative across horizons. This addresses the concern without overstating the scope. revision: yes

  2. Referee: [Methods] Methods / experimental protocol: the precise adaptation of split-conformal to non-exchangeable residuals and the exact data-exclusion / calibration-window rules used in the online one-step regime are not stated with sufficient detail to reproduce the reported coverage numbers (84-85% vs. DeepNPTS 66%). Because these numbers are used to support the calibration advantage, the construction must be fully specified.

    Authors: We acknowledge that the current description of the split-conformal adaptation and online protocol lacks the level of detail needed for exact reproduction. In the revised manuscript we will expand the Methods section with a dedicated subsection that specifies: (i) the exact handling of non-exchangeable residuals via the split-conformal quantile construction, (ii) the calibration-window length and data-exclusion rules applied in the rolling one-step online regime, and (iii) the precise implementation steps that yield the reported coverage figures. This will enable independent verification of the 84-85% versus 66% comparison. revision: yes
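
To show what such a specification has to pin down, here is one possible rolling one-step protocol. It is a sketch under stated assumptions, not the authors' procedure: `window` (calibration-window length) and `warmup` (initial points excluded from scoring) are hypothetical parameters standing in for the rules the referee asks for.

```python
import math
import numpy as np

def rolling_conformal_naive(y, alpha=0.05, window=None, warmup=20):
    """Online one-step intervals from a last-value forecast.

    At each time t the interval for y[t+1] uses only residuals already
    observed, optionally restricted to the most recent `window` of them.
    Returns arrays (lo, hi, target).
    """
    y = np.asarray(y, dtype=float)
    resid = np.abs(np.diff(y))        # resid[i] = |y[i+1] - y[i]|
    lo, hi, target = [], [], []
    for t in range(warmup, len(y) - 1):
        # resid[:t] is everything known once y[t] has been observed.
        past = resid[:t] if window is None else resid[max(0, t - window):t]
        n = past.size
        k = math.ceil((n + 1) * (1 - alpha))
        q = np.sort(past)[k - 1] if k <= n else math.inf
        lo.append(y[t] - q)
        hi.append(y[t] + q)
        target.append(y[t + 1])
    return np.array(lo), np.array(hi), np.array(target)
```

Changing `window` or `warmup` changes the reported coverage, which is why the reproduction of the 84-85% figure depends on these choices being stated.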

Circularity Check

0 steps flagged

No circularity; empirical comparisons are independent of fitted inputs or self-definitions

The paper advances an empirical claim that a training-free ConformalNaive interval outperforms several published baselines on 2217 series under one-step online forecasting with the Winkler score. All reported quantities (win rates, relative Winkler scores, coverage percentages, Wilcoxon p-values) are computed directly from hold-out residuals on named public datasets against independently published methods (NPTS family, CSP, RCI, etc.). No equation defines a target metric in terms of a fitted parameter that is then reused as the result; no uniqueness theorem or ansatz is smuggled via self-citation; the multi-step inversion is explicitly stated rather than hidden. The mandatory-baseline argument is therefore a policy conclusion drawn from the tabulated comparisons, not a quantity that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method introduces no fitted parameters or new entities; it rests on the standard split-conformal coverage guarantee applied to residuals of a last-value forecast.

axioms (1)
  • domain assumption: Split-conformal prediction supplies approximate finite-sample coverage when residuals are exchangeable or the dependence is weak enough for the quantile to remain valid.
    Invoked for the residual quantile construction in the one-step online setting.
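
Written out, the assumption amounts to the standard split-conformal statement below. The notation is introduced here for illustration (it is not taken from the paper) and assumes symmetric intervals built from absolute naive residuals.

```latex
% Calibration scores from the last-value forecast (notation assumed):
s_i = \lvert y_i - y_{i-1} \rvert, \qquad i = 1, \dots, n .

% Finite-sample quantile: the k-th smallest score, with
k = \lceil (n+1)(1-\alpha) \rceil, \qquad
\hat{q}_{1-\alpha} = s_{(k)} \quad (\hat{q}_{1-\alpha} = +\infty \text{ if } k > n).

% Interval for the next observation:
C_{1-\alpha} = \bigl[\, y_n - \hat{q}_{1-\alpha},\; y_n + \hat{q}_{1-\alpha} \,\bigr].

% If s_1, \dots, s_{n+1} are exchangeable, then
\Pr\bigl( y_{n+1} \in C_{1-\alpha} \bigr) \;\ge\; 1 - \alpha ,
% and, when the scores are almost surely distinct,
\Pr\bigl( y_{n+1} \in C_{1-\alpha} \bigr) \;\le\; 1 - \alpha + \frac{1}{n+1}.
```

Time-series residuals are generally not exchangeable, so the guarantee holds only approximately; this is the gap the referee's second major comment points at.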

pith-pipeline@v0.9.1-grok · 5900 in / 1343 out tokens · 31092 ms · 2026-06-27T14:44:55.983539+00:00 · methodology
