pith. machine review for the scientific record.

arxiv: 2605.08857 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: no theorem link

RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction

Manuel Heurich, Maximilian Granz, Tim Landgraf

Pith reviewed 2026-05-12 01:24 UTC · model grok-4.3

classification 💻 cs.LG
keywords conformal prediction · time series forecasting · regime detection · retrieval methods · adaptive calibration · prediction intervals · uncertainty quantification · mixture of experts

The pith

RareCP retrieves top-k past residuals weighted by regime-specific attention experts to form tighter conformal prediction intervals for drifting time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to make conformal prediction intervals narrower for time series forecasts that face changing error patterns and drift, without losing their coverage guarantees. Existing approaches either update miscoverage rates over time or learn unconstrained calibration weights, but they do not explicitly separate smoothly drifting error distributions from distinct co-existing error regimes. RareCP trains a mixture of cosine-attention experts to identify those regimes locally, uses a hypernetwork to adapt kernel parameters for drift, and then retrieves the most similar past calibration cases to compute a weighted quantile of signed residuals. The result is asymmetric intervals that adapt to the current context. A sympathetic reader would care because narrower reliable intervals directly improve decision-making in forecasting tasks where over-wide bands waste resources or hide risk.
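
As a concrete illustration of the retrieval step, the numpy sketch below selects the top-k most similar calibration contexts by cosine similarity and converts the similarities into normalized weights. The embedding dimension, the exponential weighting, and all names here are illustrative assumptions, not the paper's implementation.

```python
# A minimal retrieval sketch, assuming calibration contexts are already
# embedded as vectors; shapes and weighting are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
calib_contexts = rng.normal(size=(500, 16))  # hypothetical calibration embeddings
test_context = rng.normal(size=16)           # embedding of the new forecasting context

def cosine_topk(query, keys, k=50):
    """Indices and normalized similarity weights of the k nearest keys."""
    q = query / np.linalg.norm(query)
    K = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    sims = K @ q                             # cosine similarities in [-1, 1]
    idx = np.argsort(sims)[-k:]              # top-k most similar calibration points
    w = np.exp(sims[idx])                    # positive, similarity-increasing weights
    return idx, w / w.sum()

idx, weights = cosine_topk(test_context, calib_contexts)
```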

Core claim

RareCP learns local calibration representations through a mixture of cosine-attention experts that each capture distinct error regimes, while a compact hypernetwork adapts the kernel parameters to track temporal drift. Given a new forecasting context, it retrieves the top-k most relevant calibration examples, assigns similarity weights, and forms a weighted conformal quantile over their signed residuals, yielding asymmetric prediction intervals. The adaptive kernel is trained using a smooth interval score objective with a parameter-space anchor to a lightweight teacher kernel.
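
A minimal sketch of the weighted quantile machinery, assuming the asymmetric interval comes from adding the alpha/2 and 1 - alpha/2 weighted quantiles of the retrieved signed residuals to the point forecast; the paper's exact quantile convention and any finite-sample correction may differ.

```python
# A hedged sketch of a similarity-weighted conformal interval over signed
# residuals; the residuals and weights below are simulated placeholders.
import numpy as np

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches level q."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w) / w.sum()
    i = min(np.searchsorted(cum, q), len(v) - 1)
    return v[i]

def conformal_interval(point_pred, residuals, weights, alpha=0.2):
    # Asymmetric band: add the alpha/2 and 1 - alpha/2 weighted residual
    # quantiles to the point forecast.
    lo = weighted_quantile(residuals, weights, alpha / 2)
    hi = weighted_quantile(residuals, weights, 1 - alpha / 2)
    return point_pred + lo, point_pred + hi

rng = np.random.default_rng(1)
residuals = rng.normal(0.5, 1.0, size=50)  # signed residuals of retrieved examples
weights = np.full(50, 1 / 50)              # uniform weights as a degenerate case
print(conformal_interval(10.0, residuals, weights, alpha=0.2))
```

With uniform weights over the full calibration set this collapses to a version of ordinary split conformal prediction; the similarity weights are what localize the interval to the current context.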

What carries the argument

Mixture of cosine-attention experts that separate error regimes, paired with a hypernetwork for drift adaptation and top-k retrieval to weight residuals for the conformal quantile.
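
To make the expert mixture concrete, the sketch below gives each expert its own linear projection, scores query-key pairs with a cosine kernel in the projected space, and mixes the expert scores with a softmax gate on the query. The shapes, the gating form, and the omission of the drift hypernetwork are all simplifying assumptions.

```python
# An illustrative mixture of cosine-attention experts; every parameter here
# is randomly initialized for demonstration, not trained or from the paper.
import numpy as np

rng = np.random.default_rng(2)
n_experts, d = 4, 16
W = rng.normal(size=(n_experts, d, d))   # per-expert projections (hypothetical)
gate_w = rng.normal(size=(n_experts, d)) # gating parameters (hypothetical)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def expert_similarity(query, key):
    """Gate-weighted cosine similarity across regime experts."""
    logits = gate_w @ query
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                   # softmax gate over experts
    sims = np.array([cos(We @ query, We @ key) for We in W])
    return gate @ sims                   # mixture similarity in [-1, 1]

q, k = rng.normal(size=d), rng.normal(size=d)
print(expert_similarity(q, k))
```

In this picture, the hypernetwork would generate or modulate the projection matrices W from recent history to track drift, rather than keeping them fixed.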

If this is right

  • Interval efficiency improves over recent conformal baselines and foundation-model uncertainty estimates on the GIFT-Eval benchmark.
  • Empirical coverage is maintained at the nominal level.
  • Ablations show separate gains from regime-specific experts, drift-adaptive kernels, sparse retrieval, and teacher anchoring.
  • The method produces asymmetric intervals that adapt to local context rather than global or sliding-window statistics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same retrieval-plus-experts pattern could be tested on other online learning settings where error regimes shift abruptly, such as anomaly detection streams.
  • If regime separation proves stable, the approach might reduce the frequency of full recalibration needed in production forecasting pipelines.
  • One could check whether the learned expert weights themselves serve as interpretable indicators of which error regime is active at any moment.
  • Applying the method to multivariate series or to settings with known external regime triggers would test how far the cosine-attention separation generalizes.

Load-bearing premise

Distinct error regimes exist in the data and the mixture of cosine-attention experts can reliably separate them so that retrieved residuals remain relevant for the weighted quantile even when drift occurs.

What would settle it

On a time series dataset with documented regime shifts, remove the expert mixture and measure whether the interval-width improvement over baselines falls below the reported margin while empirical coverage stays at the target level.
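
The check reduces to two numbers per configuration: empirical coverage and mean interval width, computed with and without the expert mixture. A minimal sketch on placeholder intervals (the "full" and "ablated" bands here are simulated stand-ins, not the paper's outputs):

```python
# Coverage and mean-width metrics for the proposed ablation comparison.
import numpy as np

def coverage_and_width(y_true, lower, upper):
    """Empirical coverage rate and mean interval width."""
    covered = (y_true >= lower) & (y_true <= upper)
    return covered.mean(), (upper - lower).mean()

rng = np.random.default_rng(3)
y = rng.normal(size=300)
forecast = y + rng.normal(0.0, 0.5, size=300)      # imperfect point forecasts
lo_full, hi_full = forecast - 1.0, forecast + 1.0  # stand-in for full RareCP
lo_abl, hi_abl = forecast - 1.4, forecast + 1.4    # stand-in for no-experts ablation

cov_f, wid_f = coverage_and_width(y, lo_full, hi_full)
cov_a, wid_a = coverage_and_width(y, lo_abl, hi_abl)
print(f"full:    coverage={cov_f:.2f}  width={wid_f:.2f}")
print(f"ablated: coverage={cov_a:.2f}  width={wid_a:.2f}")
```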

Figures

Figures reproduced from arXiv: 2605.08857 by Manuel Heurich, Maximilian Granz, Tim Landgraf.

Figure 1: RareCP adapts its 80%-interval prediction locally to the residual regime on m4_daily.
Figure 2: RareCP overview. A forecasting backbone provides a point prediction and history context.
Figure 3: Stepwise RareCP ablation on Bench10. Bars show additional percentage improvement in …
Figure 4: Visualization of 80% intervals using the gating mixture on the Bench10 datasets.
Figure 5: Visualization of SplitCP Uniform predictions on the first 300 test samples per dataset; …
Figure 6: Visualization of SplitCP Uniform predictions on the first 300 test samples per dataset; …
Figure 7: Visualization of NexCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 8: Visualization of NexCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 9: Visualization of ACI predictions on the first 300 test samples per dataset; Bench10 on …
Figure 10: Visualization of ACI predictions on the first 300 test samples per dataset; Bench10 on …
Figure 11: Visualization of dtACI predictions on the first 300 test samples per dataset; Bench10 on …
Figure 12: Visualization of dtACI predictions on the first 300 test samples per dataset; Bench10 on …
Figure 13: Visualization of ResCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 14: Visualization of ResCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 15: Visualization of KOWCPI predictions on the first 300 test samples per dataset; Bench10 …
Figure 16: Visualization of KOWCPI predictions on the first 300 test samples per dataset; Bench10 …
Figure 17: Visualization of HopCPT predictions on the first 300 test samples per dataset; Bench10 on …
Figure 18: Visualization of HopCPT predictions on the first 300 test samples per dataset; Bench10 on …
Figure 19: Visualization of RareCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 20: Visualization of RareCP predictions on the first 300 test samples per dataset; Bench10 on …
Original abstract

Recent advances in uncertainty quantification for time series forecasting show that conformal prediction can provide reliable prediction intervals, yet standard conformal methods are often inefficient under temporal dependence, drift, and heterogeneous error behavior. Existing methods typically either update miscoverage rates over time or learn unconstrained calibration weights, without explicitly separating two central sources of nonstationarity: smoothly drifting error distributions and co-existing distinct error regimes. We introduce RareCP, a regime-aware retrieval method for adaptive conformal time series prediction. RareCP learns local calibration representations through a mixture of cosine-attention experts that each capture distinct error regimes, while a compact hypernetwork adapts the kernel parameters to track temporal drift. Given a new forecasting context, RareCP retrieves the top-k most relevant calibration examples, assigns similarity weights, and forms a weighted conformal quantile over their signed residuals, yielding asymmetric prediction intervals. The adaptive kernel is trained using a smooth interval score objective, with a parameter-space anchor to a lightweight teacher kernel to preserve stable local representations. On the GIFT-Eval benchmark, RareCP improves interval efficiency over recent conformal baselines and foundation model uncertainty estimates while maintaining empirical coverage. Ablations confirm that regime-specific experts, drift-adaptive kernels, sparse retrieval, and teacher anchoring each contribute to the final performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces RareCP, a regime-aware retrieval method for adaptive conformal prediction in time series forecasting. It learns local calibration representations via a mixture of cosine-attention experts to capture distinct error regimes, employs a compact hypernetwork to adapt kernel parameters for temporal drift, and retrieves the top-k most similar past calibration examples to compute similarity-weighted conformal quantiles over signed residuals, producing asymmetric prediction intervals. The adaptive kernel is trained with a smooth interval score objective and a parameter-space anchor to a teacher kernel. On the GIFT-Eval benchmark, RareCP is claimed to improve interval efficiency over recent conformal baselines and foundation model uncertainty estimates while maintaining empirical coverage, with ablations indicating that each component (regime experts, drift adaptation, sparse retrieval, teacher anchoring) contributes to performance.

Significance. If the empirical gains are robust and the regime separation proves meaningful rather than incidental, RareCP would offer a concrete advance in handling both smooth drift and heterogeneous error regimes within conformal prediction for dependent data, potentially yielding more efficient intervals than purely adaptive miscoverage or unconstrained weighting approaches without sacrificing validity guarantees.

major comments (2)
  1. [§4 (Experiments)] The central efficiency claim on GIFT-Eval rests on benchmark improvements, yet the manuscript provides no error bars, exact train/calibration/test splits, or statistical significance tests comparing RareCP to baselines. This gap prevents verification that the reported interval-length reductions are reliable rather than artifacts of benchmark variability.
  2. [Ablation studies (Experiments section)] The paper states that ablations confirm the contribution of regime-specific experts, but supplies no quantitative diagnostics of regime separation quality such as expert assignment entropy, inter-regime residual divergence, or cluster stability metrics. Without such measures, it remains unclear whether the cosine-attention mixture isolates stable error regimes or merely fits spurious correlations, which directly bears on whether the weighted-quantile efficiency gains follow from the regime-aware design.
minor comments (2)
  1. [Method section] The description of the hypernetwork and teacher-kernel anchor in the method section would benefit from an explicit equation or pseudocode showing how the parameter-space regularization is applied during training (a generic sketch of such an anchor follows these comments).
  2. [Method section] Notation for the similarity weights and weighted quantile could be clarified with a single consolidated equation rather than scattered references across paragraphs.
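
Reading between the referee's lines, a generic parameter-space anchor might look like the sketch below: the standard (non-smoothed) interval score plus an L2 penalty pulling the hypernetwork-generated kernel parameters toward a frozen teacher's. The paper's smooth objective presumably replaces the hard hinge penalties with differentiable surrogates; lambda_anchor and all shapes are hypothetical.

```python
# A hedged sketch of an interval-score loss with a parameter-space teacher
# anchor; nothing here is the paper's exact objective.
import numpy as np

def interval_score(y, lo, hi, alpha=0.2):
    """Mean Winkler/interval score: width plus scaled miscoverage penalties."""
    width = hi - lo
    below = np.maximum(lo - y, 0.0)  # penalty when y falls under the band
    above = np.maximum(y - hi, 0.0)  # penalty when y falls over the band
    return np.mean(width + (2.0 / alpha) * (below + above))

def anchored_loss(y, lo, hi, theta_student, theta_teacher, lambda_anchor=0.1):
    # L2 pull of the generated kernel parameters toward the teacher kernel.
    anchor = np.sum((theta_student - theta_teacher) ** 2)
    return interval_score(y, lo, hi) + lambda_anchor * anchor

rng = np.random.default_rng(6)
y = rng.normal(size=100)
print(anchored_loss(y, y - 1.1, y + 0.9, np.ones(8), np.zeros(8)))
```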

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. The comments highlight important aspects of empirical rigor and interpretability that we will address to strengthen the manuscript. We respond to each major comment below and outline the corresponding revisions.

Point-by-point responses
  1. Referee: [§4 (Experiments)] The central efficiency claim on GIFT-Eval rests on benchmark improvements, yet the manuscript provides no error bars, exact train/calibration/test splits, or statistical significance tests comparing RareCP to baselines. This gap prevents verification that the reported interval-length reductions are reliable rather than artifacts of benchmark variability.

    Authors: We agree that the lack of error bars, explicit splits, and significance testing reduces the ability to verify the reliability of the reported efficiency gains. In the revised version, we will explicitly document the train/calibration/test splits for every dataset in the GIFT-Eval benchmark. We will also recompute and report interval efficiency as mean ± standard deviation over multiple random seeds for calibration-set construction. Finally, we will add paired statistical tests (Wilcoxon signed-rank) with p-values to compare RareCP against each baseline; a minimal sketch of such a test appears after this list. These updates will appear in Section 4 and the associated tables. revision: yes

  2. Referee: [Ablation studies (Experiments section)] The paper states that ablations confirm the contribution of regime-specific experts, but supplies no quantitative diagnostics of regime separation quality such as expert assignment entropy, inter-regime residual divergence, or cluster stability metrics. Without such measures, it remains unclear whether the cosine-attention mixture isolates stable error regimes or merely fits spurious correlations, which directly bears on whether the weighted-quantile efficiency gains follow from the regime-aware design.

    Authors: We acknowledge that the current ablation results, while showing performance degradation when experts are removed, do not include direct diagnostics of regime quality. In the revision we will augment the ablation subsection with the requested metrics: average entropy of expert assignment weights across calibration examples (to quantify specialization), Wasserstein distance between signed-residual distributions of different experts (to measure inter-regime divergence), and a simple stability check by re-running assignments on held-out calibration windows; a sketch of these diagnostics appears after this list. These additions will provide quantitative support that the observed efficiency gains arise from meaningful regime separation rather than spurious fitting. revision: yes
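
The paired test promised in response 1 could be as simple as the sketch below, using scipy's Wilcoxon signed-rank test on per-dataset mean interval widths; the width arrays are simulated placeholders, not reported numbers.

```python
# A hedged sketch of the paired significance test on per-dataset widths.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(4)
# Hypothetical per-dataset mean interval widths (smaller is better).
width_rarecp = rng.uniform(1.0, 2.0, size=24)
width_baseline = width_rarecp + rng.normal(0.1, 0.05, size=24)

# One-sided paired test: are RareCP widths systematically smaller?
stat, p = wilcoxon(width_rarecp, width_baseline, alternative="less")
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.2e}")
```

The diagnostics promised in response 2 are also straightforward to sketch: mean entropy of the expert assignment weights (low entropy indicates specialization) and pairwise Wasserstein distances between per-expert residual distributions. The Dirichlet assignments and regime-shifted residuals below are simulated assumptions.

```python
# Regime-separation diagnostics on simulated assignments and residuals.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(5)
assign = rng.dirichlet(alpha=[0.3, 0.3, 0.3], size=1000)  # per-example expert weights

def mean_assignment_entropy(p, eps=1e-12):
    """Low entropy means experts specialize; high entropy means they blur."""
    return -(p * np.log(p + eps)).sum(axis=1).mean()

labels = assign.argmax(axis=1)
residuals = rng.normal(loc=labels.astype(float), scale=0.5)  # regime-shifted residuals

print("mean assignment entropy:", mean_assignment_entropy(assign))
for i in range(3):
    for j in range(i + 1, 3):
        d = wasserstein_distance(residuals[labels == i], residuals[labels == j])
        print(f"W1(expert {i}, expert {j}) = {d:.2f}")
```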

Circularity Check

0 steps flagged

Low circularity: empirical benchmark gains rest on external evaluation rather than self-referential derivations

Full rationale

The paper's central claim is an empirical improvement in interval efficiency on the external GIFT-Eval benchmark while preserving coverage. The method combines a mixture of cosine-attention experts, a hypernetwork for drift adaptation, top-k retrieval, and a smooth interval score objective with teacher anchoring. These are design and training choices whose outputs are evaluated against independent baselines and foundation model estimates; no equation or component reduces the reported performance metric to a fitted input by construction. Ablations are mentioned but serve as supporting evidence rather than a closed loop. Minor self-citation risk exists in related conformal literature but is not load-bearing for the benchmark result, yielding only a low score of 2.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The method rests on the assumption that error regimes are separable and that weighted residuals from retrieved examples preserve conformal validity; specific free parameters such as number of experts and retrieval k are implied but not quantified in the abstract.

free parameters (2)
  • number of experts
    Mixture of experts design requires choosing how many distinct regimes to model; this choice affects local representations and is likely tuned.
  • retrieval k
    Top-k selection for calibration examples is a hyperparameter that controls sparsity and relevance of the weighted quantile.
axioms (1)
  • domain assumption: Weighted conformal prediction yields valid coverage when weights are derived from similarity to the test context
    The method extends standard conformal guarantees to the weighted, regime-retrieved case without proving the extension from first principles.
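
For reference, a standard form of the similarity-weighted quantile this assumption concerns (an assumed generic construction, not the paper's exact definition):

```latex
% A generic similarity-weighted conformal quantile (assumed form):
% r_i are signed residuals of the retrieved calibration examples, w_i their
% similarity weights; beta is alpha/2 or 1 - alpha/2 for the two interval ends.
\[
  \hat{q}_{\beta} \;=\; \inf\Bigl\{\, q \in \mathbb{R} \;:\;
      \sum_{i=1}^{k} \tilde{w}_i \,\mathbf{1}\{\, r_i \le q \,\} \;\ge\; \beta \Bigr\},
  \qquad
  \tilde{w}_i \;=\; \frac{w_i}{\sum_{j=1}^{k} w_j}.
\]
```

Classical weighted conformal results guarantee coverage when the weights are known likelihood ratios under covariate shift; similarity-derived weights fall outside that theory, which is why this entry sits in the ledger as an axiom rather than a theorem.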

pith-pipeline@v0.9.0 · 5519 in / 1367 out tokens · 54758 ms · 2026-05-12T01:24:46.897352+00:00 · methodology

