pith. machine review for the scientific record.

arxiv: 2605.08857 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: no theorem link

RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction

Manuel Heurich, Maximilian Granz, Tim Landgraf

Pith reviewed 2026-05-12 01:24 UTC · model grok-4.3

classification 💻 cs.LG
keywords conformal prediction · time series forecasting · regime detection · retrieval methods · adaptive calibration · prediction intervals · uncertainty quantification · mixture of experts

The pith

RareCP retrieves top-k past residuals weighted by regime-specific attention experts to form tighter conformal prediction intervals for drifting time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to make conformal prediction intervals narrower for time series forecasts that face changing error patterns and drift, without losing their coverage guarantees. Existing approaches either update miscoverage rates over time or learn unconstrained calibration weights, but they do not explicitly separate smoothly drifting error distributions from distinct co-existing error regimes. RareCP trains a mixture of cosine-attention experts to identify those regimes locally, uses a hypernetwork to adapt kernel parameters for drift, and then retrieves the most similar past calibration cases to compute a weighted quantile of signed residuals. The result is asymmetric intervals that adapt to the current context. A sympathetic reader would care because narrower reliable intervals directly improve decision-making in forecasting tasks where over-wide bands waste resources or hide risk.
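
As a concrete illustration of the retrieval step, the numpy sketch below selects the top-k most similar calibration contexts by cosine similarity and converts the similarities into normalized weights. The embedding dimension, the exponential weighting, and all names here are illustrative assumptions, not the paper's implementation.

```python
# A minimal retrieval sketch, assuming calibration contexts are already
# embedded as vectors; shapes and weighting are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
calib_contexts = rng.normal(size=(500, 16))  # hypothetical calibration embeddings
test_context = rng.normal(size=16)           # embedding of the new forecasting context

def cosine_topk(query, keys, k=50):
    """Indices and normalized similarity weights of the k nearest keys."""
    q = query / np.linalg.norm(query)
    K = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    sims = K @ q                             # cosine similarities in [-1, 1]
    idx = np.argsort(sims)[-k:]              # top-k most similar calibration points
    w = np.exp(sims[idx])                    # positive, similarity-increasing weights
    return idx, w / w.sum()

idx, weights = cosine_topk(test_context, calib_contexts)
```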

Core claim

RareCP learns local calibration representations through a mixture of cosine-attention experts that each capture distinct error regimes, while a compact hypernetwork adapts the kernel parameters to track temporal drift. Given a new forecasting context, it retrieves the top-k most relevant calibration examples, assigns similarity weights, and forms a weighted conformal quantile over their signed residuals, yielding asymmetric prediction intervals. The adaptive kernel is trained using a smooth interval score objective with a parameter-space anchor to a lightweight teacher kernel.
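
A minimal sketch of the weighted quantile machinery, assuming the asymmetric interval comes from adding the alpha/2 and 1 - alpha/2 weighted quantiles of the retrieved signed residuals to the point forecast; the paper's exact quantile convention and any finite-sample correction may differ.

```python
# A hedged sketch of a similarity-weighted conformal interval over signed
# residuals; the residuals and weights below are simulated placeholders.
import numpy as np

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches level q."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w) / w.sum()
    i = min(np.searchsorted(cum, q), len(v) - 1)
    return v[i]

def conformal_interval(point_pred, residuals, weights, alpha=0.2):
    # Asymmetric band: add the alpha/2 and 1 - alpha/2 weighted residual
    # quantiles to the point forecast.
    lo = weighted_quantile(residuals, weights, alpha / 2)
    hi = weighted_quantile(residuals, weights, 1 - alpha / 2)
    return point_pred + lo, point_pred + hi

rng = np.random.default_rng(1)
residuals = rng.normal(0.5, 1.0, size=50)  # signed residuals of retrieved examples
weights = np.full(50, 1 / 50)              # uniform weights as a degenerate case
print(conformal_interval(10.0, residuals, weights, alpha=0.2))
```

With uniform weights over the full calibration set this collapses to a version of ordinary split conformal prediction; the similarity weights are what localize the interval to the current context.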

What carries the argument

Mixture of cosine-attention experts that separate error regimes, paired with a hypernetwork for drift adaptation and top-k retrieval to weight residuals for the conformal quantile.
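
To make the expert mixture concrete, the sketch below gives each expert its own linear projection, scores query-key pairs with a cosine kernel in the projected space, and mixes the expert scores with a softmax gate on the query. The shapes, the gating form, and the omission of the drift hypernetwork are all simplifying assumptions.

```python
# An illustrative mixture of cosine-attention experts; every parameter here
# is randomly initialized for demonstration, not trained or from the paper.
import numpy as np

rng = np.random.default_rng(2)
n_experts, d = 4, 16
W = rng.normal(size=(n_experts, d, d))   # per-expert projections (hypothetical)
gate_w = rng.normal(size=(n_experts, d)) # gating parameters (hypothetical)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def expert_similarity(query, key):
    """Gate-weighted cosine similarity across regime experts."""
    logits = gate_w @ query
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                   # softmax gate over experts
    sims = np.array([cos(We @ query, We @ key) for We in W])
    return gate @ sims                   # mixture similarity in [-1, 1]

q, k = rng.normal(size=d), rng.normal(size=d)
print(expert_similarity(q, k))
```

In this picture, the hypernetwork would generate or modulate the projection matrices W from recent history to track drift, rather than keeping them fixed.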

If this is right

  • Interval efficiency improves over recent conformal baselines and foundation-model uncertainty estimates on the GIFT-Eval benchmark.
  • Empirical coverage is maintained at the nominal level.
  • Ablations show separate gains from regime-specific experts, drift-adaptive kernels, sparse retrieval, and teacher anchoring.
  • The method produces asymmetric intervals that adapt to local context rather than global or sliding-window statistics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same retrieval-plus-experts pattern could be tested on other online learning settings where error regimes shift abruptly, such as anomaly detection streams.
  • If regime separation proves stable, the approach might reduce the frequency of full recalibration needed in production forecasting pipelines.
  • One could check whether the learned expert weights themselves serve as interpretable indicators of which error regime is active at any moment.
  • Applying the method to multivariate series or to settings with known external regime triggers would test how far the cosine-attention separation generalizes.

Load-bearing premise

Distinct error regimes exist in the data and the mixture of cosine-attention experts can reliably separate them so that retrieved residuals remain relevant for the weighted quantile even when drift occurs.

What would settle it

On a time series dataset with documented regime shifts, remove the expert mixture and measure whether the interval-width improvement over baselines falls below the reported margin while empirical coverage stays at the target level.
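
The check reduces to two numbers per configuration: empirical coverage and mean interval width, computed with and without the expert mixture. A minimal sketch on placeholder intervals (the "full" and "ablated" bands here are simulated stand-ins, not the paper's outputs):

```python
# Coverage and mean-width metrics for the proposed ablation comparison.
import numpy as np

def coverage_and_width(y_true, lower, upper):
    """Empirical coverage rate and mean interval width."""
    covered = (y_true >= lower) & (y_true <= upper)
    return covered.mean(), (upper - lower).mean()

rng = np.random.default_rng(3)
y = rng.normal(size=300)
forecast = y + rng.normal(0.0, 0.5, size=300)      # imperfect point forecasts
lo_full, hi_full = forecast - 1.0, forecast + 1.0  # stand-in for full RareCP
lo_abl, hi_abl = forecast - 1.4, forecast + 1.4    # stand-in for no-experts ablation

cov_f, wid_f = coverage_and_width(y, lo_full, hi_full)
cov_a, wid_a = coverage_and_width(y, lo_abl, hi_abl)
print(f"full:    coverage={cov_f:.2f}  width={wid_f:.2f}")
print(f"ablated: coverage={cov_a:.2f}  width={wid_a:.2f}")
```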

Figures

Figures reproduced from arXiv: 2605.08857 by Manuel Heurich, Maximilian Granz, Tim Landgraf.

Figure 1: RareCP adapts its 80%-interval prediction locally to the residual regime on m4_daily.
Figure 2: RareCP overview. A forecasting backbone provides a point prediction and history context.
Figure 3: Stepwise RareCP ablation on Bench10. Bars show additional percentage improvement in …
Figure 4: Visualization of 80% intervals using the gating mixture on the Bench10 datasets.
Figure 5: Visualization of SplitCP Uniform predictions on the first 300 test samples per dataset; …
Figure 6: Visualization of SplitCP Uniform predictions on the first 300 test samples per dataset; …
Figure 7: Visualization of NexCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 8: Visualization of NexCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 9: Visualization of ACI predictions on the first 300 test samples per dataset; Bench10 on …
Figure 10: Visualization of ACI predictions on the first 300 test samples per dataset; Bench10 on …
Figure 11: Visualization of dtACI predictions on the first 300 test samples per dataset; Bench10 on …
Figure 12: Visualization of dtACI predictions on the first 300 test samples per dataset; Bench10 on …
Figure 13: Visualization of ResCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 14: Visualization of ResCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 15: Visualization of KOWCPI predictions on the first 300 test samples per dataset; Bench10 …
Figure 16: Visualization of KOWCPI predictions on the first 300 test samples per dataset; Bench10 …
Figure 17: Visualization of HopCPT predictions on the first 300 test samples per dataset; Bench10 on …
Figure 18: Visualization of HopCPT predictions on the first 300 test samples per dataset; Bench10 on …
Figure 19: Visualization of RareCP predictions on the first 300 test samples per dataset; Bench10 on …
Figure 20: Visualization of RareCP predictions on the first 300 test samples per dataset; Bench10 on …
Original abstract

Recent advances in uncertainty quantification for time series forecasting show that conformal prediction can provide reliable prediction intervals, yet standard conformal methods are often inefficient under temporal dependence, drift, and heterogeneous error behavior. Existing methods typically either update miscoverage rates over time or learn unconstrained calibration weights, without explicitly separating two central sources of nonstationarity: smoothly drifting error distributions and co-existing distinct error regimes. We introduce RareCP, a regime-aware retrieval method for adaptive conformal time series prediction. RareCP learns local calibration representations through a mixture of cosine-attention experts that each capture distinct error regimes, while a compact hypernetwork adapts the kernel parameters to track temporal drift. Given a new forecasting context, RareCP retrieves the top-k most relevant calibration examples, assigns similarity weights, and forms a weighted conformal quantile over their signed residuals, yielding asymmetric prediction intervals. The adaptive kernel is trained using a smooth interval score objective, with a parameter-space anchor to a lightweight teacher kernel to preserve stable local representations. On the GIFT-Eval benchmark, RareCP improves interval efficiency over recent conformal baselines and foundation model uncertainty estimates while maintaining empirical coverage. Ablations confirm that regime-specific experts, drift-adaptive kernels, sparse retrieval, and teacher anchoring each contribute to the final performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces RareCP, a regime-aware retrieval method for adaptive conformal prediction in time series forecasting. It learns local calibration representations via a mixture of cosine-attention experts to capture distinct error regimes, employs a compact hypernetwork to adapt kernel parameters for temporal drift, and retrieves the top-k most similar past calibration examples to compute similarity-weighted conformal quantiles over signed residuals, producing asymmetric prediction intervals. The adaptive kernel is trained with a smooth interval score objective and a parameter-space anchor to a teacher kernel. On the GIFT-Eval benchmark, RareCP is claimed to improve interval efficiency over recent conformal baselines and foundation model uncertainty estimates while maintaining empirical coverage, with ablations indicating that each component (regime experts, drift adaptation, sparse retrieval, teacher anchoring) contributes to performance.

Significance. If the empirical gains are robust and the regime separation proves meaningful rather than incidental, RareCP would offer a concrete advance in handling both smooth drift and heterogeneous error regimes within conformal prediction for dependent data, potentially yielding more efficient intervals than purely adaptive miscoverage or unconstrained weighting approaches without sacrificing validity guarantees.

major comments (2)
  1. [§4 (Experiments)] The central efficiency claim on GIFT-Eval rests on benchmark improvements, yet the manuscript provides no error bars, exact train/calibration/test splits, or statistical significance tests comparing RareCP to baselines. This gap prevents verification that the reported interval-length reductions are reliable rather than artifacts of benchmark variability.
  2. [Ablation studies (Experiments section)] The paper states that ablations confirm the contribution of regime-specific experts, but supplies no quantitative diagnostics of regime separation quality such as expert assignment entropy, inter-regime residual divergence, or cluster stability metrics. Without such measures, it remains unclear whether the cosine-attention mixture isolates stable error regimes or merely fits spurious correlations, which directly bears on whether the weighted-quantile efficiency gains follow from the regime-aware design.
minor comments (2)
  1. [Method section] The description of the hypernetwork and teacher-kernel anchor in the method section would benefit from an explicit equation or pseudocode showing how the parameter-space regularization is applied during training (a generic sketch of such an anchor follows these comments).
  2. [Method section] Notation for the similarity weights and weighted quantile could be clarified with a single consolidated equation rather than scattered references across paragraphs.
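
Reading between the referee's lines, a generic parameter-space anchor might look like the sketch below: the standard (non-smoothed) interval score plus an L2 penalty pulling the hypernetwork-generated kernel parameters toward a frozen teacher's. The paper's smooth objective presumably replaces the hard hinge penalties with differentiable surrogates; lambda_anchor and all shapes are hypothetical.

```python
# A hedged sketch of an interval-score loss with a parameter-space teacher
# anchor; nothing here is the paper's exact objective.
import numpy as np

def interval_score(y, lo, hi, alpha=0.2):
    """Mean Winkler/interval score: width plus scaled miscoverage penalties."""
    width = hi - lo
    below = np.maximum(lo - y, 0.0)  # penalty when y falls under the band
    above = np.maximum(y - hi, 0.0)  # penalty when y falls over the band
    return np.mean(width + (2.0 / alpha) * (below + above))

def anchored_loss(y, lo, hi, theta_student, theta_teacher, lambda_anchor=0.1):
    # L2 pull of the generated kernel parameters toward the teacher kernel.
    anchor = np.sum((theta_student - theta_teacher) ** 2)
    return interval_score(y, lo, hi) + lambda_anchor * anchor

rng = np.random.default_rng(6)
y = rng.normal(size=100)
print(anchored_loss(y, y - 1.1, y + 0.9, np.ones(8), np.zeros(8)))
```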

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. The comments highlight important aspects of empirical rigor and interpretability that we will address to strengthen the manuscript. We respond to each major comment below and outline the corresponding revisions.

Point-by-point responses
  1. Referee: [§4 (Experiments)] The central efficiency claim on GIFT-Eval rests on benchmark improvements, yet the manuscript provides no error bars, exact train/calibration/test splits, or statistical significance tests comparing RareCP to baselines. This gap prevents verification that the reported interval-length reductions are reliable rather than artifacts of benchmark variability.

    Authors: We agree that the lack of error bars, explicit splits, and significance testing reduces the ability to verify the reliability of the reported efficiency gains. In the revised version, we will explicitly document the train/calibration/test splits for every dataset in the GIFT-Eval benchmark. We will also recompute and report interval efficiency as mean ± standard deviation over multiple random seeds for calibration-set construction. Finally, we will add paired statistical tests (Wilcoxon signed-rank) with p-values to compare RareCP against each baseline; a minimal sketch of such a test appears after this list. These updates will appear in Section 4 and the associated tables. revision: yes

  2. Referee: [Ablation studies (Experiments section)] The paper states that ablations confirm the contribution of regime-specific experts, but supplies no quantitative diagnostics of regime separation quality such as expert assignment entropy, inter-regime residual divergence, or cluster stability metrics. Without such measures, it remains unclear whether the cosine-attention mixture isolates stable error regimes or merely fits spurious correlations, which directly bears on whether the weighted-quantile efficiency gains follow from the regime-aware design.

    Authors: We acknowledge that the current ablation results, while showing performance degradation when experts are removed, do not include direct diagnostics of regime quality. In the revision we will augment the ablation subsection with the requested metrics: average entropy of expert assignment weights across calibration examples (to quantify specialization), Wasserstein distance between signed-residual distributions of different experts (to measure inter-regime divergence), and a simple stability check by re-running assignments on held-out calibration windows; a sketch of these diagnostics appears after this list. These additions will provide quantitative support that the observed efficiency gains arise from meaningful regime separation rather than spurious fitting. revision: yes
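
The paired test promised in response 1 could be as simple as the sketch below, using scipy's Wilcoxon signed-rank test on per-dataset mean interval widths; the width arrays are simulated placeholders, not reported numbers.

```python
# A hedged sketch of the paired significance test on per-dataset widths.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(4)
# Hypothetical per-dataset mean interval widths (smaller is better).
width_rarecp = rng.uniform(1.0, 2.0, size=24)
width_baseline = width_rarecp + rng.normal(0.1, 0.05, size=24)

# One-sided paired test: are RareCP widths systematically smaller?
stat, p = wilcoxon(width_rarecp, width_baseline, alternative="less")
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.2e}")
```

The diagnostics promised in response 2 are also straightforward to sketch: mean entropy of the expert assignment weights (low entropy indicates specialization) and pairwise Wasserstein distances between per-expert residual distributions. The Dirichlet assignments and regime-shifted residuals below are simulated assumptions.

```python
# Regime-separation diagnostics on simulated assignments and residuals.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(5)
assign = rng.dirichlet(alpha=[0.3, 0.3, 0.3], size=1000)  # per-example expert weights

def mean_assignment_entropy(p, eps=1e-12):
    """Low entropy means experts specialize; high entropy means they blur."""
    return -(p * np.log(p + eps)).sum(axis=1).mean()

labels = assign.argmax(axis=1)
residuals = rng.normal(loc=labels.astype(float), scale=0.5)  # regime-shifted residuals

print("mean assignment entropy:", mean_assignment_entropy(assign))
for i in range(3):
    for j in range(i + 1, 3):
        d = wasserstein_distance(residuals[labels == i], residuals[labels == j])
        print(f"W1(expert {i}, expert {j}) = {d:.2f}")
```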

Circularity Check

0 steps flagged

Low circularity: empirical benchmark gains rest on external evaluation rather than self-referential derivations

Full rationale

The paper's central claim is an empirical improvement in interval efficiency on the external GIFT-Eval benchmark while preserving coverage. The method combines a mixture of cosine-attention experts, a hypernetwork for drift adaptation, top-k retrieval, and a smooth interval score objective with teacher anchoring. These are design and training choices whose outputs are evaluated against independent baselines and foundation model estimates; no equation or component reduces the reported performance metric to a fitted input by construction. Ablations are mentioned but serve as supporting evidence rather than a closed loop. Minor self-citation risk exists in related conformal literature but is not load-bearing for the benchmark result, yielding only a low score of 2.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The method rests on the assumption that error regimes are separable and that weighted residuals from retrieved examples preserve conformal validity; specific free parameters such as number of experts and retrieval k are implied but not quantified in the abstract.

free parameters (2)
  • number of experts
    Mixture of experts design requires choosing how many distinct regimes to model; this choice affects local representations and is likely tuned.
  • retrieval k
    Top-k selection for calibration examples is a hyperparameter that controls sparsity and relevance of the weighted quantile.
axioms (1)
  • domain assumption: Weighted conformal prediction yields valid coverage when weights are derived from similarity to the test context
    The method extends standard conformal guarantees to the weighted, regime-retrieved case without proving the extension from first principles.
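
For reference, a standard form of the similarity-weighted quantile this assumption concerns (an assumed generic construction, not the paper's exact definition):

```latex
% A generic similarity-weighted conformal quantile (assumed form):
% r_i are signed residuals of the retrieved calibration examples, w_i their
% similarity weights; beta is alpha/2 or 1 - alpha/2 for the two interval ends.
\[
  \hat{q}_{\beta} \;=\; \inf\Bigl\{\, q \in \mathbb{R} \;:\;
      \sum_{i=1}^{k} \tilde{w}_i \,\mathbf{1}\{\, r_i \le q \,\} \;\ge\; \beta \Bigr\},
  \qquad
  \tilde{w}_i \;=\; \frac{w_i}{\sum_{j=1}^{k} w_j}.
\]
```

Classical weighted conformal results guarantee coverage when the weights are known likelihood ratios under covariate shift; similarity-derived weights fall outside that theory, which is why this entry sits in the ledger as an axiom rather than a theorem.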

pith-pipeline@v0.9.0 · 5519 in / 1367 out tokens · 54758 ms · 2026-05-12T01:24:46.897352+00:00 · methodology

