pith. machine review for the scientific record.

arxiv: 2605.09208 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: no theorem link

TSNN: A Non-parametric and Interpretable Framework for Traffic Time Series Forecasting

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:58 UTC · model grok-4.3

classification 💻 cs.LG
keywords traffic time series forecasting · non-parametric model · interpretable forecasting · memory bank · pattern matching · time series decoupling · periodicity

The pith

A non-parametric memory bank matching process forecasts traffic time series competitively with deep learning models while remaining fully interpretable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TSNN, a framework that builds a memory bank from training traffic data and decouples new series through successive layer-wise matches to stored patterns. Without any trainable parameters, each forecast traces directly back to the specific training entries that matched at each step. The design exploits the regular daily and weekly cycles in traffic flows to maintain accuracy. It matters because it supplies a transparent alternative to neural networks whenever understanding why a prediction was made is as important as the numerical result.

Core claim

TSNN decouples an input time series by matching segments against a memory bank constructed from the training set with the identical matching procedure. The resulting additive components yield forecasts without any learned weights; experiments show competitive accuracy against standard deep learning baselines across four real-world traffic flow datasets while permitting direct visualization of each component's contribution.

What carries the argument

The memory bank of training patterns paired with a multi-layer matching process that successively decouples the series into interpretable additive parts for forecasting.
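The paper describes this machinery only at a high level. As a sketch of the general idea, not the authors' exact algorithm, a one-layer analogue (mean decoupling followed by a single pattern match, with hypothetical window length and top-k choices) could look like:

```python
import numpy as np

def build_memory_bank(train, in_len, horizon):
    """Each bank entry pairs an input window with its observed continuation."""
    L = in_len + horizon
    return np.array([train[i:i + L] for i in range(len(train) - L + 1)])

def forecast(query, bank, in_len, k=3):
    """Mean-decoupled nearest-neighbour forecast.

    Removes the query mean, matches the residual shape against the bank's
    input windows, and predicts the mean continuation of the k closest
    entries. The returned indices identify exactly which training windows
    drove the forecast."""
    q = np.asarray(query, dtype=float)
    mu = q.mean()
    bank_in = bank[:, :in_len]
    bank_centered = bank_in - bank_in.mean(axis=1, keepdims=True)
    dists = np.linalg.norm(bank_centered - (q - mu), axis=1)
    top = np.argsort(dists)[:k]
    cont = bank[top, in_len:] - bank_in[top].mean(axis=1, keepdims=True)
    return mu + cont.mean(axis=0), top
```

On a perfectly periodic series the nearest entries are exact repeats of the query, so the forecast reproduces the next cycle; the `top` indices are the interpretability hook, mapping each prediction back to concrete training time steps.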

If this is right

  • Each forecast can be inspected by listing the exact training time steps that contributed to it through the matching layers.
  • No gradient-based training is required; only the memory bank needs to be assembled from the training records.
  • The same matching visualization used in the paper can be applied to any new input to reveal which past patterns drove the output.
  • Accuracy improves when the data exhibit strong periodicity because similar daily or weekly segments are more likely to exist in the memory bank.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same memory-bank decoupling could be tested on other repetitive time series such as hourly electricity demand or retail sales.
  • Because no parameters are tuned, the method may degrade more gracefully than neural nets when test distributions shift modestly from training.
  • An online version that adds new observations to the memory bank after each forecast could support rolling predictions over longer horizons.

Load-bearing premise

The training data contain enough representative patterns that a memory bank built from them can supply accurate matches for unseen test sequences.

What would settle it

Apply the model to a traffic dataset recorded after a major schedule change or disruption not present in the training period and check whether its error exceeds that of a simple historical-average baseline.
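That check needs a concrete historical-average baseline; a standard seasonal-mean version (the period, start index, and horizon here are illustrative, not taken from the paper) is simple to state:

```python
import numpy as np

def historical_average(train, period, t0, horizon):
    """Predict each future step as the mean of all training values observed
    at the same phase of the cycle (e.g. the same time-of-day slot)."""
    train = np.asarray(train, dtype=float)
    return np.array([train[(t0 + h) % period::period].mean()
                     for h in range(horizon)])

def mae(pred, truth):
    """Mean absolute error, the comparison metric for the proposed check."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(truth))))
```

If TSNN's post-disruption MAE exceeds this baseline's, the load-bearing premise about the memory bank's coverage has failed in practice.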

Figures

Figures reproduced from arXiv: 2605.09208 by Benjamin Ng, Bowen Liu, Chan-Tong Lam, Haijian Lai, Junhao Dong, Sio-Kei Im, Wei Ke.

Figure 1. The architecture of TSNN. The framework operates in two stages: memory bank construction (top) and prediction (bottom). In the construction stage, …
Figure 2. Example of traffic flow data and its historical average from sensor …
Figure 3. Ablation study of the time-wise decoupling and mean decoupling …
Figure 4. Hyperparameter studies of TSNN on PEMS04, where the lines with …
Figure 6. Visualization of aggregated predictions after each layer on sensor …
Figure 7. Visualization of input time series and output residual signals of the first layer by t-SNE. The visualization of the input shows a distinct distribution …
Figure 8. Visualizations of the contributions to time step 1339 (with the periodic time step of 72) of sensor 100 in the test set of PEMS08.
Figure 9. Visualizations of the day-to-day contributions and the day-of-week contributions to a Thursday, a Saturday, and a Sunday of sensor 100 in the test …
read the original abstract

Although many complex models were proposed to analyze time series data, some studies have demonstrated remarkable performance with simpler structures. A recent study proposed a non-parametric framework for 3D point cloud classification, which has the potential to be adapted for time series forecasting and enable interpretability. Inspired by the previous works, we present TSNN, a non-parametric and interpretable framework for traffic time series forecasting. TSNN consists of multiple layers that decouple the time series by matching the entries in a memory bank, where the memory bank is constructed using a similar matching process within the training set. It leverages the periodicity in traffic data to enhance forecasting accuracy while maintaining a simple model architecture. The proposed model operates without trainable parameters, preserving its inherent interpretability. In the experiments, TSNN achieves competitive performance compared to the typical deep learning models in four real-world traffic flow datasets. We also visualize the decoupling process to show the effectiveness of the components. Finally, we demonstrate the interpretability of the model and illustrate the contribution of each time step within the memory bank.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes TSNN, a non-parametric and interpretable framework for traffic time series forecasting. TSNN consists of multiple layers that decouple input time series by matching entries against a memory bank constructed from the training set using an analogous matching process. The approach leverages periodicity in traffic data, requires no trainable parameters, and is reported to achieve competitive performance against typical deep learning models on four real-world traffic flow datasets, accompanied by visualizations of the decoupling process and illustrations of per-time-step contributions for interpretability.

Significance. If the empirical claims are substantiated, TSNN offers a simple, fully non-parametric alternative to deep neural networks for traffic forecasting that prioritizes interpretability via explicit historical pattern matching. This could be valuable in operational settings where transparency and reproducibility matter more than marginal accuracy gains, and the absence of fitted parameters eliminates overfitting concerns common in parametric models.

major comments (2)
  1. [Experiments] Experiments section: the headline claim that TSNN achieves competitive performance on four datasets supplies no quantitative metrics, baseline specifications, error bars, statistical tests, or description of the matching distance function, rendering the central empirical result impossible to evaluate.
  2. [Methodology] Memory bank construction and matching process (described in the methodology): no evidence is presented that nearest-neighbor matches from the training-only memory bank remain close or representative on test sequences. Missing are coverage statistics, nearest-neighbor distance distributions, or ablations on temporal splits that would address distribution shifts (incidents, holidays, sensor drift) in traffic data; this assumption is load-bearing for the non-parametric generalization claim.
minor comments (2)
  1. [Abstract] Abstract: the reference to the prior non-parametric 3D point-cloud work is not accompanied by a citation.
  2. [Methodology] The number of layers and the precise definition of the matching criterion are stated only at a high level; a concrete algorithmic description or pseudocode would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help strengthen the empirical validation and robustness analysis of TSNN. We address each major comment below and have revised the manuscript to provide the requested details and additional experiments.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: the headline claim that TSNN achieves competitive performance on four datasets supplies no quantitative metrics, baseline specifications, error bars, statistical tests, or description of the matching distance function, rendering the central empirical result impossible to evaluate.

    Authors: We agree that the original presentation of results would benefit from greater quantitative detail to allow full evaluation. In the revised manuscript we have added a results table reporting MAE and RMSE for TSNN and the deep-learning baselines on all four datasets, with means and standard deviations computed over multiple random seeds. We also describe the matching distance function (Euclidean distance on z-score-normalized segments) and include paired t-test p-values to assess statistical significance of differences. These changes make the central empirical claims transparent and reproducible. revision: yes

  2. Referee: [Methodology] Memory bank construction and matching process (described in the methodology): no evidence is presented that nearest-neighbor matches from the training-only memory bank remain close or representative on test sequences. Missing are coverage statistics, nearest-neighbor distance distributions, or ablations on temporal splits that would address distribution shifts (incidents, holidays, sensor drift) in traffic data; this assumption is load-bearing for the non-parametric generalization claim.

    Authors: The referee correctly highlights a load-bearing assumption for any non-parametric method. In the revised manuscript we now report coverage statistics (fraction of test queries with matches below training-derived distance thresholds), side-by-side histograms of nearest-neighbor distances on training versus test data, and an ablation that removes holiday periods from the memory bank to quantify performance under temporal distribution shift. These additions support that periodicity allows the training memory bank to remain representative for typical traffic patterns, while we discuss rare events as a limitation. revision: yes
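The distance function named in the first response comes from the simulated rebuttal, not the source abstract. Taking that description at face value, Euclidean distance on z-score-normalized segments would be:

```python
import numpy as np

def znorm(x):
    """Z-score normalization; constant segments map to zero rather than NaN."""
    x = np.asarray(x, dtype=float)
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def match_distance(a, b):
    """Euclidean distance between z-normalized segments: invariant to the
    level and scale of a segment, sensitive only to its shape."""
    return float(np.linalg.norm(znorm(a) - znorm(b)))
```

Under this metric a segment and any shifted, rescaled copy of it are zero distance apart, which is what lets periodic patterns at different traffic volumes match.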
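The coverage statistic promised in the second response can likewise be pinned down; one plausible definition (the quantile threshold is an assumption, not a detail from the paper) is:

```python
import numpy as np

def nn_coverage(train_nn_dists, test_nn_dists, q=0.95):
    """Fraction of test queries whose nearest-neighbour distance to the
    memory bank falls below the q-quantile of training nearest-neighbour
    distances. Values near 1 suggest the bank remains representative
    at test time; a sharp drop signals distribution shift."""
    threshold = np.quantile(np.asarray(train_nn_dists, dtype=float), q)
    return float(np.mean(np.asarray(test_nn_dists, dtype=float) <= threshold))
```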

Circularity Check

0 steps flagged

No circularity: non-parametric lookup method with independent empirical validation

full rationale

The paper describes a non-parametric forecasting procedure that builds a fixed memory bank exclusively from training data via intra-set matching and then performs nearest-neighbor lookup on test queries. No trainable parameters exist, no equations define a quantity in terms of itself, and no self-citation chain is invoked to justify uniqueness or an ansatz. Performance claims rest on direct experimental comparison against deep-learning baselines on four datasets rather than on any derived identity. The generalization assumption (that training patterns suffice for test queries) is an empirical premise, not a definitional tautology; it can be falsified by distance or coverage statistics without altering the method's internal logic. Consequently the derivation chain contains no self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that traffic time series contain repeatable periodic structure that can be captured by nearest-neighbor matching in a training-derived memory bank; no free parameters are introduced and no new physical entities are postulated.

axioms (1)
  • domain assumption Traffic data exhibits sufficient periodicity and similarity across days or weeks that matching against a training memory bank yields useful forecasts.
    Stated explicitly in the abstract as the mechanism that enhances forecasting accuracy.
invented entities (1)
  • Memory bank no independent evidence
    purpose: Store and retrieve matched time-series segments from the training set for non-parametric forecasting.
    Constructed inside the proposed method; no independent evidence outside the paper is supplied.

pith-pipeline@v0.9.0 · 5499 in / 1206 out tokens · 44549 ms · 2026-05-12T01:58:23.847268+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 3 internal anchors

  1. [1]

    Time-series forecasting with deep learning: a survey,

    B. Lim and S. Zohren, “Time-series forecasting with deep learning: a survey,” Philosophical Transactions of the Royal Society A, vol. 379, no. 2194, p. 20200209, 2021

  2. [2]

    Road traffic forecasting: Recent advances and new challenges,

    I. Lana, J. Del Ser, M. Velez, and E. I. Vlahogianni, “Road traffic forecasting: Recent advances and new challenges,” IEEE Intelligent Transportation Systems Magazine, vol. 10, no. 2, pp. 93–109, 2018

  3. [3]

    Informer: Beyond efficient transformer for long sequence time-series forecasting,

    H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11106–11115, 2021

  4. [4]

    Trend analysis of climate time series: A review of methods,

    M. Mudelsee, “Trend analysis of climate time series: A review of methods,” Earth-Science Reviews, vol. 190, pp. 310–322, 2019

  5. [5]

    Largest: A benchmark dataset for large-scale traffic forecasting,

    X. Liu, Y. Xia, Y. Liang, J. Hu, Y. Wang, L. Bai, C. Huang, Z. Liu, B. Hooi, and R. Zimmermann, “Largest: A benchmark dataset for large-scale traffic forecasting,” Advances in Neural Information Processing Systems, vol. 36, 2024

  6. [6]

    Urban traffic flow prediction techniques: A review,

    B. Medina-Salgado, E. Sánchez-DelaCruz, P. Pozos-Parra, and J. E. Sierra, “Urban traffic flow prediction techniques: A review,” Sustainable Computing: Informatics and Systems, vol. 35, p. 100739, 2022

  7. [7]

    A comprehensive survey of deep learning for time series forecasting: architectural diversity and open challenges,

    J. Kim, H. Kim, H. Kim, D. Lee, and S. Yoon, “A comprehensive survey of deep learning for time series forecasting: architectural diversity and open challenges,” Artificial Intelligence Review, vol. 58, no. 7, pp. 1–95, 2025

  8. [8]

    Short term traffic forecasting using time series methods,

    C. Moorthy and B. Ratcliffe, “Short term traffic forecasting using time series methods,” Transportation Planning and Technology, vol. 12, no. 1, pp. 45–56, 1988

  9. [9]

    Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting,

    S. Lee and D. B. Fambro, “Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting,” Transportation Research Record, vol. 1678, no. 1, pp. 179–188, 1999

  10. [10]

    Adaptive kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification,

    J. Guo, W. Huang, and B. M. Williams, “Adaptive kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification,” Transportation Research Part C: Emerging Technologies, vol. 43, pp. 50–64, 2014

  11. [11]

    Travel-time prediction with support vector regression,

    C.-H. Wu, J.-M. Ho, and D.-T. Lee, “Travel-time prediction with support vector regression,” IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 4, pp. 276–281, 2004

  12. [12]

    Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,

    Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,” in International Conference on Learning Representations, 2018

  13. [13]

    Graph wavenet for deep spatial-temporal graph modeling,

    Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang, “Graph wavenet for deep spatial-temporal graph modeling,” arXiv preprint arXiv:1906.00121, 2019

  14. [14]

    Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,

    B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” arXiv preprint arXiv:1709.04875, 2017

  15. [15]

    Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting,

    L. Cai, K. Janowicz, G. Mai, B. Yan, and R. Zhu, “Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting,” Transactions in GIS, vol. 24, no. 3, pp. 736–755, 2020

  16. [16]

    Lsttn: A long-short term transformer-based spatiotemporal neural network for traffic flow forecasting,

    Q. Luo, S. He, X. Han, Y. Wang, and H. Li, “Lsttn: A long-short term transformer-based spatiotemporal neural network for traffic flow forecasting,” Knowledge-Based Systems, vol. 293, p. 111637, 2024

  17. [17]

    Bidirectional spatial-temporal adaptive transformer for urban traffic flow forecasting,

    C. Chen, Y. Liu, L. Chen, and C. Zhang, “Bidirectional spatial-temporal adaptive transformer for urban traffic flow forecasting,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 10, pp. 6913–6925, 2022

  18. [18]

    Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting,

    Z. Shao, Z. Zhang, F. Wang, W. Wei, and Y. Xu, “Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting,” in Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 4454–4458, 2022

  19. [19]

    Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction,

    H. Yao, X. Tang, H. Wei, G. Zheng, and Z. Li, “Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 5668–5675, 2019

  20. [20]

    Parameter is not all you need: Starting from non-parametric networks for 3d point cloud analysis,

    R. Zhang, L. Wang, Y. Wang, P. Gao, H. Li, and J. Shi, “Parameter is not all you need: Starting from non-parametric networks for 3d point cloud analysis,” arXiv preprint arXiv:2303.08134, 2023

  21. [21]

    Pointnet++: Deep hierarchical feature learning on point sets in a metric space,

    C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” Advances in Neural Information Processing Systems, vol. 30, 2017

  22. [22]

    Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning,

    M. Lippi, M. Bertini, and P. Frasconi, “Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 871–882, 2013

  23. [23]

    Comparison of sarimax, sarima, modified sarima and ann-based models for short-term pv generation forecasting,

    S. I. Vagropoulos, G. Chouliaras, E. G. Kardakos, C. K. Simoglou, and A. G. Bakirtzis, “Comparison of sarimax, sarima, modified sarima and ann-based models for short-term pv generation forecasting,” in 2016 IEEE International Energy Conference (ENERGYCON), pp. 1–6, IEEE, 2016

  24. [24]

    Vector autoregressions,

    J. H. Stock and M. W. Watson, “Vector autoregressions,” Journal of Economic Perspectives, vol. 15, no. 4, pp. 101–115, 2001

  25. [25]

    Time series forecasting of petroleum production using deep lstm recurrent networks,

    A. Sagheer and M. Kotb, “Time series forecasting of petroleum production using deep lstm recurrent networks,” Neurocomputing, vol. 323, pp. 203–213, 2019

  26. [26]

    Financial time series forecasting model based on ceemdan and lstm,

    J. Cao, Z. Li, and J. Li, “Financial time series forecasting model based on ceemdan and lstm,” Physica A: Statistical Mechanics and its Applications, vol. 519, pp. 127–139, 2019

  27. [27]

    Energy load forecasting using deep learning approach-lstm and gru in spark cluster,

    S. Kumar, L. Hussain, S. Banarjee, and M. Reza, “Energy load forecasting using deep learning approach-lstm and gru in spark cluster,” in 2018 Fifth International Conference on Emerging Applications of Information Technology (EAIT), pp. 1–4, IEEE, 2018

  28. [28]

    Lstm network: a deep learning approach for short-term traffic forecast,

    Z. Zhao, W. Chen, X. Wu, P. C. Chen, and J. Liu, “Lstm network: a deep learning approach for short-term traffic forecast,” IET Intelligent Transport Systems, vol. 11, no. 2, pp. 68–75, 2017

  29. [29]

    Multi-lane short-term traffic forecasting with convolutional lstm network,

    Y. Ma, Z. Zhang, and A. Ihler, “Multi-lane short-term traffic forecasting with convolutional lstm network,” IEEE Access, vol. 8, pp. 34629–34643, 2020

  30. [30]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017

  31. [31]

    Deep Time Series Models: A Comprehensive Survey and Benchmark

    Y. Wang, H. Wu, J. Dong, Y. Liu, M. Long, and J. Wang, “Deep time series models: A comprehensive survey and benchmark,” arXiv preprint arXiv:2407.13278, 2024

  32. [32]

    Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting,

    S. Liu, H. Yu, C. Liao, J. Li, W. Lin, A. X. Liu, and S. Dustdar, “Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting,” in International Conference on Learning Representations, 2022

  33. [33]

    Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,

    H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” Advances in Neural Information Processing Systems, vol. 34, pp. 22419–22430, 2021

  34. [34]

    A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

    Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,” arXiv preprint arXiv:2211.14730, 2022

  35. [35]

    Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting,

    Y. Zhang and J. Yan, “Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting,” in The Eleventh International Conference on Learning Representations, 2023

  36. [36]

    Exploring progress in multivariate time series forecasting: Comprehensive benchmarking and heterogeneity analysis,

    Z. Shao, F. Wang, Y. Xu, W. Wei, C. Yu, Z. Zhang, D. Yao, T. Sun, G. Jin, X. Cao, et al., “Exploring progress in multivariate time series forecasting: Comprehensive benchmarking and heterogeneity analysis,” IEEE Transactions on Knowledge and Data Engineering, 2024

  37. [37]

    Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,

    Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,” arXiv preprint arXiv:1707.01926, 2017

  38. [38]

    Gman: A graph multi-attention network for traffic prediction,

    C. Zheng, X. Fan, C. Wang, and J. Qi, “Gman: A graph multi-attention network for traffic prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 1234–1241, 2020

  39. [39]

    Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution,

    F. Li, J. Feng, H. Yan, G. Jin, F. Yang, F. Sun, D. Jin, and Y. Li, “Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution,” ACM Transactions on Knowledge Discovery from Data (TKDD), 2021

  40. [40]

    Decoupled dynamic spatial-temporal graph neural network for traffic forecasting,

    Z. Shao, Z. Zhang, W. Wei, F. Wang, Y. Xu, X. Cao, and C. S. Jensen, “Decoupled dynamic spatial-temporal graph neural network for traffic forecasting,” arXiv preprint arXiv:2206.09112, 2022

  41. [41]

    N-beats: Neural basis expansion analysis for interpretable time series forecasting,

    B. N. Oreshkin, D. Carpov, N. Chapados, and Y. Bengio, “N-beats: Neural basis expansion analysis for interpretable time series forecasting,” arXiv preprint arXiv:1905.10437, 2019

  42. [42]

    Are transformers effective for time series forecasting?,

    A. Zeng, M. Chen, L. Zhang, and Q. Xu, “Are transformers effective for time series forecasting?,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 11121–11128, 2023

  43. [43]

    Revisiting long-term time series forecasting: An investigation on linear mapping

    Z. Li, S. Qi, Y. Li, and Z. Xu, “Revisiting long-term time series forecasting: An investigation on linear mapping,” arXiv preprint arXiv:2305.10721, 2023

  44. [44]

    Mixture-of-linear-experts for long-term time series forecasting,

    R. Ni, Z. Lin, S. Wang, and G. Fanti, “Mixture-of-linear-experts for long-term time series forecasting,” in International Conference on Artificial Intelligence and Statistics, pp. 4672–4680, PMLR, 2024

  45. [45]

    Testam: A time-enhanced spatio-temporal attention model with mixture of experts,

    H. Lee and S. Ko, “Testam: A time-enhanced spatio-temporal attention model with mixture of experts,” arXiv preprint arXiv:2403.02600, 2024

  46. [46]

    A time series is worth five experts: Heterogeneous mixture of experts for traffic flow prediction,

    G. Wang, Y. Chen, M. Gao, Z. Wu, J. Tang, and J. Zhao, “A time series is worth five experts: Heterogeneous mixture of experts for traffic flow prediction,” arXiv preprint arXiv:2409.17440, 2024

  47. [47]

    Making sgd parameter-free,

    Y. Carmon and O. Hinder, “Making sgd parameter-free,” in Conference on Learning Theory, pp. 2360–2389, PMLR, 2022

  48. [48]

    Nonparametric variational auto-encoders for hierarchical representation learning,

    P. Goyal, Z. Hu, X. Liang, C. Wang, and E. P. Xing, “Nonparametric variational auto-encoders for hierarchical representation learning,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 5094–5102, 2017

  49. [49]

    Functional methods for time series prediction: a nonparametric approach,

    G. Aneiros-Pérez, R. Cao, and J. M. Vilar-Fernández, “Functional methods for time series prediction: a nonparametric approach,” Journal of Forecasting, vol. 30, no. 4, pp. 377–392, 2011

  50. [50]

    On estimating regression,

    E. A. Nadaraya, “On estimating regression,” Theory of Probability & Its Applications, vol. 9, no. 1, pp. 141–142, 1964

  51. [51]

    Smooth regression analysis,

    G. S. Watson, “Smooth regression analysis,” Sankhyā: The Indian Journal of Statistics, Series A, pp. 359–372, 1964

  52. [52]

    Gaussian processes for regression,

    C. Williams and C. Rasmussen, “Gaussian processes for regression,” Advances in Neural Information Processing Systems, vol. 8, 1995

  53. [53]

    Population shape regression from random design data,

    B. C. Davis, P. T. Fletcher, E. Bullitt, and S. Joshi, “Population shape regression from random design data,” International Journal of Computer Vision, vol. 90, pp. 255–266, 2010

  54. [54]

    Random forest regression for manifold-valued responses,

    D. Tsagkrasoulis and G. Montana, “Random forest regression for manifold-valued responses,” Pattern Recognition Letters, vol. 101, pp. 6–13, 2018

  55. [55]

    Deep non-parametric time series forecaster,

    S. S. Rangapuram, J. Gasthaus, L. Stella, V. Flunkert, D. Salinas, Y. Wang, and T. Januschowski, “Deep non-parametric time series forecaster,” arXiv preprint arXiv:2312.14657, 2023

  56. [56]

    Unsupervised feature learning via non-parametric instance discrimination,

    Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, “Unsupervised feature learning via non-parametric instance discrimination,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3733–3742, 2018

  57. [57]

    Momentum contrast for unsupervised visual representation learning,

    K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738, 2020

  58. [58]

    No time to train: Empowering non-parametric networks for few-shot 3d scene segmentation,

    X. Zhu, R. Zhang, B. He, Z. Guo, J. Liu, H. Xiao, C. Fu, H. Dong, and P. Gao, “No time to train: Empowering non-parametric networks for few-shot 3d scene segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3838–3847, 2024

  59. [59]

    Point-fcw: Transposed-fcw graph representation for point cloud classification using tda,

    H. Lai, B. Liu, C.-T. Lam, B. Ng, and S.-K. Im, “Point-fcw: Transposed-fcw graph representation for point cloud classification using tda,” IEEE Signal Processing Letters, 2025

  60. [60]

    Historical inertia: A neglected but powerful baseline for long sequence time-series forecasting,

    Y. Cui, J. Xie, and K. Zheng, “Historical inertia: A neglected but powerful baseline for long sequence time-series forecasting,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 2965–2969, 2021

  61. [61]

    Fast approximate correlation for massive time-series data,

    A. Mueen, S. Nath, and J. Liu, “Fast approximate correlation for massive time-series data,” in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 171–182, 2010

  62. [62]

    Combining multiple time series models through a robust weighted mechanism,

    R. Adhikari and R. Agrawal, “Combining multiple time series models through a robust weighted mechanism,” in 2012 1st International Conference on Recent Advances in Information Technology (RAIT), pp. 455–460, IEEE, 2012

  63. [63]

    A distribution-free theory of nonparametric regression,

    L. Györfi, M. Kohler, A. Krzyzak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression. Springer Science & Business Media, 2006

  64. [64]

    Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting,

    C. Song, Y. Lin, S. Guo, and H. Wan, “Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 914–921, 2020

  65. [65]

    Basicts: A fair and scalable time series forecasting benchmark and toolkit

    “Basicts: A fair and scalable time series forecasting benchmark and toolkit,” https://github.com/GestaltCogTeam/BasicTS/. Accessed: 2025-10-16

  66. [66]

    Integrating granger causality and vector auto-regression for traffic prediction of large-scale wlans,

    Z. Lu, C. Zhou, J. Wu, H. Jiang, and S. Cui, “Integrating granger causality and vector auto-regression for traffic prediction of large-scale wlans,” KSII Transactions on Internet and Information Systems (TIIS), vol. 10, no. 1, pp. 136–151, 2016

  67. [67]

    Fits: Modeling time series with 10k parameters,

    Z. Xu, A. Zeng, and Q. Xu, “Fits: Modeling time series with 10k parameters,” in The Twelfth International Conference on Learning Representations, 2024

  68. [68]

    St-norm: Spatial and temporal normalization for multi-variate time series forecasting,

    J. Deng, X. Chen, R. Jiang, X. Song, and I. W. Tsang, “St-norm: Spatial and temporal normalization for multi-variate time series forecasting,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 269–278, 2021

  69. [69]

    Visualizing data using t-sne,

    L. Van der Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of Machine Learning Research, vol. 9, no. 11, 2008

  70. [70]

    Distilling the Knowledge in a Neural Network

    G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015

  71. [71]

    A survey on mixture of experts,

    W. Cai, J. Jiang, F. Wang, J. Tang, S. Kim, and J. Huang, “A survey on mixture of experts,” arXiv preprint arXiv:2407.06204, 2024