pith. the verified trust layer for science.

arxiv: 2509.23597 · v5 · pith:VOCMZGKM · submitted 2025-09-28 · 💻 cs.LG · cs.AI

Characteristic Root Analysis and Regularization for Linear Time Series Forecasting

Pith reviewed 2026-05-18 12:45 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: time series forecasting, linear models, characteristic roots, rank reduction, regularization, Root Purge, noise robustness, latent dynamics
Add this Pith Number to your LaTeX paper:
\usepackage{pith}
\pithnumber{VOCMZGKM}

Prints a linked pith:VOCMZGKM badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files.

The pith

Linear time series models produce spurious characteristic roots under noise that demand far more data to suppress unless countered by rank reduction or Root Purge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines linear models for time series forecasting by tracking how the characteristic roots of their weight matrices shape long-term predictions. In clean data these roots fully determine the behavior, and common design choices such as instance normalization or channel independence change which roots can be learned. When noise is present the models fit many extraneous roots whose influence shrinks only with disproportionately large training sets. The authors therefore introduce rank-reduction methods and a new adaptive procedure called Root Purge that encourages the model to maintain a noise-suppressing null space. If these claims hold, simple linear forecasters can recover the true low-dimensional dynamics more reliably and with less data than current practice suggests.

Core claim

In the noise-free regime the characteristic roots of the learned weight matrix govern the long-term temporal dynamics, and architectural decisions alter the set of representable roots. In the noisy regime the same models fit spurious roots that do not correspond to the underlying process; eliminating their effect requires training data that grows faster than the dimension of the noise. Structural regularization via reduced-rank regression or the Root Purge method recovers the low-dimensional latent dynamics without discarding essential forecasting information.
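To make the role of characteristic roots concrete, here is a minimal sketch in the textbook scalar setting, which is an assumption of this example rather than the paper's weight-matrix formulation. A sequence obeying x_t = a_1 x_{t-1} + ... + a_p x_{t-p} has characteristic polynomial z^p - a_1 z^{p-1} - ... - a_p, and the moduli and angles of its roots set the growth rate and frequency of the long-run behavior. The helper name `characteristic_roots` and the test signal are illustrative and do not come from the paper.

```python
import numpy as np

def characteristic_roots(coeffs):
    """Roots of z^p - a_1 z^(p-1) - ... - a_p for x_t = sum_k a_k x_{t-k}."""
    poly = np.concatenate(([1.0], -np.asarray(coeffs, dtype=float)))
    return np.roots(poly)

omega = 0.3
# sin(omega * t) satisfies x_t = 2 cos(omega) x_{t-1} - x_{t-2}.
coeffs = [2.0 * np.cos(omega), -1.0]
roots = characteristic_roots(coeffs)

# Roll the recurrence forward from two exact initial values and compare
# against the closed-form sinusoid.
x = [np.sin(0.0), np.sin(omega)]
for t in range(2, 200):
    x.append(coeffs[0] * x[-1] + coeffs[1] * x[-2])
x = np.array(x)
max_err = np.max(np.abs(x - np.sin(omega * np.arange(200))))

print("moduli:", np.abs(roots))
print("angles:", np.angle(roots))
print("max deviation from sin(omega t):", max_err)
```

In this example both roots lie on the unit circle at angles of plus and minus omega, which corresponds to a sinusoid that neither decays nor grows. A root with modulus below one would contribute a decaying component, and one above one a growing component.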

What carries the argument

Characteristic roots of the weight matrices, which encode the temporal evolution of the linear forecaster and become contaminated by noise.

If this is right

  • Rank reduction recovers the true low-dimensional latent dynamics that linear models otherwise obscure with spurious roots.
  • Root Purge learns a noise-suppressing null space during training and improves data efficiency.
  • Instance normalization and channel independence change the roots a linear model can represent in the clean case.
  • Classical linear-system theory can be combined with modern training to produce more robust and interpretable forecasters.
  • The same root-analysis lens explains why simple linear models remain competitive on many real benchmarks.
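The rank-reduction idea in the list above can be sketched with NumPy under stated assumptions. `reduced_rank_regression` follows the classical closed form for reduced-rank regression with identity weighting (project the least-squares fit onto the leading directions of the fitted values), and `truncate_rank` truncates the SVD of the fitted weights, which is only one plausible reading of direct weight rank reduction. The function names, the synthetic data, and the noise level are hypothetical, and the paper's RRR and DWRR procedures may differ in detail.

```python
import numpy as np

def ols(X, Y):
    """Least-squares weight matrix W minimizing ||X W - Y||_F."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def truncate_rank(W, r):
    """Best rank-r approximation of W in Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def reduced_rank_regression(X, Y, r):
    """Classical RRR (identity weighting): project the OLS solution onto the
    top-r right singular directions of the fitted values X @ W_ols."""
    W_ols = ols(X, Y)
    _, _, Vt = np.linalg.svd(X @ W_ols, full_matrices=False)
    P = Vt[:r].T @ Vt[:r]          # projector onto the leading output directions
    return W_ols @ P

# Synthetic data (assumption): a rank-2 map from a 24-step window to 12 steps.
rng = np.random.default_rng(0)
n, L, H, r = 500, 24, 12, 2
X = rng.standard_normal((n, L))
W_true = rng.standard_normal((L, r)) @ rng.standard_normal((r, H))
Y = X @ W_true + 0.5 * rng.standard_normal((n, H))

W_ols = ols(X, Y)
W_trunc = truncate_rank(W_ols, r)
W_rrr = reduced_rank_regression(X, Y, r)

for name, W in [("OLS", W_ols), ("truncated", W_trunc), ("RRR", W_rrr)]:
    print(name, "rank:", np.linalg.matrix_rank(W, tol=1e-8),
          "train residual:", np.linalg.norm(X @ W - Y))
```

The two reduced-rank variants differ in what they approximate: truncation stays close to the weight matrix itself, while RRR stays close to the fitted outputs, so RRR never has a larger training residual than truncation at the same rank.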

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same root-spuriousness mechanism may explain why over-parameterized sequence models sometimes require very large datasets to stabilize long-horizon forecasts.
  • Root Purge could be adapted as a regularizer for other linear layers inside larger neural architectures.
  • Testing whether the number of retained roots after regularization correlates with forecast horizon stability would provide a practical diagnostic.
  • The framework suggests a route to parameter-free selection of model capacity by counting the stable roots needed for a given dataset.

Load-bearing premise

The temporal dynamics of any linear forecaster are completely captured by the roots of its weight matrix, so that discarding some roots or reducing matrix rank leaves the essential forecasting information intact.

What would settle it

A controlled experiment on synthetic data with known low-dimensional linear dynamics plus additive noise where the forecasting error after Root Purge or rank reduction remains high even when training data is increased to several times the dimension of the noise.
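A possible harness for such an experiment is sketched below. Everything specific in it is an assumption made for illustration: the two-sinusoid signal (four characteristic roots), the noise level, the window lengths, the training sizes, and the use of SVD truncation as the rank-reduction step. It reports errors for inspection and does not presuppose which model wins.

```python
import numpy as np

def make_windows(series, L, H):
    """Stack sliding windows: inputs of length L, targets of length H."""
    n = len(series) - L - H + 1
    X = np.stack([series[i:i + L] for i in range(n)])
    Y = np.stack([series[i + L:i + L + H] for i in range(n)])
    return X, Y

def fit_ols(X, Y):
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def truncate_rank(W, r):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def run_trial(n_train, sigma=0.5, L=48, H=24, r=4, seed=0):
    """Return (mse_full, mse_rank_r) against clean targets on held-out windows."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_train + 500 + L + H)
    clean = np.sin(0.2 * t) + 0.5 * np.sin(0.55 * t)   # low-dimensional dynamics
    noisy = clean + sigma * rng.standard_normal(len(t))
    Xn, Yn = make_windows(noisy, L, H)
    _, Yc = make_windows(clean, L, H)
    W = fit_ols(Xn[:n_train], Yn[:n_train])
    W_r = truncate_rank(W, r)
    # Skip L + H windows so held-out samples do not overlap the training span.
    start = n_train + L + H
    X_te, Y_te = Xn[start:], Yc[start:]

    def mse(M):
        return float(np.mean((X_te @ M - Y_te) ** 2))

    return mse(W), mse(W_r)

results = {n: run_trial(n) for n in (100, 400, 1600)}
for n, (full, reduced) in results.items():
    print(f"n_train={n:5d}  full-rank MSE={full:.4f}  rank-4 MSE={reduced:.4f}")
```

Scoring against clean targets isolates how much noise the fitted weights pass through. Repeating the trial over several seeds and noise levels would be needed before reading anything into the trend.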

Figures

Figures reproduced from arXiv: 2509.23597 by Kaixuan Zhang, Longyuan Li, Tobias Schlagenhauf, Wanfang Chen, Xiaonan Lu, Zheng Wang.

Figure 1: Structure of the paper and its main contributions.
Figure 2: Average forecasting MSE on ETTh1 and ETTm1 across horizons …
Figure 3: First 336 singular value magnitudes on ETTh1 and ETTm1 under different values of …
Figure 4: Data scaling and noise robustness of state-of-the-art linear time-series models. (left) RRR …
Figure 5: Full road map of the paper and its main contributions (full scale).
Figure 6: Mind map for common notations we used in our paper for a time series dataset. A more …
Figure 7: Visualization of CD, INC, and CI for linear time series models. We color …
Figure 8: Generalization test of W forecasting the time series y(t) = 0.01t^2 + sin t. Left: success of generalization to x(t) = t + cos t. Right: failure of generalization to z(t) = cos(1.1t).
Figure 9: Qualitative visualization of an OLS model forecasting pure noise. Since noise is inherently …
Figure 10: Qualitative demonstration of forecasting performance on pure noise. Models fitted with …
Figure 11: Mean Squared Error on pure noise series of varying sizes. Consistent with Proposition 1, …
Figure 13: Overall, the hyperparameter sensitivities and singular value trend for Root Purge in the …
Figure 12: (Time Domain) Average forecasting MSE on ETTh1 and ETTm1 across horizons H = {96, 192, 336, 720} for different values of λ. The hyperparameter sensitivity of the time-domain linear model is similar to that of the frequency-domain linear model. As for the frequency-domain linear model, a break-down table for each horizon for the time-domain linear model is in …
Figure 13: (Time Domain) First 336 singular value magnitudes on ETTh1 and ETTm1 under different values of λ (log scale) with W. As λ increases, Root Purge pushes the weight matrix W to have more smaller singular values, while the significant singular values remain largely unaffected. The overall effect of λ on singular values is consistent with what we found in the main text, where W is learned in the frequency doma…
Figure 14: RRR Rank-MSE Trade-off on 4 example datasets. (a) & (b): On smaller datasets, there …
Figure 15: RRR Rank-MSE Trade-off compared to DWRR on ETTh2. The validation trade-off curve …
Figure 16: Additional visualization of the data scaling & noise robustness on simulation data. (a) A …
Figure 17: The roots obtained via RRR and Root Purge are visibly closer to the ground truth than …
Figure 17: Visualization of root distribution on the (a) …
Figure 18: Rank-MSE Trade-off Curves on ETTh1
Figure 19: Rank-MSE Trade-off Curves on ETTh2
Figure 20: Rank-MSE Trade-off Curves on ETTm1
Figure 21: Rank-MSE Trade-off Curves on ETTm2
Figure 22: Rank-MSE Trade-off Curves on Weather
Figure 23: Rank-MSE Trade-off Curves on Exchange
Figure 24: Rank-MSE Trade-off Curves on Electricity
Figure 25: Rank-MSE Trade-off Curves on Traffic
Panel labels visible for the ETTh2, ETTm1, Weather, and Exchange plots: titles of the form "RRR ETTh2_L720_F96 Rank-MSE Trade-off"; axes Rank of W, Test MSE, and Val MSE.
Original abstract

Time series forecasting remains a critical challenge across numerous domains, yet the effectiveness of complex models often varies unpredictably across datasets. Recent studies highlight the surprising competitiveness of simple linear models, suggesting that their robustness and interpretability warrant deeper theoretical investigation. This paper presents a systematic study of linear models for time series forecasting, with a focus on the role of characteristic roots in temporal dynamics. We begin by analyzing the noise-free setting, where we show that characteristic roots govern long-term behavior and explain how design choices such as instance normalization and channel independence affect model capabilities. We then extend our analysis to the noisy regime, revealing that models tend to produce spurious roots. This leads to the identification of a key data-scaling property: mitigating the influence of noise requires disproportionately large training data, highlighting the need for structural regularization. To address these challenges, we propose two complementary strategies for robust root restructuring. The first uses rank reduction techniques, including Reduced-Rank Regression (RRR) and Direct Weight Rank Reduction (DWRR), to recover the low-dimensional latent dynamics. The second, a novel adaptive method called Root Purge, encourages the model to learn a noise-suppressing null space during training. Extensive experiments on standard benchmarks demonstrate the effectiveness of both approaches, validating our theoretical insights and achieving state-of-the-art results in several settings. Our findings underscore the potential of integrating classical theories for linear systems with modern learning techniques to build robust, interpretable, and data-efficient forecasting models. The code is publicly available at: https://github.com/Wangzzzzzzzz/RootPurge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that characteristic roots of linear forecasting model weight matrices govern long-term temporal dynamics in the noise-free regime (with effects from instance normalization and channel independence), while in the noisy regime models produce spurious roots whose mitigation requires disproportionately large training data; it proposes rank-reduction methods (RRR and DWRR) plus a novel adaptive Root Purge regularizer to recover low-dimensional latent dynamics, with experiments showing improved benchmark performance and SOTA results in some settings.

Significance. If the noise-free/noisy-regime distinction and the data-scaling property hold, the work usefully integrates classical linear-systems theory with modern forecasting practice, offering interpretable structural regularization that could improve data efficiency and robustness. Public code release aids reproducibility and allows direct verification of the empirical claims.

major comments (3)
  1. [§4] Noisy-regime analysis: the data-scaling claim (that mitigating spurious roots requires disproportionately large training data) is presented without visible derivation, error bounds, or quantification details (e.g., how scaling exponents were estimated or whether post-hoc dataset selection was controlled), leaving the central motivation for structural regularization under-supported.
  2. [§5] Rank-reduction methods: the premise that RRR/DWRR recover the true low-dimensional latent dynamics without discarding essential finite-horizon predictive information is load-bearing yet only weakly justified; the noise-free analysis focuses on long-term root behavior, but the manuscript does not demonstrate that transient or noise-robust components removed by rank reduction contribute negligibly to the actual forecasting loss.
  3. [§6] Root Purge: the adaptive null-space mechanism is introduced without a proof sketch or ablation showing it avoids the data-scaling issue identified in the noisy regime; it is unclear whether the learned null space preserves short-horizon accuracy or merely suppresses long-term spurious roots.
minor comments (2)
  1. [Abstract] The statement of 'state-of-the-art results in several settings' should name the specific datasets and baselines for immediate context.
  2. [Notation] The precise definition of characteristic roots for the learned weight matrices (especially under channel independence) should be stated explicitly before the noise-free analysis.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments on our manuscript. We have carefully considered each point and provide point-by-point responses below. We plan to incorporate several clarifications and additional analyses in the revised version to address the concerns raised.

Point-by-point responses
  1. Referee: [§4] Noisy-regime analysis: the data-scaling claim (that mitigating spurious roots requires disproportionately large training data) is presented without visible derivation, error bounds, or quantification details (e.g., how scaling exponents were estimated or whether post-hoc dataset selection was controlled), leaving the central motivation for structural regularization under-supported.

    Authors: We thank the referee for highlighting this issue. The data-scaling claim is supported by empirical evidence from our experiments on synthetic and real datasets, where we observed the need for significantly larger training sets to reduce the influence of spurious roots. However, we agree that a more rigorous derivation or quantification would strengthen the motivation. In the revision, we will include a detailed description of the experimental setup for estimating scaling behavior, including how exponents were computed and controls for dataset selection. We will also discuss the theoretical intuition behind the scaling property based on the analysis of the noisy regime.
    Revision: yes

  2. Referee: [§5] Rank-reduction methods: the premise that RRR/DWRR recover the true low-dimensional latent dynamics without discarding essential finite-horizon predictive information is load-bearing yet only weakly justified; the noise-free analysis focuses on long-term root behavior, but the manuscript does not demonstrate that transient or noise-robust components removed by rank reduction contribute negligibly to the actual forecasting loss.

    Authors: We acknowledge that the justification for preserving predictive information under rank reduction could be more explicit. Our analysis in the noise-free setting demonstrates that the characteristic roots capture the essential long-term dynamics, and rank reduction is designed to focus on the dominant low-rank structure. To better address finite-horizon concerns, we will add experiments in the revision that quantify the contribution of the discarded components to the forecasting loss on short horizons, showing that they are indeed negligible in the contexts we consider. This will include comparisons of loss with and without rank reduction on transient behaviors.
    Revision: yes

  3. Referee: [§6] Root Purge: the adaptive null-space mechanism is introduced without a proof sketch or ablation showing it avoids the data-scaling issue identified in the noisy regime; it is unclear whether the learned null space preserves short-horizon accuracy or merely suppresses long-term spurious roots.

    Authors: We appreciate the referee's point on the need for stronger theoretical and empirical support for Root Purge. The method is motivated by encouraging the model to learn a null space that suppresses noise-induced roots while maintaining the core dynamics. In the revised manuscript, we will provide a proof sketch outlining why the adaptive null-space mechanism mitigates the data-scaling requirement, along with additional ablations that evaluate short-horizon accuracy and the balance between suppressing spurious roots and preserving predictive performance. These will demonstrate that short-horizon accuracy is maintained.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; analysis rests on standard linear algebra

full rationale

The paper derives the role of characteristic roots from the standard theory of linear recurrence relations applied to the weight matrices of linear forecasting models. This is independent of the target forecasting performance and does not reduce any prediction to a fitted quantity defined by the result itself. The noisy-regime analysis and regularization proposals (RRR, DWRR, Root Purge) follow from this foundation plus empirical scaling observations, without load-bearing self-citations or ansatzes smuggled from prior author work. The derivation chain remains self-contained against external linear algebra benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on standard linear algebra for characteristic roots and the modeling assumption that rank reduction preserves essential dynamics; no new entities are postulated.

free parameters (1)
  • reduced rank parameter
    The target rank in RRR and DWRR is a modeling choice that must be selected or tuned for each dataset.
axioms (1)
  • [standard math] The weight matrix of a linear time series model admits a characteristic root decomposition that governs its long-term iterative behavior.
    Invoked when analyzing noise-free long-term dynamics and design choices such as instance normalization.
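Since the reduced rank in the ledger above has to be chosen per dataset, one common way to pick it is a sweep over candidate ranks scored on held-out data. The sketch below assumes SVD truncation of least-squares weights as the rank-reduction step and a simple train/validation split. The function name `select_rank` and the synthetic data are hypothetical and not taken from the paper or its code.

```python
import numpy as np

def select_rank(X_tr, Y_tr, X_val, Y_val, ranks):
    """Return (best_rank, {rank: val_mse}) for SVD-truncated OLS weights."""
    W, *_ = np.linalg.lstsq(X_tr, Y_tr, rcond=None)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    scores = {}
    for r in ranks:
        W_r = (U[:, :r] * s[:r]) @ Vt[:r]       # keep the r largest singular values
        scores[r] = float(np.mean((X_val @ W_r - Y_val) ** 2))
    best = min(scores, key=scores.get)          # rank with the lowest validation MSE
    return best, scores

# Synthetic data (assumption): a rank-3 linear map observed with unit-variance noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((600, 20))
W_true = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 10))
Y = X @ W_true + rng.standard_normal((600, 10))

best_rank, scores = select_rank(X[:400], Y[:400], X[400:], Y[400:], range(1, 11))
print("selected rank:", best_rank)
for r, mse in scores.items():
    print(f"rank {r:2d}: validation MSE {mse:.4f}")
```

For time series the split would normally be chronological rather than the arbitrary row split used here, so that validation windows come strictly after the training windows.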

pith-pipeline@v0.9.0 · 5835 in / 1281 out tokens · 52777 ms · 2026-05-18T12:45:07.108185+00:00 · methodology

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 3 internal anchors

  1. [1]

    Community detection and stochastic block models: recent developments

    Emmanuel Abbe. Community detection and stochastic block models: recent developments. Journal of Machine Learning Research, 18 0 (177): 0 1--86, 2018

  2. [2]

    No free lunch theorem: A review

    Stavros P Adam, Stamatios-Aggelos N Alexandropoulos, Panos M Pardalos, and Michael N Vrahatis. No free lunch theorem: A review. Approximation and optimization: Algorithms, complexity and applications, pages 57--82, 2019

  3. [3]

    Model agnostic time series analysis via matrix estimation

    Anish Agarwal, Muhammad Jehangir Amjad, Devavrat Shah, and Dennis Shen. Model agnostic time series analysis via matrix estimation. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2 0 (3): 0 1--39, 2018

  4. [4]

    Linear algebra done right

    Sheldon Axler. Linear algebra done right. Springer, 2015

  5. [5]

    Position: There are no champions in long-term time series forecasting

    Lorenzo Brigato, Rafael Morand, Knut Str mmen, Maria Panagiotou, Markus Schmidt, and Stavroula Mougiakakou. Position: There are no champions in long-term time series forecasting. arXiv preprint arXiv:2502.14045, 2025

  6. [6]

    Modern koopman theory for dynamical systems,

    Steven L Brunton, Marko Budi s i \'c , Eurika Kaiser, and J Nathan Kutz. Modern koopman theory for dynamical systems. arXiv preprint arXiv:2102.12086, 2021

  7. [7]

    Applied koopmanism

    Marko Budi s i \'c , Ryan Mohr, and Igor Mezi \'c . Applied koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22 0 (4), 2012

  8. [8]

    A new analysis technique for time series data

    John Parker Burg. A new analysis technique for time series data. Paper presented at NATO Advanced Study Institute on Signal Processing, Enschede, Netherlands, 1968, 1968

  9. [9]

    Spectral method and regularized mle are both optimal for top-k ranking

    Yuxin Chen, Jianqing Fan, Cong Ma, and Kaizheng Wang. Spectral method and regularized mle are both optimal for top-k ranking. Annals of statistics, 47 0 (4): 0 2204, 2019

  10. [10]

    Spectral methods for data science: A statistical perspective

    Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, et al. Spectral methods for data science: A statistical perspective. Foundations and Trends in Machine Learning , 14 0 (5): 0 566--806, 2021

  11. [11]

    Long-term forecasting with tide: Time-series dense encoder

    Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan Mathur, Rajat Sen, and Rose Yu. Long-term forecasting with tide: Time-series dense encoder, 2024. URL https://arxiv.org/abs/2304.08424

  12. [12]

    The rotation of eigenvectors by a perturbation

    Chandler Davis and William Morton Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7 0 (1): 0 1--46, 1970

  13. [13]

    The fitting of time-series models

    James Durbin. The fitting of time-series models. Revue de l'Institut International de Statistique, pages 233--244, 1960

  14. [14]

    Tslanet: Rethinking transformers for time series representation learning, 2024

    Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, and Xiaoli Li. Tslanet: Rethinking transformers for time series representation learning, 2024. URL https://arxiv.org/abs/2404.08472

  15. [15]

    On inverses of vandermonde and confluent vandermonde matrices

    Walter Gautschi. On inverses of vandermonde and confluent vandermonde matrices. Numer. Math., 4 0 (1): 0 117–123, December 1962. ISSN 0029-599X. doi:10.1007/BF01386302. URL https://doi.org/10.1007/BF01386302

  16. [16]

    Optimal shrinkage of singular values

    Matan Gavish and David L Donoho. Optimal shrinkage of singular values. IEEE Transactions on Information Theory, 63 0 (4): 0 2137--2152, 2017

  17. [17]

    Time series analysis

    James D Hamilton. Time series analysis. Princeton university press, 2020

  18. [18]

    Singular spectrum analysis: methodology and comparison

    Hossein Hassani. Singular spectrum analysis: methodology and comparison. 2007

  19. [19]

    Low rank regularization: A review

    Zhanxuan Hu, Feiping Nie, Rong Wang, and Xuelong Li. Low rank regularization: A review. Neural Networks, 136: 0 218--232, 2021

  20. [20]

    Timebase: The power of minimalism in efficient long-term time series forecasting

    Qihe Huang, Zhengyang Zhou, Kuo Yang, Zhongchao Yi, Xu Wang, and Yang Wang. Timebase: The power of minimalism in efficient long-term time series forecasting. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=GhTdNOMfOD

  21. [21]

    Reduced-rank regression for the multivariate linear model

    Alan Julian Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5 0 (2): 0 248--264, 1975. ISSN 0047-259X. doi:https://doi.org/10.1016/0047-259X(75)90042-1. URL https://www.sciencedirect.com/science/article/pii/0047259X75900421

  22. [22]

    Spectral algorithms

    Ravindran Kannan, Santosh Vempala, et al. Spectral algorithms. Foundations and Trends in Theoretical Computer Science , 4 0 (3--4): 0 157--288, 2009

  23. [23]

    Matrix completion from a few entries

    Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. IEEE transactions on information theory, 56 0 (6): 0 2980--2998, 2010

  24. [24]

    Literature survey on low rank approximation of matrices

    N Kishore Kumar and Jan Schneider. Literature survey on low rank approximation of matrices. Linear and Multilinear Algebra, 65 0 (11): 0 2212--2244, 2017

  25. [25]

    \"U ber die analytischen methoden in der wahrscheinlichkeitsrechnung

    Andrei Kolmogoroff. \"U ber die analytischen methoden in der wahrscheinlichkeitsrechnung. Mathematische Annalen, 104: 0 415--458, 1931

  26. [26]

    Deep learning for time series forecasting: a survey

    Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. Deep learning for time series forecasting: a survey. International Journal of Machine Learning and Cybernetics, pages 1--34, 2025

  27. [27]

    Affine rank minimization via asymptotic log-det iteratively reweighted least squares

    Sebastian Kr \"a mer. Affine rank minimization via asymptotic log-det iteratively reweighted least squares. Journal of Machine Learning Research, 26 0 (92): 0 1--44, 2025

  28. [28]

    H. W. Kuhn. The hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2 0 (1-2): 0 83--97, 1955. doi:https://doi.org/10.1002/nav.3800020109. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nav.3800020109

  29. [29]

    Revisiting long-term time series forecasting: An investigation on linear mapping

    Zhe Li, Shiyi Qi, Yiduo Li, and Zenglin Xu. Revisiting long-term time series forecasting: An investigation on linear mapping. arXiv preprint arXiv:2305.10721, 2023

  30. [30]

    Time-series forecasting with deep learning: a survey

    Bryan Lim and Stefan Zohren. Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A, 379 0 (2194): 0 20200209, 2021

  31. [31]

    Sparsetsf: Modeling long-term time series forecasting with 1k parameters, 2024

    Shengsheng Lin, Weiwei Lin, Wentai Wu, Haojun Chen, and Junjie Yang. Sparsetsf: Modeling long-term time series forecasting with 1k parameters, 2024. URL https://arxiv.org/abs/2405.00946

  32. [32]

    Koopa: Learning non-stationary time series dynamics with koopman predictors

    Yong Liu, Chenyu Li, Jianmin Wang, and Mingsheng Long. Koopa: Learning non-stationary time series dynamics with koopman predictors. Advances in neural information processing systems, 36: 0 12271--12290, 2023

  33. [33]

    iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

    Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting, 2024. URL https://arxiv.org/abs/2310.06625

  34. [34]

    The m4 competition: 100,000 time series and 61 forecasting methods

    Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The m4 competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36 0 (1): 0 54--74, 2020. ISSN 0169-2070. doi:https://doi.org/10.1016/j.ijforecast.2019.04.014. URL https://www.sciencedirect.com/science/article/pii/S0169207019301128. M4 Competition

  35. [35]

    A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

    Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730, 2022

  36. [36]

    Automatic differentiation in pytorch

    Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017

  37. [37]

    Foundations of time series analysis and prediction theory

    Mohsen Pourahmadi. Foundations of time series analysis and prediction theory. John Wiley & Sons, 2001

  38. [38]

    Linear recursive feature machines provably recover low-rank matrices

    Adityanarayanan Radhakrishnan, Mikhail Belkin, and Dmitriy Drusvyatskiy. Linear recursive feature machines provably recover low-rank matrices. Proceedings of the National Academy of Sciences, 122 0 (13): 0 e2411325122, 2025

  39. [39]

    Dynamic mode decomposition of numerical and experimental data

    Peter J Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of fluid mechanics, 656: 0 5--28, 2010

  40. [40]

    Scaling law for time series forecasting

    Jingzhe Shi, Qinwei Ma, Huan Ma, and Lei Li. Scaling law for time series forecasting. arXiv preprint arXiv:2405.15124, 2024

  41. [41]

    Time series analysis and its applications: with R examples

    Robert H Shumway and David S Stoffer. Time series analysis and its applications: with R examples. Springer, 2006

  42. [42]

    Introduction to the galois theory of linear differential equations

    Michael F Singer. Introduction to the galois theory of linear differential equations. Algebraic theory of differential equations, 357: 0 1--82, 2009

  43. [43]

    Topics in random matrix theory, volume 132

    Terence Tao. Topics in random matrix theory, volume 132. American Mathematical Soc., 2012

  44. [44]

    The low-rank hypothesis of complex systems

    Vincent Thibeault, Antoine Allard, and Patrick Desrosiers. The low-rank hypothesis of complex systems. Nature Physics, 20 0 (2): 0 294--302, 2024

  45. [45]

    An analysis of linear time series forecasting models

    William Toner and Luke Darlow. An analysis of linear time series forecasting models. arXiv preprint arXiv:2403.14587, 2024

  46. [46]

    An introduction to matrix concentration inequalities

    Joel A Tropp et al. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2): 1--230, 2015

  47. [47]

    The theory of linear prediction

    Palghat P Vaidyanathan. The theory of linear prediction. Morgan & Claypool Publishers, 2007

  48. [48]

    Galois theory of linear differential equations, volume 328

    Marius Van der Put and Michael F Singer. Galois theory of linear differential equations, volume 328. Springer Science & Business Media, 2012

  49. [49]

    High-dimensional statistics: A non-asymptotic viewpoint, volume 48

    Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019

  50. [50]

    Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung)

    Hermann Weyl. Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen, 71(4): 441--479, 1912

  51. [51]

    The interpolation, extrapolation and smoothing of stationary time series

    N Wiener. The interpolation, extrapolation and smoothing of stationary time series. ndrc report, 1942

  52. [52]

    A data--driven approximation of the koopman operator: Extending dynamic mode decomposition

    Matthew O Williams, Ioannis G Kevrekidis, and Clarence W Rowley. A data--driven approximation of the koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25(6): 1307--1346, 2015

  53. [53]

    Learning deep time-index models for time series forecasting

    Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, and Steven Hoi. Learning deep time-index models for time series forecasting. In International Conference on Machine Learning, pages 37217--37237. PMLR, 2023

  54. [54]

    Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

    Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, 2022. URL https://arxiv.org/abs/2106.13008

  55. [55]

    TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

    Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. Timesnet: Temporal 2d-variation modeling for general time series analysis, 2023. URL https://arxiv.org/abs/2210.02186

  56. [56]

    Fits: Modeling time series with 10k parameters

    Zhijian Xu, Ailing Zeng, and Qiang Xu. Fits: Modeling time series with 10k parameters. arXiv preprint arXiv:2307.03756, 2023

  57. [57]

    Frequency-domain mlps are more effective learners in time series forecasting

    Kun Yi, Qi Zhang, Wei Fan, Shoujin Wang, Pengyang Wang, Hui He, Defu Lian, Ning An, Longbing Cao, and Zhendong Niu. Frequency-domain mlps are more effective learners in time series forecasting, 2023. URL https://arxiv.org/abs/2311.06184

  58. [58]

    Filternet: Harnessing frequency filters for time series forecasting

    Kun Yi, Jingru Fei, Qi Zhang, Hui He, Shufeng Hao, Defu Lian, and Wei Fan. Filternet: Harnessing frequency filters for time series forecasting, 2024. URL https://arxiv.org/abs/2411.01623

  59. [59]

    Are transformers effective for time series forecasting?

    Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121--11128, 2023

  60. [60]

    Skolr: Structured koopman operator linear rnn for time-series forecasting

    Yitian Zhang, Liheng Ma, Antonios Valkanas, Boris N Oreshkin, and Mark Coates. Skolr: Structured koopman operator linear rnn for time-series forecasting. arXiv preprint arXiv:2506.14113, 2025

  61. [61]

    Singular spectrum analysis for time series: Introduction to this special issue

    Anatoly Alexandrovich Zhigljavsky. Singular spectrum analysis for time series: Introduction to this special issue. Statistics and its Interface, 3(3): 255--258, 2010

  62. [62]

    Informer: Beyond efficient transformer for long sequence time-series forecasting

    Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, volume 35, pages 11106--11115. AAAI Press, 2021

  63. [63]

    Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting

    Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International conference on machine learning, pages 27268--27286. PMLR, 2022
