arxiv: 2509.23597 · v5 · pith:VOCMZGKM · submitted 2025-09-28 · 💻 cs.LG · cs.AI

Characteristic Root Analysis and Regularization for Linear Time Series Forecasting

Zheng Wang, Kaixuan Zhang, Wanfang Chen, Xiaonan Lu, Longyuan Li, Tobias Schlagenhauf

Pith reviewed 2026-05-18 12:45 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords time series forecasting · linear models · characteristic roots · rank reduction · regularization · Root Purge · noise robustness · latent dynamics

The pith

Linear time series models produce spurious characteristic roots under noise that demand far more data to suppress unless countered by rank reduction or Root Purge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines linear models for time series forecasting by tracking how the characteristic roots of their weight matrices shape long-term predictions. In clean data these roots fully determine the behavior, and common design choices such as instance normalization or channel independence change which roots can be learned. When noise is present the models fit many extraneous roots whose influence shrinks only with disproportionately large training sets. The authors therefore introduce rank-reduction methods and a new adaptive procedure called Root Purge that encourages the model to maintain a noise-suppressing null space. If these claims hold, simple linear forecasters can recover the true low-dimensional dynamics more reliably and with less data than current practice suggests.

Core claim

In the noise-free regime the characteristic roots of the learned weight matrix govern the long-term temporal dynamics, and architectural decisions alter the set of representable roots. In the noisy regime the same models fit spurious roots that do not correspond to the underlying process; eliminating their effect requires training data that grows faster than the dimension of the noise. Structural regularization via reduced-rank regression or the Root Purge method recovers the low-dimensional latent dynamics without discarding essential forecasting information.
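
The data-scaling point can be illustrated with a minimal sketch in the spirit of the pure-noise experiments named in the captions of Figures 9 to 11. It is not the paper's Proposition 1 or its experimental code. The helper `ols_noise_mse`, the window sizes, and the sample counts are illustrative assumptions. An OLS forecaster is fitted to i.i.d. Gaussian noise, where the best possible forecast is zero (test MSE of 1.0), and the excess error is tracked as the training set grows.

```python
import numpy as np

def ols_noise_mse(n_train, L=48, H=24, n_test=5000, seed=0):
    """Fit OLS from L noise inputs to H noise targets; return test MSE.

    Hypothetical helper: inputs and targets are independent unit-variance
    noise, so any nonzero weight is a purely spurious fit.
    """
    rng = np.random.default_rng(seed)
    X_tr = rng.normal(size=(n_train, L))
    Y_tr = rng.normal(size=(n_train, H))
    X_te = rng.normal(size=(n_test, L))
    Y_te = rng.normal(size=(n_test, H))
    W, *_ = np.linalg.lstsq(X_tr, Y_tr, rcond=None)
    return float(np.mean((X_te @ W - Y_te) ** 2))

sizes = [100, 400, 1600, 6400]
mses = [ols_noise_mse(n) for n in sizes]
for n, m in zip(sizes, mses):
    # The zero forecast would score about 1.0; the gap is the cost of noise fitting.
    print(f"n_train={n:5d}  test MSE={m:.3f}  excess={m - 1.0:.3f}")
```

In this toy setting the excess error shrinks only roughly in proportion to L / n_train, so many more samples than lookback coefficients are needed before the spurious fit becomes negligible.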

What carries the argument

Characteristic roots of the weight matrices, which encode the temporal evolution of the linear forecaster and become contaminated by noise.
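
For orientation, the textbook relation between a scalar linear recurrence and its characteristic roots can be written out as below. This is standard linear-recurrence theory, stated under the assumption $a_d \neq 0$. The symbols $a_k$, $r_j$, $m_j$ and $c_{j,m}$ are introduced here for illustration, and the paper's exact definition for a learned weight matrix may differ.

```latex
% Order-d linear recurrence and its characteristic polynomial
x_{t+d} = \sum_{k=1}^{d} a_k \, x_{t+d-k},
\qquad
p(z) = z^{d} - \sum_{k=1}^{d} a_k \, z^{d-k}.

% General solution: r_j are the distinct roots of p, with multiplicities m_j
x_t = \sum_{j} \sum_{m=0}^{m_j - 1} c_{j,m} \, t^{m} \, r_j^{\,t}.
```

Components with $|r_j| < 1$ decay, those with $|r_j| > 1$ grow, and roots on the unit circle persist (with polynomial growth when repeated), which is the sense in which the roots govern long-term behavior.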

If this is right

Rank reduction recovers the true low-dimensional latent dynamics that linear models otherwise obscure with spurious roots.
Root Purge learns a noise-suppressing null space during training and improves data efficiency.
Instance normalization and channel independence change the roots a linear model can represent in the clean case.
Classical linear-system theory can be combined with modern training to produce more robust and interpretable forecasters.
The same root-analysis lens explains why simple linear models remain competitive on many real benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same root-spuriousness mechanism may explain why over-parameterized sequence models sometimes require very large datasets to stabilize long-horizon forecasts.
Root Purge could be adapted as a regularizer for other linear layers inside larger neural architectures.
Testing whether the number of retained roots after regularization correlates with forecast horizon stability would provide a practical diagnostic.
The framework suggests a route to parameter-free selection of model capacity by counting the stable roots needed for a given dataset.

Load-bearing premise

The temporal dynamics of any linear forecaster are completely captured by the roots of its weight matrix, so that discarding some roots or reducing matrix rank leaves the essential forecasting information intact.

What would settle it

A controlled experiment on synthetic data with known low-dimensional linear dynamics plus additive noise where the forecasting error after Root Purge or rank reduction remains high even when training data is increased to several times the dimension of the noise.
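
A minimal sketch of such a controlled experiment follows. It assumes a two-sinusoid latent signal (four characteristic roots), additive Gaussian noise, and the classical identity-weighted reduced-rank regression solution, which projects the OLS fit onto the leading right singular vectors of the fitted values. The signal, the window lengths, the noise level, the rank of 4, and the helper names are all illustrative assumptions, and the paper's own RRR implementation may differ.

```python
import numpy as np

def make_windows(series, L, H):
    """Slice a 1-D series into (lookback, horizon) pairs."""
    n = len(series) - L - H + 1
    X = np.stack([series[i:i + L] for i in range(n)])
    Y = np.stack([series[i + L:i + L + H] for i in range(n)])
    return X, Y

def fit_ols(X, Y):
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def fit_rrr(X, Y, rank):
    """Classical reduced-rank regression with identity weighting."""
    W_ols = fit_ols(X, Y)
    # Leading right singular vectors of the fitted values X @ W_ols.
    _, _, Vt = np.linalg.svd(X @ W_ols, full_matrices=False)
    V_r = Vt[:rank].T
    return W_ols @ V_r @ V_r.T

def noisy_series(t, rng, sigma):
    """Two sinusoids (periods 24 and 7) plus white noise; an assumed toy signal."""
    clean = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 7 + 0.3)
    return clean + sigma * rng.normal(size=t.shape)

rng = np.random.default_rng(0)
L, H, sigma = 48, 24, 0.5
X_tr, Y_tr = make_windows(noisy_series(np.arange(200), rng, sigma), L, H)
X_te, Y_te = make_windows(noisy_series(np.arange(200, 1200), rng, sigma), L, H)

W_ols = fit_ols(X_tr, Y_tr)
W_rrr = fit_rrr(X_tr, Y_tr, rank=4)
mse_ols = float(np.mean((X_te @ W_ols - Y_te) ** 2))
mse_rrr = float(np.mean((X_te @ W_rrr - Y_te) ** 2))
print(f"OLS test MSE: {mse_ols:.3f}  RRR(rank=4) test MSE: {mse_rrr:.3f}")
```

Repeating the run while growing the training series would show whether the rank-reduced error approaches the noise floor; if it stayed high, that would count against the premise.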

Figures

Figures reproduced from arXiv: 2509.23597 by Kaixuan Zhang, Longyuan Li, Tobias Schlagenhauf, Wanfang Chen, Xiaonan Lu, Zheng Wang.

**Figure 2.** Average forecasting MSE on ETTh1 and ETTm1 across horizons [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]

**Figure 3.** First 336 singular value magnitudes on ETTh1 and ETTm1 under different values of… [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]

**Figure 4.** Data scaling and noise robustness of state-of-the-art linear time-series models. (left) RRR… [PITH_FULL_IMAGE:figures/full_fig_p009_4.png]

**Figure 5.** Full road map of the paper and its main contributions (full scale). [PITH_FULL_IMAGE:figures/full_fig_p014_5.png]

**Figure 6.** Mind map for common notations we used in our paper for a time series dataset. A more… [PITH_FULL_IMAGE:figures/full_fig_p016_6.png]

**Figure 7.** Visualization of CD, INC, and CI for linear time series models. We color… [PITH_FULL_IMAGE:figures/full_fig_p019_7.png]

**Figure 8.** Generalization test of W forecasting the time series y(t) = 0.01t^2 + sin t. Left: success of generalization to x(t) = t + cos t. Right: failure of generalization to z(t) = cos(1.1t).

**Figure 9.** Qualitative visualization of an OLS model forecasting pure noise. Since noise is inherently… [PITH_FULL_IMAGE:figures/full_fig_p026_9.png]

**Figure 10.** Qualitative demonstration of forecasting performance on pure noise. Models fitted with… [PITH_FULL_IMAGE:figures/full_fig_p026_10.png]

**Figure 11.** Mean Squared Error on pure noise series of varying sizes. Consistent with Proposition 1,… [PITH_FULL_IMAGE:figures/full_fig_p027_11.png]

**Figure 13.** Overall, the hyperparameter sensitivities and singular value trend for Root Purge in the… [PITH_FULL_IMAGE:figures/full_fig_p047_13.png]

**Figure 12.** (Time Domain) Average forecasting MSE on ETTh1 and ETTm1 across horizons H = {96, 192, 336, 720} for different values of λ. The hyperparameter sensitivity of the time-domain linear model is similar to that of the frequency-domain linear model. As for the frequency-domain linear model, a break-down table for each horizon for the time-domain linear model is in… [PITH_FULL_IMAGE:figures/full_fig_p048_12.png]

**Figure 13.** (Time Domain) First 336 singular value magnitudes on ETTh1 and ETTm1 under different values of λ (log scale) with W. As λ increases, Root Purge pushes the weight matrix W to have more smaller singular values, while the significant singular values remain largely unaffected. The overall effect of λ on singular values is consistent with what we found in the main text, where W is learned in the frequency doma…

**Figure 14.** RRR Rank-MSE Trade-off on 4 example datasets. (a) & (b): On smaller datasets, there… [PITH_FULL_IMAGE:figures/full_fig_p051_14.png]

**Figure 15.** RRR Rank-MSE Trade-off compared to DWRR on ETTh2. The validation trade-off curve… [PITH_FULL_IMAGE:figures/full_fig_p051_15.png]

**Figure 16.** Additional visualization of the data scaling & noise robustness on simulation data. (a) A… [PITH_FULL_IMAGE:figures/full_fig_p054_16.png]

**Figure 17.** The roots obtained via RRR and Root Purge are visibly closer to the ground truth than… [PITH_FULL_IMAGE:figures/full_fig_p055_17.png]

**Figure 17.** Visualization of root distribution on the (a)… [PITH_FULL_IMAGE:figures/full_fig_p056_17.png]

**Figure 18.** Rank-MSE Trade-off Curves on ETTh1 (axes: Rank of W against Test MSE and Val MSE)

**Figure 19.** Rank-MSE Trade-off Curves on ETTh2 (axes: Rank of W against Test MSE and Val MSE)

**Figure 20.** Rank-MSE Trade-off Curves on ETTm1 [PITH_FULL_IMAGE:figures/full_fig_p058_20.png]

**Figure 21.** Rank-MSE Trade-off Curves on ETTm2 (axes: Rank of W against Test MSE and Val MSE)

**Figure 22.** Rank-MSE Trade-off Curves on Weather (axes: Rank of W against Test MSE and Val MSE)

**Figure 23.** Rank-MSE Trade-off Curves on Exchange [PITH_FULL_IMAGE:figures/full_fig_p059_23.png]

**Figure 24.** Rank-MSE Trade-off Curves on Electricity [PITH_FULL_IMAGE:figures/full_fig_p060_24.png]

**Figure 25.** Rank-MSE Trade-off Curves on Traffic [PITH_FULL_IMAGE:figures/full_fig_p060_25.png]

Original abstract

Time series forecasting remains a critical challenge across numerous domains, yet the effectiveness of complex models often varies unpredictably across datasets. Recent studies highlight the surprising competitiveness of simple linear models, suggesting that their robustness and interpretability warrant deeper theoretical investigation. This paper presents a systematic study of linear models for time series forecasting, with a focus on the role of characteristic roots in temporal dynamics. We begin by analyzing the noise-free setting, where we show that characteristic roots govern long-term behavior and explain how design choices such as instance normalization and channel independence affect model capabilities. We then extend our analysis to the noisy regime, revealing that models tend to produce spurious roots. This leads to the identification of a key data-scaling property: mitigating the influence of noise requires disproportionately large training data, highlighting the need for structural regularization. To address these challenges, we propose two complementary strategies for robust root restructuring. The first uses rank reduction techniques, including Reduced-Rank Regression (RRR) and Direct Weight Rank Reduction (DWRR), to recover the low-dimensional latent dynamics. The second, a novel adaptive method called Root Purge, encourages the model to learn a noise-suppressing null space during training. Extensive experiments on standard benchmarks demonstrate the effectiveness of both approaches, validating our theoretical insights and achieving state-of-the-art results in several settings. Our findings underscore the potential of integrating classical theories for linear systems with modern learning techniques to build robust, interpretable, and data-efficient forecasting models. The code is publicly available at: https://github.com/Wangzzzzzzzz/RootPurge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper frames linear time series models through characteristic roots of the weights and adds Root Purge regularization, but the noisy-regime story assumes rank reduction preserves short-horizon value without clear proof.

Colleague, the core takeaway is that this work analyzes linear forecasters by looking at the characteristic roots of their weight matrices, uses that to explain normalization effects, and proposes Root Purge plus rank reduction to handle noise-induced spurious roots. It is a direct application of linear systems ideas to current forecasting setups.

What is actually new is the explicit root analysis across noise-free and noisy regimes plus the adaptive Root Purge method that trains a noise-suppressing null space. The rank reduction pieces (RRR and DWRR) are more familiar but get a targeted use here for recovering low-dimensional dynamics. The paper does well at tying design choices to root behavior and at showing empirical gains on standard benchmarks, with some SOTA results and public code that lets others check the claims. Experiments appear to back the practical side.

The soft spots sit mainly in the noisy-regime extension. The argument that spurious roots dominate and can be removed without hurting finite-horizon forecasts rests on the idea that extra dimensions are mostly noise and that true latent structure is strictly low-rank. If those extra components carry transient or noise-robust signals useful for actual prediction loss, rank reduction could degrade performance rather than help, exactly as the stress-test note suggests. The abstract mentions a data-scaling property but gives no derivations or error analysis, so it is hard to judge how tightly the claims are supported.

This is aimed at researchers who work on interpretable or linear time series models and want classical tools to make them more robust. A reader who cares about why simple models sometimes win and how to regularize them would find usable ideas. I would send it for peer review because the framing is fresh enough and the experiments give something concrete to evaluate, even if the theory needs tightening on the noisy case.

Referee Report

3 major / 2 minor

Summary. The paper claims that characteristic roots of linear forecasting model weight matrices govern long-term temporal dynamics in the noise-free regime (with effects from instance normalization and channel independence), while in the noisy regime models produce spurious roots whose mitigation requires disproportionately large training data; it proposes rank-reduction methods (RRR and DWRR) plus a novel adaptive Root Purge regularizer to recover low-dimensional latent dynamics, with experiments showing improved benchmark performance and SOTA results in some settings.

Significance. If the noise-free/noisy-regime distinction and the data-scaling property hold, the work usefully integrates classical linear-systems theory with modern forecasting practice, offering interpretable structural regularization that could improve data efficiency and robustness. Public code release aids reproducibility and allows direct verification of the empirical claims.

major comments (3)

§4 (noisy-regime analysis): the data-scaling claim, that mitigating spurious roots requires disproportionately large training data, is presented without visible derivation, error bounds, or quantification details (e.g., how scaling exponents were estimated or whether post-hoc dataset selection was controlled), leaving the central motivation for structural regularization under-supported.
§5 (rank-reduction methods): the premise that RRR/DWRR recover the true low-dimensional latent dynamics without discarding essential finite-horizon predictive information is load-bearing yet only weakly justified; the noise-free analysis focuses on long-term root behavior, but the manuscript does not demonstrate that transient or noise-robust components removed by rank reduction contribute negligibly to the actual forecasting loss.
§6 (Root Purge): the adaptive null-space mechanism is introduced without a proof sketch or ablation showing it avoids the data-scaling issue identified in the noisy regime; it is unclear whether the learned null space preserves short-horizon accuracy or merely suppresses long-term spurious roots.

minor comments (2)

Abstract: the statement of 'state-of-the-art results in several settings' should name the specific datasets and baselines for immediate context.
Notation: the precise definition of characteristic roots for the learned weight matrices (especially under channel independence) should be stated explicitly before the noise-free analysis.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments on our manuscript. We have carefully considered each point and provide point-by-point responses below. We plan to incorporate several clarifications and additional analyses in the revised version to address the concerns raised.

Referee: §4 (noisy-regime analysis): the data-scaling claim, that mitigating spurious roots requires disproportionately large training data, is presented without visible derivation, error bounds, or quantification details (e.g., how scaling exponents were estimated or whether post-hoc dataset selection was controlled), leaving the central motivation for structural regularization under-supported.

Authors: We thank the referee for highlighting this issue. The data-scaling claim is supported by empirical evidence from our experiments on synthetic and real datasets, where we observed the need for significantly larger training sets to reduce the influence of spurious roots. However, we agree that a more rigorous derivation or quantification would strengthen the motivation. In the revision, we will include a detailed description of the experimental setup for estimating scaling behavior, including how exponents were computed and controls for dataset selection. We will also discuss the theoretical intuition behind the scaling property based on the analysis of the noisy regime.

revision: yes

Referee: §5 (rank-reduction methods): the premise that RRR/DWRR recover the true low-dimensional latent dynamics without discarding essential finite-horizon predictive information is load-bearing yet only weakly justified; the noise-free analysis focuses on long-term root behavior, but the manuscript does not demonstrate that transient or noise-robust components removed by rank reduction contribute negligibly to the actual forecasting loss.

Authors: We acknowledge that the justification for preserving predictive information under rank reduction could be more explicit. Our analysis in the noise-free setting demonstrates that the characteristic roots capture the essential long-term dynamics, and rank reduction is designed to focus on the dominant low-rank structure. To better address finite-horizon concerns, we will add experiments in the revision that quantify the contribution of the discarded components to the forecasting loss on short horizons, showing that they are indeed negligible in the contexts we consider. This will include comparisons of loss with and without rank reduction on transient behaviors.

revision: yes

Referee: §6 (Root Purge): the adaptive null-space mechanism is introduced without a proof sketch or ablation showing it avoids the data-scaling issue identified in the noisy regime; it is unclear whether the learned null space preserves short-horizon accuracy or merely suppresses long-term spurious roots.

Authors: We appreciate the referee's point on the need for stronger theoretical and empirical support for Root Purge. The method is motivated by encouraging the model to learn a null space that suppresses noise-induced roots while maintaining the core dynamics. In the revised manuscript, we will provide a proof sketch outlining why the adaptive null-space mechanism mitigates the data-scaling requirement, along with additional ablations that evaluate short-horizon accuracy and the balance between suppressing spurious roots and preserving predictive performance. These will demonstrate that short-horizon accuracy is maintained.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; analysis rests on standard linear algebra

full rationale

The paper derives the role of characteristic roots from the standard theory of linear recurrence relations applied to the weight matrices of linear forecasting models. This is independent of the target forecasting performance and does not reduce any prediction to a fitted quantity defined by the result itself. The noisy-regime analysis and regularization proposals (RRR, DWRR, Root Purge) follow from this foundation plus empirical scaling observations, without load-bearing self-citations or ansatzes smuggled from prior author work. The derivation chain remains self-contained against external linear algebra benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on standard linear algebra for characteristic roots and the modeling assumption that rank reduction preserves essential dynamics; no new entities are postulated.

free parameters (1)

reduced rank parameter
The target rank in RRR and DWRR is a modeling choice that must be selected or tuned for each dataset.
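
One common way to choose such a target rank is to sweep candidates and keep the one with the lowest validation error, in the spirit of the validation trade-off curves mentioned in the Figure 14 and 15 captions. The sketch below truncates the SVD of a fitted weight matrix, which is one natural reading of "direct weight rank reduction". The paper's DWRR and its selection rule may differ, and `truncate_rank`, `select_rank`, and the synthetic data are hypothetical.

```python
import numpy as np

def truncate_rank(W, rank):
    """Best rank-`rank` approximation of W in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def select_rank(W, X_val, Y_val, candidate_ranks):
    """Return the candidate rank with the lowest validation MSE, plus all scores."""
    scores = {}
    for r in candidate_ranks:
        pred = X_val @ truncate_rank(W, r)
        scores[r] = float(np.mean((pred - Y_val) ** 2))
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic stand-in data: a rank-3 ground-truth map observed under noise.
rng = np.random.default_rng(0)
L, H, true_rank = 32, 16, 3
W_true = rng.normal(size=(L, true_rank)) @ rng.normal(size=(true_rank, H))

def sample(n):
    X = rng.normal(size=(n, L))
    return X, X @ W_true + rng.normal(size=(n, H))

X_tr, Y_tr = sample(60)
X_val, Y_val = sample(200)
W_fit, *_ = np.linalg.lstsq(X_tr, Y_tr, rcond=None)

best_rank, scores = select_rank(W_fit, X_val, Y_val, range(1, H + 1))
print(f"selected rank: {best_rank}, val MSE: {scores[best_rank]:.3f}, "
      f"full-rank val MSE: {scores[H]:.3f}")
```

Because the full rank is among the candidates, the selected rank can never score worse on validation than the untruncated fit; whether that carries over to test data is what the trade-off curves are meant to show.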

axioms (1)

standard math: The weight matrix of a linear time series model admits a characteristic root decomposition that governs its long-term iterative behavior.
Invoked when analyzing noise-free long-term dynamics and design choices such as instance normalization.
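
The Figure 8 caption gives a concrete case that can be checked by hand. A recurrence built for y(t) = 0.01t^2 + sin t has root 1 with multiplicity 3 (quadratic trend) and roots exp(±i) (unit-frequency sinusoid). The sketch below constructs that recurrence directly from its roots rather than learning W as the paper does, and `recurrence_residual` is a hypothetical helper. It tests which series the recurrence annihilates.

```python
import numpy as np

# Characteristic roots for y(t) = 0.01 t^2 + sin t.
roots = np.array([1, 1, 1, np.exp(1j), np.exp(-1j)])
# Monic characteristic polynomial, highest power first; imaginary parts cancel.
coeffs = np.real(np.poly(roots))

def recurrence_residual(series, coeffs):
    """Largest |sum_k coeffs[k] * series[t + d - k]| over t, with d = len(coeffs) - 1.

    A value near zero means the series satisfies the recurrence.
    """
    return float(np.max(np.abs(np.convolve(series, coeffs, mode="valid"))))

t = np.arange(200, dtype=float)
res_y = recurrence_residual(0.01 * t**2 + np.sin(t), coeffs)  # training series
res_x = recurrence_residual(t + np.cos(t), coeffs)            # roots are a subset
res_z = recurrence_residual(np.cos(1.1 * t), coeffs)          # roots exp(+-1.1i) missing

print(f"y residual: {res_y:.2e}")
print(f"x residual: {res_x:.2e}")
print(f"z residual: {res_z:.2e}")
```

The series x(t) = t + cos t needs only root 1 (multiplicity 2) and exp(±i), all of which are already present, so its residual vanishes up to rounding. The series z(t) = cos(1.1t) needs roots exp(±1.1i), which the recurrence does not contain, so its residual stays well away from zero.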

pith-pipeline@v0.9.0 · 5835 in / 1281 out tokens · 52777 ms · 2026-05-18T12:45:07.108185+00:00 · methodology

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 3 internal anchors

[1]

Community detection and stochastic block models: recent developments

Emmanuel Abbe. Community detection and stochastic block models: recent developments. Journal of Machine Learning Research, 18 0 (177): 0 1--86, 2018

work page 2018
[2]

No free lunch theorem: A review

Stavros P Adam, Stamatios-Aggelos N Alexandropoulos, Panos M Pardalos, and Michael N Vrahatis. No free lunch theorem: A review. Approximation and optimization: Algorithms, complexity and applications, pages 57--82, 2019

work page 2019
[3]

Model agnostic time series analysis via matrix estimation

Anish Agarwal, Muhammad Jehangir Amjad, Devavrat Shah, and Dennis Shen. Model agnostic time series analysis via matrix estimation. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2 0 (3): 0 1--39, 2018

work page 2018
[4]

Linear algebra done right

Sheldon Axler. Linear algebra done right. Springer, 2015

work page 2015
[5]

Position: There are no champions in long-term time series forecasting

Lorenzo Brigato, Rafael Morand, Knut Str mmen, Maria Panagiotou, Markus Schmidt, and Stavroula Mougiakakou. Position: There are no champions in long-term time series forecasting. arXiv preprint arXiv:2502.14045, 2025

work page arXiv 2025
[6]

Modern koopman theory for dynamical systems,

Steven L Brunton, Marko Budi s i \'c , Eurika Kaiser, and J Nathan Kutz. Modern koopman theory for dynamical systems. arXiv preprint arXiv:2102.12086, 2021

work page arXiv 2021
[7]

Applied koopmanism

Marko Budi s i \'c , Ryan Mohr, and Igor Mezi \'c . Applied koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22 0 (4), 2012

work page 2012
[8]

A new analysis technique for time series data

John Parker Burg. A new analysis technique for time series data. Paper presented at NATO Advanced Study Institute on Signal Processing, Enschede, Netherlands, 1968, 1968

work page 1968
[9]

Spectral method and regularized mle are both optimal for top-k ranking

Yuxin Chen, Jianqing Fan, Cong Ma, and Kaizheng Wang. Spectral method and regularized mle are both optimal for top-k ranking. Annals of statistics, 47 0 (4): 0 2204, 2019

work page 2019
[10]

Spectral methods for data science: A statistical perspective

Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, et al. Spectral methods for data science: A statistical perspective. Foundations and Trends in Machine Learning , 14 0 (5): 0 566--806, 2021

work page 2021
[11]

Long-term forecasting with tide: Time-series dense encoder

Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan Mathur, Rajat Sen, and Rose Yu. Long-term forecasting with tide: Time-series dense encoder, 2024. URL https://arxiv.org/abs/2304.08424

work page arXiv 2024
[12]

The rotation of eigenvectors by a perturbation

Chandler Davis and William Morton Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7 0 (1): 0 1--46, 1970

work page 1970
[13]

The fitting of time-series models

James Durbin. The fitting of time-series models. Revue de l'Institut International de Statistique, pages 233--244, 1960

work page 1960
[14]

Tslanet: Rethinking transformers for time series representation learning, 2024

Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, and Xiaoli Li. Tslanet: Rethinking transformers for time series representation learning, 2024. URL https://arxiv.org/abs/2404.08472

work page arXiv 2024
[15]

On inverses of vandermonde and confluent vandermonde matrices

Walter Gautschi. On inverses of vandermonde and confluent vandermonde matrices. Numer. Math., 4 0 (1): 0 117–123, December 1962. ISSN 0029-599X. doi:10.1007/BF01386302. URL https://doi.org/10.1007/BF01386302

work page doi:10.1007/bf01386302 1962
[16]

Optimal shrinkage of singular values

Matan Gavish and David L Donoho. Optimal shrinkage of singular values. IEEE Transactions on Information Theory, 63 0 (4): 0 2137--2152, 2017

work page 2017
[17]

Time series analysis

James D Hamilton. Time series analysis. Princeton university press, 2020

work page 2020
[18]

Singular spectrum analysis: methodology and comparison

Hossein Hassani. Singular spectrum analysis: methodology and comparison. 2007

work page 2007
[19]

Low rank regularization: A review

Zhanxuan Hu, Feiping Nie, Rong Wang, and Xuelong Li. Low rank regularization: A review. Neural Networks, 136: 0 218--232, 2021

work page 2021
[20]

Timebase: The power of minimalism in efficient long-term time series forecasting

Qihe Huang, Zhengyang Zhou, Kuo Yang, Zhongchao Yi, Xu Wang, and Yang Wang. Timebase: The power of minimalism in efficient long-term time series forecasting. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=GhTdNOMfOD

work page 2025
[21]

Reduced-rank regression for the multivariate linear model

Alan Julian Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5 0 (2): 0 248--264, 1975. ISSN 0047-259X. doi:https://doi.org/10.1016/0047-259X(75)90042-1. URL https://www.sciencedirect.com/science/article/pii/0047259X75900421

work page doi:10.1016/0047-259x(75)90042-1 1975
[22]

Spectral algorithms

Ravindran Kannan, Santosh Vempala, et al. Spectral algorithms. Foundations and Trends in Theoretical Computer Science , 4 0 (3--4): 0 157--288, 2009

work page 2009
[23]

Matrix completion from a few entries

Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. IEEE transactions on information theory, 56 0 (6): 0 2980--2998, 2010

work page 2010
[24]

Literature survey on low rank approximation of matrices

N Kishore Kumar and Jan Schneider. Literature survey on low rank approximation of matrices. Linear and Multilinear Algebra, 65 0 (11): 0 2212--2244, 2017

work page 2017
[25]

\"U ber die analytischen methoden in der wahrscheinlichkeitsrechnung

Andrei Kolmogoroff. \"U ber die analytischen methoden in der wahrscheinlichkeitsrechnung. Mathematische Annalen, 104: 0 415--458, 1931

work page 1931
[26]

Deep learning for time series forecasting: a survey

Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. Deep learning for time series forecasting: a survey. International Journal of Machine Learning and Cybernetics, pages 1--34, 2025

work page 2025
[27]

Affine rank minimization via asymptotic log-det iteratively reweighted least squares

Sebastian Kr \"a mer. Affine rank minimization via asymptotic log-det iteratively reweighted least squares. Journal of Machine Learning Research, 26 0 (92): 0 1--44, 2025

work page 2025
[28]

H. W. Kuhn. The hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2 0 (1-2): 0 83--97, 1955. doi:https://doi.org/10.1002/nav.3800020109. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nav.3800020109

work page doi:10.1002/nav.3800020109 1955
[29]

Revisiting long-term time series forecasting: An investigation on linear mapping

Zhe Li, Shiyi Qi, Yiduo Li, and Zenglin Xu. Revisiting long-term time series forecasting: An investigation on linear mapping. arXiv preprint arXiv:2305.10721, 2023

work page arXiv 2023
[30]

Time-series forecasting with deep learning: a survey

Bryan Lim and Stefan Zohren. Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A, 379 0 (2194): 0 20200209, 2021

work page 2021
[31]

Sparsetsf: Modeling long-term time series forecasting with 1k parameters, 2024

Shengsheng Lin, Weiwei Lin, Wentai Wu, Haojun Chen, and Junjie Yang. Sparsetsf: Modeling long-term time series forecasting with 1k parameters, 2024. URL https://arxiv.org/abs/2405.00946

work page arXiv 2024
[32]

Koopa: Learning non-stationary time series dynamics with koopman predictors

Yong Liu, Chenyu Li, Jianmin Wang, and Mingsheng Long. Koopa: Learning non-stationary time series dynamics with koopman predictors. Advances in neural information processing systems, 36: 0 12271--12290, 2023

work page 2023
[33]

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting, 2024. URL https://arxiv.org/abs/2310.06625

work page internal anchor Pith review Pith/arXiv arXiv 2024
[34]

The m4 competition: 100,000 time series and 61 forecasting methods

Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The m4 competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36 0 (1): 0 54--74, 2020. ISSN 0169-2070. doi:https://doi.org/10.1016/j.ijforecast.2019.04.014. URL https://www.sciencedirect.com/science/article/pii/S0169207019301128. M4 Competition

work page doi:10.1016/j.ijforecast.2019.04.014 2020
[35]

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[36]

Automatic differentiation in pytorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017

work page 2017
[37]

Foundations of time series analysis and prediction theory

Mohsen Pourahmadi. Foundations of time series analysis and prediction theory. John Wiley & Sons, 2001

work page 2001
[38]

Linear recursive feature machines provably recover low-rank matrices

Adityanarayanan Radhakrishnan, Mikhail Belkin, and Dmitriy Drusvyatskiy. Linear recursive feature machines provably recover low-rank matrices. Proceedings of the National Academy of Sciences, 122(13):e2411325122, 2025

work page 2025
[39]

Dynamic mode decomposition of numerical and experimental data

Peter J Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of fluid mechanics, 656:5--28, 2010

work page 2010
[40]

Scaling law for time series forecasting

Jingzhe Shi, Qinwei Ma, Huan Ma, and Lei Li. Scaling law for time series forecasting. arXiv preprint arXiv:2405.15124, 2024

work page arXiv 2024
[41]

Time series analysis and its applications: with R examples

Robert H Shumway and David S Stoffer. Time series analysis and its applications: with R examples. Springer, 2006

work page 2006
[42]

Introduction to the galois theory of linear differential equations

Michael F Singer. Introduction to the galois theory of linear differential equations. Algebraic theory of differential equations, 357:1--82, 2009

work page 2009
[43]

Topics in random matrix theory, volume 132

Terence Tao. Topics in random matrix theory, volume 132. American Mathematical Soc., 2012

work page 2012
[44]

The low-rank hypothesis of complex systems

Vincent Thibeault, Antoine Allard, and Patrick Desrosiers. The low-rank hypothesis of complex systems. Nature Physics, 20(2):294--302, 2024

work page 2024
[45]

An analysis of linear time series forecasting models

William Toner and Luke Darlow. An analysis of linear time series forecasting models. arXiv preprint arXiv:2403.14587, 2024

work page arXiv 2024
[46]

An introduction to matrix concentration inequalities

Joel A Tropp et al. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1--230, 2015

work page 2015
[47]

The theory of linear prediction

Palghat P Vaidyanathan. The theory of linear prediction. Morgan & Claypool Publishers, 2007

work page 2007
[48]

Galois theory of linear differential equations, volume 328

Marius Van der Put and Michael F Singer. Galois theory of linear differential equations, volume 328. Springer Science & Business Media, 2012

work page 2012
[49]

High-dimensional statistics: A non-asymptotic viewpoint, volume 48

Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019

work page 2019
[50]

Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung)

Hermann Weyl. Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen, 71(4):441--479, 1912

work page 1912
[51]

The interpolation, extrapolation and smoothing of stationary time series

N Wiener. The interpolation, extrapolation and smoothing of stationary time series. ndrc report, 1942

work page 1942
[52]

A data--driven approximation of the koopman operator: Extending dynamic mode decomposition

Matthew O Williams, Ioannis G Kevrekidis, and Clarence W Rowley. A data--driven approximation of the koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25(6):1307--1346, 2015

work page 2015
[53]

Learning deep time-index models for time series forecasting

Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, and Steven Hoi. Learning deep time-index models for time series forecasting. In International Conference on Machine Learning, pages 37217--37237. PMLR, 2023

work page 2023
[54]

Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, 2022. URL https://arxiv.org/abs/2106.13008

work page arXiv 2022
[55]

TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. Timesnet: Temporal 2d-variation modeling for general time series analysis, 2023. URL https://arxiv.org/abs/2210.02186

work page internal anchor Pith review Pith/arXiv arXiv 2023
[56]

Fits: Modeling time series with 10k parameters

Zhijian Xu, Ailing Zeng, and Qiang Xu. Fits: Modeling time series with 10k parameters. arXiv preprint arXiv:2307.03756, 2023

work page arXiv 2023
[57]

Frequency-domain mlps are more effective learners in time series forecasting

Kun Yi, Qi Zhang, Wei Fan, Shoujin Wang, Pengyang Wang, Hui He, Defu Lian, Ning An, Longbing Cao, and Zhendong Niu. Frequency-domain mlps are more effective learners in time series forecasting, 2023. URL https://arxiv.org/abs/2311.06184

work page arXiv 2023
[58]

Filternet: Harnessing frequency filters for time series forecasting

Kun Yi, Jingru Fei, Qi Zhang, Hui He, Shufeng Hao, Defu Lian, and Wei Fan. Filternet: Harnessing frequency filters for time series forecasting, 2024. URL https://arxiv.org/abs/2411.01623

work page arXiv 2024
[59]

Are transformers effective for time series forecasting?

Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121--11128, 2023

work page 2023
[60]

Skolr: Structured koopman operator linear rnn for time-series forecasting

Yitian Zhang, Liheng Ma, Antonios Valkanas, Boris N Oreshkin, and Mark Coates. Skolr: Structured koopman operator linear rnn for time-series forecasting. arXiv preprint arXiv:2506.14113, 2025

work page arXiv 2025
[61]

Singular spectrum analysis for time series: Introduction to this special issue

Anatoly Alexandrovich Zhigljavsky. Singular spectrum analysis for time series: Introduction to this special issue. Statistics and its Interface, 3(3):255--258, 2010

work page 2010
[62]

Informer: Beyond efficient transformer for long sequence time-series forecasting

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, volume 35, pages 11106--11115. AAAI Press, 2021

work page 2021
[63]

Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting

Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International conference on machine learning, pages 27268--27286. PMLR, 2022

work page 2022