
arxiv: 2507.15774 · v3 · submitted 2025-07-21 · 💻 cs.LG · cs.AI

Time Series Forecasting Through the Lens of Dynamics

Pith reviewed 2026-05-19 03:27 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: time series forecasting · dynamics learning · model architecture · PRO-DYN nomenclature · transformer vs linear models · plug-and-play design · past to future mapping

The pith

Time series models perform better when they learn dynamics fully and place that block at the model end.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that effective time series forecasting requires models to learn a direct link from past observations to future values, called learning dynamics capability. To test this, the authors create a PRO-DYN nomenclature that classifies architectures by how completely they capture dynamics and where they locate the dynamics block. Analysis of several models shows that weaker ones learn dynamics only partially and that placing the dynamics block anywhere but at the very end hurts results. Experiments across diverse backbones confirm the pattern, and the work ends with a simple plug-and-play recipe for strengthening existing designs.

Core claim

Under-performing architectures learn dynamics at most partially, and the location of the dynamics block at the model end is of prime importance. The PRO-DYN nomenclature isolates this capability and shows that models succeed when they form a direct past-to-future mapping placed at the architecture's final stage.

What carries the argument

The PRO-DYN nomenclature, which classifies models according to the completeness of their dynamics learning and the position of the dynamics block within the overall architecture.
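
As a rough illustration of how the nomenclature splits a model, the sketch below composes a pre-processing PRO function, a DYN function, and a post-processing PRO function in NumPy. The function names follow the Figure 1 caption. The shapes, the mean-centering step, and the use of a single linear map over the time axis for DYN are assumptions made here for illustration, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: lookback length, forecast horizon, number of channels.
L, H, C = 96, 24, 3

def f_pre(x):
    """PRO function (assumed): mean-center each channel; time axis stays at length L."""
    mean = x.mean(axis=0, keepdims=True)
    return x - mean, mean

def f_dyn(z, W, b):
    """DYN function (assumed linear): maps the L past steps to the H future steps."""
    return W @ z + b  # (H, L) @ (L, C) -> (H, C)

def f_post(y, mean):
    """PRO function (assumed): undo the centering; time axis stays at length H."""
    return y + mean

# Untrained parameters, only to show the data flow.
W = rng.normal(scale=0.01, size=(H, L))
b = np.zeros((H, 1))

X = rng.normal(size=(L, C))          # past window
z, mean = f_pre(X)                   # PRO: no change of time interval
Y_hat = f_post(f_dyn(z, W, b), mean) # DYN is the only past-to-future step
print(Y_hat.shape)
```

In this reading, only `f_dyn` changes the time axis from past to future; the PRO functions transform the data without crossing that boundary, and the only step after DYN is a parameter-free re-centering.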

If this is right

  • Models that fully learn dynamics and locate the block at the end outperform architectures that do not.
  • Placing the dynamics block earlier in the network reduces the ability to form accurate past-to-future mappings.
  • A plug-and-play adjustment that enforces end placement and full dynamics coverage improves a range of existing backbones.
  • The same lens explains why shallow linear models can beat deeper transformers on forecasting tasks.
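
The plug-and-play idea in the list above can be sketched as appending a learnable past-to-future head to a backbone that leaves the time axis untouched. This page does not describe the paper's actual recipe, so everything below is an assumption: the `backbone` stand-in, the least-squares fit of the head, and the toy sine series are hypothetical and chosen only to keep the example self-contained.

```python
import numpy as np

def make_windows(series, L, H):
    """Slice a 1-D series into (past, future) pairs of lengths L and H."""
    n = len(series) - L - H + 1
    past = np.stack([series[i:i + L] for i in range(n)])
    future = np.stack([series[i + L:i + L + H] for i in range(n)])
    return past, future

def backbone(past):
    """Stand-in backbone (hypothetical): nonlinear features, time axis unchanged."""
    return np.tanh(past)

def add_bias(features):
    """Append a column of ones so the linear head can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])

def fit_dyn_head(features, future):
    """Fit a linear past-to-future map by least squares."""
    W, *_ = np.linalg.lstsq(add_bias(features), future, rcond=None)
    return W

def forecast(past, W):
    """Backbone first, DYN head last."""
    return add_bias(backbone(past)) @ W

# Toy periodic series, for illustration only.
t = np.arange(400)
series = np.sin(2 * np.pi * t / 25)
past, future = make_windows(series, L=50, H=10)

W = fit_dyn_head(backbone(past[:250]), future[:250])
pred = forecast(past[250:], W)
mse = float(np.mean((pred - future[250:]) ** 2))
print(pred.shape, mse)
```

The design point is the ordering: the backbone never maps past to future, and the head that does so is the final learnable step.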

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers could add an explicit dynamics module at the end of transformer stacks to close the gap with linear baselines.
  • The same end-placement rule might improve performance in related sequence tasks such as video prediction or multivariate forecasting.
  • Future work could test whether the PRO-DYN classification predicts accuracy on entirely new time-series domains without retraining.
  • Architectures might be ranked by how directly they implement the past-to-future link rather than by parameter count or depth.

Load-bearing premise

That the PRO-DYN nomenclature correctly isolates and measures the learning dynamics capability as the primary driver of performance differences across models.

What would settle it

Take an under-performing model such as a transformer, move its dynamics block to the final position while keeping other components fixed, and check whether forecasting error drops substantially on standard benchmarks.

Figures

Figures reproduced from arXiv: 2507.15774 by Alexis-Raja Brachet, Céline Hudelot, Pierre-Yves Richard.

Figure 1: PRO and DYN functions illustrated in the processing chain of a TSF model $M_\theta$. PRO functions are framed and blue, while the DYN function is encircled and orange. Solid lines represent the main data flow. $f^{\text{post}}_{\theta_{\text{post}}}$ can be fed by $X$ and/or $f^{\text{pre}}_{\theta_{\text{pre}}}(X)$ (dotted lines). The dotted line from $X$ to $f^{\text{dyn}}_{\theta_{\text{dyn}}}$ and the time interval start/overlap case are not drawn, for clarity. From the PRO-DYN nomenclature, we first a…
Figure 2: RQ1 models, now with fully learnable dynamics capabilities.
Figure 3: Global performance distribution of the modified models. A name is underlined (resp. …
Figure 4: Comparison between DYN-added models and their vanilla versions against NLinear performance. Each point is (x; y) : (Rel_perf(Vanilla|NLinear); Rel_perf(DYN|NLinear)), with Rel_perf(Model|NLinear) = (score(NLinear) − score(Model)) / score(NLinear), where score is MSE or MAE. The higher the Rel_perf indicator, the better. Each model's mean Rel_perf is shown on its axis. The average gain is mean(y − x|y > x), while the…
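
The Rel_perf indicator in the Figure 4 caption is a relative error reduction against NLinear, and a small sketch makes its sign convention concrete. The function name `rel_perf` and the three MSE values below are hypothetical, chosen for illustration; they are not results from the paper.

```python
def rel_perf(score_model, score_nlinear):
    """Relative performance against NLinear.

    Positive values mean the model has a lower error (MSE or MAE) than NLinear.
    """
    return (score_nlinear - score_model) / score_nlinear

# Hypothetical MSE values, for illustration only.
nlinear_mse = 0.40
vanilla_mse = 0.50
dyn_mse = 0.38

x = rel_perf(vanilla_mse, nlinear_mse)  # vanilla model vs NLinear
y = rel_perf(dyn_mse, nlinear_mse)      # DYN-added model vs NLinear
print(round(x, 3), round(y, 3), y > x)
```

With these made-up numbers the vanilla model sits below NLinear (negative x) and the DYN-added model sits slightly above it (positive y), so the point (x; y) falls in the region where y > x.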
Figure 5: DYN model performance distribution against their PRO version with setup conditioning. As in …
Figure 6: Post-processing model performance distribution against their vanilla version with setup …
Original abstract

While deep learning is facing a homogenization across modalities led by Transformers, they are still challenged by shallow linear models in the time series forecasting task. Our hypothesis is that models should learn a direct link from past to future data points, which we identify as a learning dynamics capability. We develop an original $\texttt{PRO-DYN}$ nomenclature to analyze existing models through the lens of dynamics. Two observations thus emerge: $\textbf{1.}$ under-performing architectures learn dynamics at most partially, $\textbf{2.}$ the location of the dynamics block at the model end is of prime importance. Our systemic and empirical studies both confirm our observations on a set of performance-varying models with diverse backbones. We propose a simple plug-and-play methodology guiding model designs and improvements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper hypothesizes that deep learning models for time series forecasting succeed when they learn a direct link from past to future points, termed learning dynamics capability. It introduces the PRO-DYN nomenclature to classify models by how they implement this capability. Systemic and empirical analyses of performance-varying models with diverse backbones yield two observations: under-performing architectures learn dynamics only partially, and placing the dynamics block at the model end is critical. A plug-and-play design methodology is proposed based on these findings.

Significance. If the isolation of dynamics learning holds, the work supplies a useful organizing framework for time series architectures, potentially clarifying why linear baselines remain competitive with Transformers and offering concrete guidance for block placement and capacity allocation.

major comments (2)
  1. Abstract and § on PRO-DYN definition: the claim that under-performing models 'learn dynamics at most partially' is load-bearing for observation 1, yet the manuscript must show how partial learning is quantified independently of final performance; otherwise the nomenclature risks circularity by using performance to label dynamics categories.
  2. Empirical studies section (comparisons across backbones): the attribution of performance gaps to dynamics-block location requires ablations that hold non-dynamics components (capacity, receptive-field construction, early temporal layers) fixed while moving only the identified dynamics block; without such controls the second observation may capture correlated architectural differences rather than a causal effect of block position.
minor comments (2)
  1. Notation: define the precise boundaries of the 'dynamics block' when applying PRO-DYN to each backbone so that the nomenclature can be reproduced on new models.
  2. Figures: label the dynamics block location explicitly in all architecture diagrams to make the 'end-of-model' claim visually verifiable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We respond to each major comment below, clarifying our approach and indicating where revisions will be made to address the concerns.

Point-by-point responses
  1. Referee: Abstract and § on PRO-DYN definition: the claim that under-performing models 'learn dynamics at most partially' is load-bearing for observation 1, yet the manuscript must show how partial learning is quantified independently of final performance; otherwise the nomenclature risks circularity by using performance to label dynamics categories.

    Authors: The PRO-DYN nomenclature classifies models according to their architectural mechanisms for realizing dynamics learning (e.g., presence, structure, and connectivity of blocks that implement direct past-to-future mappings), without reference to empirical performance. Partial versus full dynamics learning is therefore determined by whether the architecture includes complete, dedicated dynamics components or only partial approximations thereof, as formalized in the definition section. Performance differences are reported afterward as an empirical observation, not as the basis for the classification. To eliminate any residual ambiguity, we will add explicit, performance-independent probes (such as representation-level diagnostics of dynamics fidelity) in the revised manuscript.

    Revision: partial

  2. Referee: Empirical studies section (comparisons across backbones): the attribution of performance gaps to dynamics-block location requires ablations that hold non-dynamics components (capacity, receptive-field construction, early temporal layers) fixed while moving only the identified dynamics block; without such controls the second observation may capture correlated architectural differences rather than a causal effect of block position.

    Authors: Our current experiments compare models across diverse backbones while attempting to match overall capacity where feasible. We agree, however, that stronger isolation is needed to support a causal claim about block placement. In the revision we will introduce controlled ablations that keep capacity, receptive-field construction, and early temporal layers fixed and vary only the position of the identified dynamics block, thereby directly testing the effect of location.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical observations rest on independent architectural analysis

Full rationale

The paper states a hypothesis that forecasting models should learn a direct past-to-future link (termed learning dynamics capability), introduces the PRO-DYN nomenclature as an original analytical lens for inspecting model architectures, and reports two observations drawn from applying that lens to a collection of performance-varying models with diverse backbones. The observations are presented as outcomes of the systemic and empirical studies rather than inputs used to define the nomenclature or the performance labels. No equations or definitions are shown that reduce the claimed results to the inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The derivation chain therefore remains self-contained against external benchmarks of model architecture and forecasting performance.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central hypothesis implicitly assumes that 'dynamics' can be isolated as an independent modeling property.

pith-pipeline@v0.9.0 · 5657 in / 1002 out tokens · 30572 ms · 2026-05-19T03:27:54.465686+00:00 · methodology



Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 10 internal anchors

  2. [2]

    James F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, November 1983. ISSN 0001-0782. doi:10.1145/182.358434. URL https://doi.org/10.1145/182.358434

  3. [3]

    An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

    Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, 2018. URL https://arxiv.org/abs/1803.01271

  4. [4]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani et al. On the opportunities and risks of foundation models, 2022. URL https://arxiv.org/abs/2108.07258

  5. [5]

    Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting

    Peng Chen, Yingying ZHANG, Yunyao Cheng, Yang Shu, Yihang Wang, Qingsong Wen, Bin Yang, and Chenjuan Guo. Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=lJkOCMP2aW

  6. [6]

    Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations, 2019. URL https://arxiv.org/abs/1806.07366

  7. [7]

    BEATs: Audio pre-training with acoustic tokenizers

    Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Wanxiang Che, Xiangzhan Yu, and Furu Wei. BEATs: Audio pre-training with acoustic tokenizers. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), Proceedings of the 40th International Conference on Machine Learning, vol...

  8. [8]

    Long Short-Term Memory-Networks for Machine Reading

    Jianpeng Cheng, Li Dong, and Mirella Lapata. Long short-term memory-networks for machine reading, 2016. URL https://arxiv.org/abs/1601.06733

  9. [9]

    Direct multi-step estimation and forecasting

    Guillaume Chevillon. Direct multi-step estimation and forecasting. Documents de Travail de l'OFCE 2005-10, Observatoire Francais des Conjonctures Economiques (OFCE), 2005. URL https://ideas.repec.org/p/fce/doctra/0510.html

  10. [10]

    Triformer: Triangular, variable-specific attentions for long sequence multivariate time series forecasting

    Razvan-Gabriel Cirstea, Chenjuan Guo, Bin Yang, Tung Kieu, Xuanyi Dong, and Shirui Pan. Triformer: Triangular, variable-specific attentions for long sequence multivariate time series forecasting. In Luc De Raedt (ed.), Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, pp. 1994–2001. International Joint...

  11. [11]

    Periodicity decoupling framework for long-term series forecasting

    Tao Dai, Beiliang Wu, Peiyuan Liu, Naiqi Li, Jigang Bao, Yong Jiang, and Shu-Tao Xia. Periodicity decoupling framework for long-term series forecasting. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=dp27P5HBBt

  12. [12]

    Long-term forecasting with tide: Time-series dense encoder

    Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan Mathur, Rajat Sen, and Rose Yu. Long-term forecasting with tide: Time-series dense encoder. CoRR, abs/2304.08424, 2023. doi:10.48550/ARXIV.2304.08424. URL https://doi.org/10.48550/arXiv.2304.08424

  13. [13]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021. URL https://arxiv.org/abs/2010.11929

  14. [14]

    Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990. doi:10.1207/s15516709cog1402_1. URL https://doi.org/10.1207/s15516709cog1402_1

  15. [15]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces, 2024. URL https://arxiv.org/abs/2312.00752

  16. [16]

    Long short-term memory

    Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, November 1997. ISSN 0899-7667. doi:10.1162/neco.1997.9.8.1735. URL https://doi.org/10.1162/neco.1997.9.8.1735

  17. [17]

    Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre...

  18. [18]

    Attractor memory for long-term time series forecasting: A chaos perspective, 2024

    Jiaxi Hu, Yuehong Hu, Wei Chen, Ming Jin, Shirui Pan, Qingsong Wen, and Yuxuan Liang. Attractor memory for long-term time series forecasting: A chaos perspective, 2024. URL https://arxiv.org/abs/2402.11463

  19. [19]

    Curse of attention: A kernel-based perspective for why transformers fail to generalize on time series forecasting and beyond, 2025

    Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, and Chiwun Yang. Curse of attention: A kernel-based perspective for why transformers fail to generalize on time series forecasting and beyond, 2025. URL https://arxiv.org/abs/2412.06061

  20. [20]

    Neural operator: Learning maps between function spaces

    Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces, 2024. URL https://arxiv.org/abs/2108.08481

  21. [21]

    Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting

    Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32. Curran...

  22. [22]

    Fourier Neural Operator for Parametric Partial Differential Equations

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations, 2021. URL https://arxiv.org/abs/2010.08895

  23. [23]

    Echo-gl: Earnings calls-driven heterogeneous graph learning for stock movement prediction

    Mengpu Liu, Mengying Zhu, Xiuyuan Wang, Guofang Ma, Jianwei Yin, and Xiaolin Zheng. Echo-gl: Earnings calls-driven heterogeneous graph learning for stock movement prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12):13972–13980, Mar. 2024a. doi:10.1609/aaai.v38i12.29305. URL https://ojs.aaai.org/index.php/AAAI/article...

  24. [24]

    Non-stationary transformers: Exploring the stationarity in time series forecasting, 2023

    Yong Liu, Haixu Wu, Jianmin Wang, and Mingsheng Long. Non-stationary transformers: Exploring the stationarity in time series forecasting, 2023. URL https://arxiv.org/abs/2205.14415

  25. [25]

    itransformer: Inverted transformers are effective for time series forecasting

    Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting. In The Twelfth International Conference on Learning Representations, 2024b. URL https://openreview.net/forum?id=JePfAI8fah

  26. [26]

    A time series is worth 64 words: Long-term forecasting with transformers

    Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=Jbdc0vTOcol

  27. [27]

    Pdetime: Rethinking long-term multivariate time series forecasting from the perspective of partial differential equations, 2024

    Shiyi Qi, Zenglin Xu, Yiduo Li, Liangjian Wen, Qingsong Wen, Qifan Wang, and Yuan Qi. Pdetime: Rethinking long-term multivariate time series forecasting from the perspective of partial differential equations, 2024. URL https://arxiv.org/abs/2402.16913

  28. [28]

    Tfb: Towards comprehensive and fair benchmarking of time series forecasting methods

    Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S Jensen, Zhenli Sheng, and Bin Yang. Tfb: Towards comprehensive and fair benchmarking of time series forecasting methods. Proc. VLDB Endow., 17:2363–2377, 2024

  29. [29]

    Duet: Dual clustering enhanced multivariate time series forecasting, 2025

    Xiangfei Qiu, Xingjian Wu, Yan Lin, Chenjuan Guo, Jilin Hu, and Bin Yang. Duet: Dual clustering enhanced multivariate time series forecasting, 2025. URL https://arxiv.org/abs/2412.10859

  30. [30]

    Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations

    M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. ISSN 0021-9991. doi:10.1016/j.jcp.2018.10.045. URL https://www.sciencedirect.com/sc...

  31. [31]

    Eye movements in reading and information processing: 20 years of research

    Keith Rayner. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3):372–422, 1998. doi:10.1037/0033-2909.124.3.372. URL https://pubmed.ncbi.nlm.nih.gov/9849112/

  32. [32]

    Llm-sr: Scientific equation discovery via programming with large language models, 2024

    Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, and Chandan K Reddy. Llm-sr: Scientific equation discovery via programming with large language models, 2024. URL https://arxiv.org/abs/2404.18400

  33. [33]

    Are language models actually useful for time series forecasting?

    Mingtian Tan, Mike A. Merrill, Vinayak Gupta, Tim Althoff, and Thomas Hartvigsen. Are language models actually useful for time series forecasting? In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 60162–60191. Curran Associates, Inc., 2024. URL h...

  34. [34]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https:...

  35. [35]

    Graph Attention Networks

    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks, 2018. URL https://arxiv.org/abs/1710.10903

  36. [36]

    MICN: Multi-scale local and global context modeling for long-term series forecasting

    Huiqiang Wang, Jian Peng, Feihu Huang, Jince Wang, Junhui Chen, and Yifei Xiao. MICN: Multi-scale local and global context modeling for long-term series forecasting. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=zt53IDUR1U

  37. [37]

    Timemixer: Decomposable multiscale mixing for time series forecasting

    Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, and Jun Zhou. Timemixer: Decomposable multiscale mixing for time series forecasting, 2024. URL https://arxiv.org/abs/2405.14616

  38. [38]

    Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

    Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 22419–22430. Curran Associates, Inc., 2021. URL https...

  39. [39]

    TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

    Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. Timesnet: Temporal 2d-variation modeling for general time series analysis, 2023. URL https://arxiv.org/abs/2210.02186

  40. [40]

    Fits: Modeling time series with 10k parameters, 2024

    Zhijian Xu, Ailing Zeng, and Qiang Xu. Fits: Modeling time series with 10k parameters, 2024. URL https://arxiv.org/abs/2307.03756

  41. [41]

    Are transformers effective for time series forecasting?, 2022

    Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting?, 2022. URL https://arxiv.org/abs/2205.13504

  42. [42]

    Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting

    Yunhao Zhang and Junchi Yan. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=vSVLM2j9eie

  43. [43]

    Informer: Beyond efficient transformer for long sequence time-series forecasting

    Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12):11106–11115, 2021. doi:10.1609/aaai.v35i12.17325. URL https://doi.org/10.1609/aaai.v35i12.17325

  44. [44]

    FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting

    Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings...

  45. [45]

    FiLM: Frequency improved legendre memory model for long-term time series forecasting

    Tian Zhou, Ziqing Ma, Xue Wang, Qingsong Wen, Liang Sun, Tao Yao, Wotao Yin, and Rong Jin. FiLM: Frequency improved legendre memory model for long-term time series forecasting. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (eds.), Advances in Neural Information Processing Systems, 2022b. URL https://openreview.net/forum?id=zTQdHSQUQWc