arxiv: 2507.15774 · v3 · submitted 2025-07-21 · 💻 cs.LG · cs.AI

Time Series Forecasting Through the Lens of Dynamics

Alexis-Raja Brachet, Pierre-Yves Richard, Céline Hudelot

Pith reviewed 2026-05-19 03:27 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords time series forecasting · dynamics learning · model architecture · PRO-DYN nomenclature · transformer vs linear models · plug-and-play design · past to future mapping


The pith

Time series models perform better when they learn dynamics fully and place the dynamics block at the end of the model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that effective time series forecasting requires models to learn a direct link from past observations to future values, which the authors call a learning dynamics capability. To test this, they introduce a PRO-DYN nomenclature that classifies architectures by how completely they capture dynamics and where they locate the dynamics block. Their analysis of several models shows that weaker ones learn dynamics only partially and that placing the dynamics block anywhere but at the very end hurts results. Experiments across diverse backbones confirm the pattern, and the work closes with a simple plug-and-play recipe for strengthening existing designs.

Core claim

Under-performing architectures learn dynamics at most partially, and the location of the dynamics block at the model end is of prime importance. The PRO-DYN nomenclature isolates this capability and shows that models succeed when they form a direct past-to-future mapping placed at the architecture's final stage.

What carries the argument

The PRO-DYN nomenclature, which classifies models according to the completeness of their dynamics learning and the position of the dynamics block within the overall architecture.
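The decomposition behind the nomenclature can be sketched as a composition of PRO functions around a single DYN function, $M_\theta(X) = f^{post}(X, f^{pre}(X), f^{dyn}(X, f^{pre}(X)))$. This is a minimal Python sketch, not code from the paper: the concrete blocks (last-value centring as $f^{pre}$, a linear map from $L$ past steps to $H$ future steps as $f^{dyn}$, adding the last value back as $f^{post}$) and all function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical PRO-DYN decomposition of a forecaster:
#   M(X) = f_post(X, f_pre(X), f_dyn(X, f_pre(X)))
# Component choices are illustrative assumptions, not the paper's code.

L, H = 8, 4  # look-back length and forecast horizon (arbitrary)


def f_pre(x):
    """PRO: centre the look-back window on its last observed value."""
    return x - x[-1]


def make_f_dyn(W, b):
    """DYN: learnable direct past-to-future map, x_pre @ W + b (L -> H)."""
    def f_dyn(x, x_pre):
        # x is accepted to match the decomposition but unused in this sketch
        return x_pre @ W + b
    return f_dyn


def f_post(x, x_pre, y_dyn):
    """PRO: parameter-free post-processing that restores the level of X.

    x_pre is accepted to match the decomposition but unused here.
    """
    return y_dyn + x[-1]


def model(x, f_dyn):
    """Compose the blocks in the order given by the decomposition."""
    x_pre = f_pre(x)
    return f_post(x, x_pre, f_dyn(x, x_pre))


x = np.arange(L, dtype=float)  # toy look-back window 0, 1, ..., 7
y = model(x, make_f_dyn(np.zeros((L, H)), np.zeros(H)))
print(y)  # [7. 7. 7. 7.]  (zero weights reduce to a last-value forecast)
```

In this toy instance the learnable DYN map is the last learnable stage; only the parameter-free $f^{post}$ runs after it.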

If this is right

Models that fully learn dynamics and locate the block at the end outperform architectures that do not.
Placing the dynamics block earlier in the network reduces the ability to form accurate past-to-future mappings.
A plug-and-play adjustment that enforces end placement and full dynamics coverage improves a range of existing backbones.
The same lens explains why shallow linear models can beat deeper transformers on forecasting tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers could add an explicit dynamics module at the end of transformer stacks to close the gap with linear baselines.
The same end-placement rule might improve performance in related sequence tasks such as video prediction or multivariate forecasting.
Future work could test whether the PRO-DYN classification predicts accuracy on entirely new time-series domains without retraining.
Architectures might be ranked by how directly they implement the past-to-future link rather than by parameter count or depth.

Load-bearing premise

That the PRO-DYN nomenclature correctly isolates and measures the learning dynamics capability as the primary driver of performance differences across models.

What would settle it

Take an under-performing model such as a transformer, move its dynamics block to the final position while keeping other components fixed, and check whether forecasting error drops substantially on standard benchmarks.
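The proposed test amounts to reordering a pipeline while reusing every other block unchanged. A hypothetical sketch of that bookkeeping step: `Stage`, `move_dyn_to_end` and the stage names are invented for illustration, the PRO/DYN labels must be assigned by hand, and it assumes the stages remain shape-compatible after reordering, which in a real backbone usually requires adapting the layers around the moved block.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One block of a forecasting pipeline, labelled by hand as PRO or DYN."""
    name: str
    role: str  # "PRO" or "DYN"
    fn: Callable


def move_dyn_to_end(stages: List[Stage]) -> List[Stage]:
    """Return a new pipeline whose single DYN stage comes last.

    PRO stages keep their relative order and are the same objects
    (same parameters), so only the position of the DYN block changes.
    """
    dyn = [s for s in stages if s.role == "DYN"]
    if len(dyn) != 1:
        raise ValueError("expected exactly one DYN stage")
    pro = [s for s in stages if s.role != "DYN"]
    return pro + dyn


# Hypothetical transformer-like pipeline whose DYN block is not last.
pipeline = [
    Stage("embedding", "PRO", lambda x: x),
    Stage("attention_dyn", "DYN", lambda x: x),
    Stage("mlp_head", "PRO", lambda x: x),
]
ablated = move_dyn_to_end(pipeline)
print([s.name for s in ablated])  # ['embedding', 'mlp_head', 'attention_dyn']
```

Both pipelines would then be trained and evaluated under identical settings, so that any difference in error can be attributed to the position of the DYN block.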

Figures

Figures reproduced from arXiv: 2507.15774 by Alexis-Raja Brachet, Céline Hudelot, Pierre-Yves Richard.

**Figure 1.** PRO and DYN functions illustrated in the processing chain of a TSF model $M_\theta$. PRO functions are framed and blue, while the DYN function is encircled and orange. Solid lines represent the main data flow. $f^{post}_{\theta_{post}}$ can be fed by $X$ and/or $f^{pre}_{\theta_{pre}}(X)$ (dotted lines). The dotted line from $X$ to $f^{dyn}_{\theta_{dyn}}$ and the time interval start/overlap case are not drawn for clarity. From the PRO-DYN nomenclature, we first a…

**Figure 2.** RQ1 models, now with fully learnable dynamics capabilities.

**Figure 3.** Global performance distribution of the modified models. A name is underlined (resp. …

**Figure 4.** Comparison of DYN-added models and their vanilla versions against NLinear performance. Each point is $(x, y) = (\mathrm{Rel\_perf}(\text{Vanilla} \mid \text{NLinear}), \mathrm{Rel\_perf}(\text{DYN} \mid \text{NLinear}))$, with $\mathrm{Rel\_perf}(\text{Model} \mid \text{NLinear}) = \frac{\mathrm{score}(\text{NLinear}) - \mathrm{score}(\text{Model})}{\mathrm{score}(\text{NLinear})}$, where score is MSE or MAE. The higher the Rel_perf indicator, the better. Each model's mean Rel_perf is shown on its axis. The average gain is $\mathrm{mean}(y - x \mid y > x)$, while the…

**Figure 5.** DYN model performance distribution against their PRO version with setup conditioning. As in …

**Figure 6.** Post-processing model performance distribution against their vanilla version with setup …
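The relative-performance indicator defined in the Figure 4 caption is easy to reproduce. A minimal sketch, assuming per-setup error scores are already available as arrays; the score values below are invented for illustration and are not results from the paper. The caption's definition of its second summary statistic is truncated, so only the average gain is computed.

```python
import numpy as np


def rel_perf(score_model, score_nlinear):
    """Rel_perf(Model|NLinear) = (score(NLinear) - score(Model)) / score(NLinear).

    score is an error (MSE or MAE), so a positive value means the model
    beats NLinear and a negative value means it does worse.
    """
    score_model = np.asarray(score_model, dtype=float)
    score_nlinear = np.asarray(score_nlinear, dtype=float)
    return (score_nlinear - score_model) / score_nlinear


def average_gain(x, y):
    """mean(y - x | y > x) over points where the DYN version improves.

    Returns NaN when no point improves (the mean is then undefined).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    improved = y > x
    if not improved.any():
        return float("nan")
    return float(np.mean(y[improved] - x[improved]))


# Made-up MSE values for two setups, for illustration only.
nlinear = [0.40, 0.30]
vanilla = [0.50, 0.27]
dyn = [0.38, 0.33]
x = rel_perf(vanilla, nlinear)  # approx. [-0.25, 0.10]
y = rel_perf(dyn, nlinear)      # approx. [0.05, -0.10]
print(round(average_gain(x, y), 3))  # 0.3
```

Only the first setup improves here ($y > x$), so the average gain is its single difference, $0.05 - (-0.25) = 0.3$.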

Original abstract

While deep learning is facing a homogenization across modalities led by Transformers, they are still challenged by shallow linear models in the time series forecasting task. Our hypothesis is that models should learn a direct link from past to future data points, which we identify as a learning dynamics capability. We develop an original $\texttt{PRO-DYN}$ nomenclature to analyze existing models through the lens of dynamics. Two observations thus emerge: $\textbf{1.}$ under-performing architectures learn dynamics at most partially, $\textbf{2.}$ the location of the dynamics block at the model end is of prime importance. Our systemic and empirical studies both confirm our observations on a set of performance-varying models with diverse backbones. We propose a simple plug-and-play methodology guiding model designs and improvements.
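The shallow linear baselines mentioned in the abstract implement the hypothesized direct past-to-future link in its simplest form. As an illustration, here is a sketch of an NLinear-style model (subtract the last value of the look-back window, apply one linear map over time, add the value back), fitted in closed form by ridge least squares rather than by gradient-based training as in the original baseline. The helper names, the ridge term and the synthetic sine series are assumptions made for this example.

```python
import numpy as np


def make_windows(series, L, H):
    """Slice a 1-D series into (past, future) pairs: L inputs -> H targets."""
    n = len(series) - L - H + 1
    X = np.stack([series[i:i + L] for i in range(n)])
    Y = np.stack([series[i + L:i + L + H] for i in range(n)])
    return X, Y


def fit_nlinear(X, Y, ridge=1e-6):
    """Fit W (L x H) and b (H,) so that Y - last ~= (X - last) @ W + b."""
    last = X[:, -1:]                             # last observed value per window
    Xc, Yc = X - last, Y - last                  # NLinear-style normalisation
    A = np.hstack([Xc, np.ones((len(Xc), 1))])   # extra column for the bias
    # Ridge term keeps the normal equations solvable (Xc's last column is 0).
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Yc)
    return coef[:-1], coef[-1]


def predict_nlinear(X, W, b):
    """Direct multi-step forecast of H future values from L past values."""
    last = X[:, -1:]
    return (X - last) @ W + b + last


t = np.arange(200)
series = np.sin(2 * np.pi * t / 12)              # synthetic periodic signal
X, Y = make_windows(series, L=24, H=6)
W, b = fit_nlinear(X, Y)
mse = np.mean((predict_nlinear(X, W, b) - Y) ** 2)
print(f"in-sample MSE: {mse:.2e}")
```

On a noiseless sinusoid the in-sample error should be close to zero, because a sampled sine satisfies a linear recurrence and every future value is an exact linear function of the window; real series will not fit this well.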

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper introduces a PRO-DYN framing to link time series performance to dynamics learning and block placement, but the evidence does not yet isolate those factors from capacity and processing differences.


The main takeaway is that this work reframes time series forecasting around whether models learn a direct past-to-future mapping, which they call dynamics capability. They introduce PRO-DYN as a way to label existing architectures and report two patterns: weaker models capture dynamics only partially, and placing the dynamics component at the end matters most. They back this with systemic and empirical checks across varied backbones and offer a plug-and-play design suggestion.

Referee Report

2 major / 2 minor

Summary. The paper hypothesizes that deep learning models for time series forecasting succeed when they learn a direct link from past to future points, termed learning dynamics capability. It introduces the PRO-DYN nomenclature to classify models by how they implement this capability. Systemic and empirical analyses of performance-varying models with diverse backbones yield two observations: under-performing architectures learn dynamics only partially, and placing the dynamics block at the model end is critical. A plug-and-play design methodology is proposed based on these findings.

Significance. If the isolation of dynamics learning holds, the work supplies a useful organizing framework for time series architectures, potentially clarifying why linear baselines remain competitive with Transformers and offering concrete guidance for block placement and capacity allocation.

major comments (2)

Abstract and § on PRO-DYN definition: the claim that under-performing models 'learn dynamics at most partially' is load-bearing for observation 1, yet the manuscript must show how partial learning is quantified independently of final performance; otherwise the nomenclature risks circularity by using performance to label dynamics categories.
Empirical studies section (comparisons across backbones): the attribution of performance gaps to dynamics-block location requires ablations that hold non-dynamics components (capacity, receptive-field construction, early temporal layers) fixed while moving only the identified dynamics block; without such controls the second observation may capture correlated architectural differences rather than a causal effect of block position.

minor comments (2)

Notation: define the precise boundaries of the 'dynamics block' when applying PRO-DYN to each backbone so that the nomenclature can be reproduced on new models.
Figures: label the dynamics block location explicitly in all architecture diagrams to make the 'end-of-model' claim visually verifiable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We respond to each major comment below, clarifying our approach and indicating where revisions will be made to address the concerns.


Referee: Abstract and § on PRO-DYN definition: the claim that under-performing models 'learn dynamics at most partially' is load-bearing for observation 1, yet the manuscript must show how partial learning is quantified independently of final performance; otherwise the nomenclature risks circularity by using performance to label dynamics categories.

Authors: The PRO-DYN nomenclature classifies models according to their architectural mechanisms for realizing dynamics learning (e.g., presence, structure, and connectivity of blocks that implement direct past-to-future mappings), without reference to empirical performance. Partial versus full dynamics learning is therefore determined by whether the architecture includes complete, dedicated dynamics components or only partial approximations thereof, as formalized in the definition section. Performance differences are reported afterward as an empirical observation, not as the basis for the classification. To eliminate any residual ambiguity, we will add explicit, performance-independent probes (such as representation-level diagnostics of dynamics fidelity) in the revised manuscript. revision: partial
Referee: Empirical studies section (comparisons across backbones): the attribution of performance gaps to dynamics-block location requires ablations that hold non-dynamics components (capacity, receptive-field construction, early temporal layers) fixed while moving only the identified dynamics block; without such controls the second observation may capture correlated architectural differences rather than a causal effect of block position.

Authors: Our current experiments compare models across diverse backbones while attempting to match overall capacity where feasible. We agree, however, that stronger isolation is needed to support a causal claim about block placement. In the revision we will introduce controlled ablations that keep capacity, receptive-field construction, and early temporal layers fixed and vary only the position of the identified dynamics block, thereby directly testing the effect of location. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical observations rest on independent architectural analysis


The paper states a hypothesis that forecasting models should learn a direct past-to-future link (termed learning dynamics capability), introduces the PRO-DYN nomenclature as an original analytical lens for inspecting model architectures, and reports two observations drawn from applying that lens to a collection of performance-varying models with diverse backbones. The observations are presented as outcomes of the systemic and empirical studies rather than inputs used to define the nomenclature or the performance labels. No equations or definitions are shown that reduce the claimed results to the inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The derivation chain therefore remains self-contained against external benchmarks of model architecture and forecasting performance.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central hypothesis implicitly assumes that 'dynamics' can be isolated as an independent modeling property.

pith-pipeline@v0.9.0 · 5657 in / 1002 out tokens · 30572 ms · 2026-05-19T03:27:54.465686+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

under-performing architectures learn dynamics at most partially, and the location of the dynamics block at the model end is of prime importance
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

LSTF-Linear models ... $X_{T(t_L+H)}|_{[t_L+1,\, t_L+H]} = \mathrm{Linear}_\theta \circ f^{pre}(X_{T(t_L)}) = X^{pre}_{T(t_L)} W_\theta + b_\theta$
IndisputableMonolith/Foundation/ArrowOfTime.lean arrow_from_z refines

Relation between the paper passage and the cited Recognition theorem.

$M_\theta(X) = f^{post}(X, f^{pre}(X), f^{dyn}(X, f^{pre}(X)))$ ... $f^{dyn}$ defines $M_\theta$ dynamics, performing a prediction going from $T_X$ to $T_Y$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
