Time Series Forecasting Through the Lens of Dynamics
Pith reviewed 2026-05-19 03:27 UTC · model grok-4.3
The pith
Time series forecasting models perform better when they learn dynamics fully and place the dynamics block at the end of the model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under-performing architectures learn dynamics at most partially, and the location of the dynamics block at the model end is of prime importance. The PRO-DYN nomenclature isolates this capability and shows that models succeed when they form a direct past-to-future mapping placed at the architecture's final stage.
What carries the argument
The PRO-DYN nomenclature, which classifies models according to the completeness of their dynamics learning and the position of the dynamics block within the overall architecture.
If this is right
- Models that fully learn dynamics and locate the block at the end outperform architectures that do not.
- Placing the dynamics block earlier in the network reduces the ability to form accurate past-to-future mappings.
- A plug-and-play adjustment that enforces end placement and full dynamics coverage improves a range of existing backbones.
- The same lens explains why shallow linear models can beat deeper transformers on forecasting tasks.
Where Pith is reading between the lines
- Designers could add an explicit dynamics module at the end of transformer stacks to close the gap with linear baselines.
- The same end-placement rule might improve performance in related sequence tasks such as video prediction or multivariate forecasting.
- Future work could test whether the PRO-DYN classification predicts accuracy on entirely new time-series domains without retraining.
- Architectures might be ranked by how directly they implement the past-to-future link rather than by parameter count or depth.
Load-bearing premise
That the PRO-DYN nomenclature correctly isolates and measures the learning dynamics capability as the primary driver of performance differences across models.
What would settle it
Take an under-performing model such as a transformer, move its dynamics block to the final position while keeping other components fixed, and check whether forecasting error drops substantially on standard benchmarks.
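As a rough illustration of the test described above, the following Python sketch builds two toy forecasters from the same parameters and changes only where the dynamics block sits. Everything in it is an assumption made for illustration: the names `temporal_mix`, `dynamics`, `dynamics_at_end`, and `dynamics_early` are hypothetical, the 3-tap filter merely stands in for a backbone's non-dynamics layers, and none of this is the paper's code or its benchmark setup.

```python
from typing import List

Series = List[float]


def temporal_mix(x: Series, kernel: Series) -> Series:
    """Length-preserving 3-tap filter with edge padding.

    Hypothetical stand-in for the non-dynamics layers of a backbone.
    """
    padded = [x[0]] + list(x) + [x[-1]]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def dynamics(x: Series, weights: List[Series], bias: Series) -> Series:
    """Direct past-to-future map: one weight row per forecast step."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def dynamics_at_end(x, kernel, weights, bias):
    """Mix over the past window first, then map past to future last."""
    return dynamics(temporal_mix(x, kernel), weights, bias)


def dynamics_early(x, kernel, weights, bias):
    """Map past to future first, then mix over the forecast window."""
    return temporal_mix(dynamics(x, weights, bias), kernel)


def mse(pred: Series, target: Series) -> float:
    """Mean squared forecasting error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Illustrative values only (assumed, not taken from the paper).
window = [1.0, 2.0, 3.0, 4.0]           # L = 4 past steps
kernel = [0.25, 0.5, 0.25]              # shared by both variants
weights = [[0.0, 0.0, 0.0, 1.0],        # H = 2 forecast steps
           [0.0, 0.0, 1.0, 0.0]]
bias = [0.5, 0.0]

forecast_end = dynamics_at_end(window, kernel, weights, bias)
forecast_early = dynamics_early(window, kernel, weights, bias)
```

Because both variants reuse the same `kernel`, `weights`, and `bias`, the parameter count is held fixed by construction. Within this toy setting, a gap in `mse` between the two forecasts on held-out windows would reflect ordering alone; a real ablation would also need to fix receptive-field construction and early temporal layers.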
Original abstract
While deep learning is facing a homogenization across modalities led by Transformers, these models are still challenged by shallow linear models in the time series forecasting task. Our hypothesis is that models should learn a direct link from past to future data points, which we identify as a learning dynamics capability. We develop an original PRO-DYN nomenclature to analyze existing models through the lens of dynamics. Two observations thus emerge: (1) under-performing architectures learn dynamics at most partially, and (2) the location of the dynamics block at the model end is of prime importance. Our systemic and empirical studies both confirm our observations on a set of performance-varying models with diverse backbones. We propose a simple plug-and-play methodology guiding model designs and improvements.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper hypothesizes that deep learning models for time series forecasting succeed when they learn a direct link from past to future points, termed learning dynamics capability. It introduces the PRO-DYN nomenclature to classify models by how they implement this capability. Systemic and empirical analyses of performance-varying models with diverse backbones yield two observations: under-performing architectures learn dynamics only partially, and placing the dynamics block at the model end is critical. A plug-and-play design methodology is proposed based on these findings.
Significance. If the isolation of dynamics learning holds, the work supplies a useful organizing framework for time series architectures, potentially clarifying why linear baselines remain competitive with Transformers and offering concrete guidance for block placement and capacity allocation.
Major comments (2)
- Abstract and § on PRO-DYN definition: the claim that under-performing models 'learn dynamics at most partially' is load-bearing for observation 1, yet the manuscript must show how partial learning is quantified independently of final performance; otherwise the nomenclature risks circularity by using performance to label dynamics categories.
- Empirical studies section (comparisons across backbones): the attribution of performance gaps to dynamics-block location requires ablations that hold non-dynamics components (capacity, receptive-field construction, early temporal layers) fixed while moving only the identified dynamics block; without such controls the second observation may capture correlated architectural differences rather than a causal effect of block position.
Minor comments (2)
- Notation: define the precise boundaries of the 'dynamics block' when applying PRO-DYN to each backbone so that the nomenclature can be reproduced on new models.
- Figures: label the dynamics block location explicitly in all architecture diagrams to make the 'end-of-model' claim visually verifiable.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We respond to each major comment below, clarifying our approach and indicating where revisions will be made to address the concerns.
- Referee: Abstract and § on PRO-DYN definition: the claim that under-performing models 'learn dynamics at most partially' is load-bearing for observation 1, yet the manuscript must show how partial learning is quantified independently of final performance; otherwise the nomenclature risks circularity by using performance to label dynamics categories.
Authors: The PRO-DYN nomenclature classifies models according to their architectural mechanisms for realizing dynamics learning (e.g., presence, structure, and connectivity of blocks that implement direct past-to-future mappings), without reference to empirical performance. Partial versus full dynamics learning is therefore determined by whether the architecture includes complete, dedicated dynamics components or only partial approximations thereof, as formalized in the definition section. Performance differences are reported afterward as an empirical observation, not as the basis for the classification. To eliminate any residual ambiguity, we will add explicit, performance-independent probes (such as representation-level diagnostics of dynamics fidelity) in the revised manuscript.
Revision: partial
- Referee: Empirical studies section (comparisons across backbones): the attribution of performance gaps to dynamics-block location requires ablations that hold non-dynamics components (capacity, receptive-field construction, early temporal layers) fixed while moving only the identified dynamics block; without such controls the second observation may capture correlated architectural differences rather than a causal effect of block position.
Authors: Our current experiments compare models across diverse backbones while attempting to match overall capacity where feasible. We agree, however, that stronger isolation is needed to support a causal claim about block placement. In the revision we will introduce controlled ablations that keep capacity, receptive-field construction, and early temporal layers fixed and vary only the position of the identified dynamics block, thereby directly testing the effect of location.
Revision: yes
Circularity Check
No significant circularity; empirical observations rest on independent architectural analysis
The paper states a hypothesis that forecasting models should learn a direct past-to-future link (termed learning dynamics capability), introduces the PRO-DYN nomenclature as an original analytical lens for inspecting model architectures, and reports two observations drawn from applying that lens to a collection of performance-varying models with diverse backbones. The observations are presented as outcomes of the systemic and empirical studies rather than inputs used to define the nomenclature or the performance labels. No equations or definitions are shown that reduce the claimed results to the inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The derivation chain therefore remains self-contained against external benchmarks of model architecture and forecasting performance.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, embed_strictMono_of_one_lt (echoes)
Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Paper passage: under-performing architectures learn dynamics at most partially, and the location of the dynamics block at the model end is of prime importance
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (echoes)
Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Paper passage: LSTF-Linear models ... $X^T(t_{L+H})|_{[t_{L+1},\, t_{L+H}]} = \mathrm{Linear}_\theta \circ f_{pre}(X^T(t_L)) = X^T_{pre}(t_L) W_\theta + b_\theta$
- IndisputableMonolith/Foundation/ArrowOfTime.lean, arrow_from_z (refines)
Refines: relation between the paper passage and the cited Recognition theorem.
Paper passage: $M_\theta(X) = f_{post}(X, f_{pre}(X), f_{dyn}(X, f_{pre}(X)))$ ... $f_{dyn}$ defines $M_\theta$ dynamics performing a prediction going from $T_X$ to $T_Y$
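To make the quoted decomposition M_θ(X) = f_post(X, f_pre(X), f_dyn(X, f_pre(X))) concrete, here is a minimal Python sketch that keeps the same argument structure. The specific choices are assumptions for illustration only: mean-centering as `f_pre`, a linear past-to-future map as `f_dyn` (in the spirit of the LSTF-Linear passage), and adding the mean back as `f_post`. The function `forecast` and the weight layout (one row per forecast step, rather than the passage's right-multiplication by W_θ) are hypothetical and not taken from the paper.

```python
from typing import List

Series = List[float]


def f_pre(x: Series) -> Series:
    """Pre-processing (assumed): center the input window on its mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def f_dyn(x: Series, z: Series, weights: List[Series], bias: Series) -> Series:
    """Dynamics (assumed linear): map L processed past steps to H future steps.

    The raw window x is accepted to mirror the decomposition's signature,
    but this simple variant only uses the pre-processed window z.
    """
    return [
        sum(w * v for w, v in zip(row, z)) + b
        for row, b in zip(weights, bias)
    ]


def f_post(x: Series, z: Series, y: Series) -> Series:
    """Post-processing (assumed): restore the mean removed by f_pre."""
    mean = sum(x) / len(x)
    return [v + mean for v in y]


def forecast(x: Series, weights: List[Series], bias: Series) -> Series:
    """Compose the three blocks as f_post(X, f_pre(X), f_dyn(X, f_pre(X)))."""
    z = f_pre(x)
    return f_post(x, z, f_dyn(x, z, weights, bias))


# Illustrative call: L = 4 past steps, H = 2 forecast steps, and weights
# that copy the last centered value into every forecast step.
past = [1.0, 2.0, 3.0, 4.0]
copy_last = [[0.0, 0.0, 0.0, 1.0],
             [0.0, 0.0, 0.0, 1.0]]
prediction = forecast(past, copy_last, [0.0, 0.0])
```

In this sketch the only block that changes the time axis from the past window to the forecast horizon is `f_dyn`, which is the role the passage assigns to the dynamics block.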
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.