Recognition: no theorem link
NPMixer: Hierarchical Neighboring Patch Mixing for Time Series Forecasting
Pith reviewed 2026-05-11 01:55 UTC · model grok-4.3
The pith
NPMixer decomposes time series with a learnable wavelet and mixes neighboring patches hierarchically to improve multivariate forecasting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NPMixer is a hierarchical architecture featuring a Learnable Stationary Wavelet Transform that adaptively learns filter coefficients to decompose multivariate signals into trend and detail components in a data-dependent manner. The Neighboring Mixer Block then applies a series of hierarchical MLP layers to non-overlapping patches, capturing local temporal dynamics and expanding the receptive field across scales by learning patterns both within and across patches. A Channel-Mixing Encoder processes the high-frequency components to capture channel correlations while preserving the stability of the underlying global trend.
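The decomposition step can be sketched as an undecimated (à trous) wavelet transform whose filter taps are the learnable parameters. The Haar initialisation, circular padding, and single-level form below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def swt_level(x, lo, hi, level=0):
    """One level of a stationary (undecimated) wavelet transform.

    x      : (T,) signal
    lo, hi : low-/high-pass filter taps (the learnable coefficients in NPMixer)
    level  : dilation factor 2**level between taps (a trous scheme)
    Returns trend (approximation) and detail coefficients, both of length T.
    """
    dil = 2 ** level
    T = len(x)
    trend = np.zeros(T)
    detail = np.zeros(T)
    for t in range(T):
        for k in range(len(lo)):
            s = x[(t - k * dil) % T]  # circular padding keeps the length fixed
            trend[t] += lo[k] * s
            detail[t] += hi[k] * s
    return trend, detail

# Haar initialisation; training would update these taps by gradient descent.
lo = np.array([1.0, 1.0]) / np.sqrt(2)
hi = np.array([1.0, -1.0]) / np.sqrt(2)

x = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))
trend, detail = swt_level(x, lo, hi)
assert trend.shape == detail.shape == x.shape  # undecimated: no downsampling
```

Because the transform is undecimated, both outputs keep the input length, which is what lets the later patching operate identically on trend and detail streams.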
What carries the argument
The Neighboring Mixer Block, a stack of hierarchical MLPs operating on non-overlapping patches after a learnable wavelet decomposition, which expands the receptive field to model multi-scale temporal dependencies and channel correlations.
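A minimal sketch of that mixing idea: an intra-patch MLP followed by an MLP over concatenated pairs of neighboring patches, so each stage halves the patch count and doubles the receptive field. The random weights, patch length, and pairwise merge rule are stand-in assumptions, not the paper's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two-layer MLP with a ReLU, applied row-wise."""
    return np.maximum(x @ w1, 0.0) @ w2

def neighboring_mix(x, patch_len):
    """One hierarchical stage: residual intra-patch mixing, then mixing
    across pairs of neighboring patches."""
    P = len(x) // patch_len
    patches = x[:P * patch_len].reshape(P, patch_len)
    w1 = rng.standard_normal((patch_len, patch_len))
    w2 = rng.standard_normal((patch_len, patch_len))
    patches = patches + mlp(patches, w1, w2)        # mix within each patch
    pairs = patches.reshape(P // 2, 2 * patch_len)  # join neighboring patches
    w3 = rng.standard_normal((2 * patch_len, 2 * patch_len))
    w4 = rng.standard_normal((2 * patch_len, 2 * patch_len))
    return pairs + mlp(pairs, w3, w4)               # (P/2, 2*patch_len)

x = rng.standard_normal(32)
out = neighboring_mix(x, patch_len=4)
assert out.shape == (4, 8)  # 8 patches merged into 4, each covering 8 time steps
```

Stacking further stages of this form is what expands the receptive field geometrically without global attention.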
If this is right
- Forecasting accuracy improves when receptive fields are expanded through successive neighboring patch mixing rather than fixed-scale convolutions or global attention.
- Separating trend and detail components allows channel correlations to be modeled selectively on high-frequency parts without disturbing long-term stability.
- MLP-based hierarchical mixing on patches provides competitive or superior results to transformer-based models on standard multivariate forecasting tasks.
- The architecture maintains performance gains across varying prediction horizons and dataset characteristics in 20 of 28 (71.4 percent) tested configurations.
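The selective-mixing premise in the second bullet can be illustrated with a toy forward pass: a linear map over the channel axis touches only the detail component, while the trend passes through an identity path. The shapes and the random weight matrix are assumptions for illustration only:

```python
import numpy as np

def mix_channels(trend, detail, W):
    """Apply a (C x C) channel-mixing map to the detail part only;
    the trend is carried through unchanged, preserving its stability."""
    return trend + W @ detail

rng = np.random.default_rng(1)
C, T = 3, 8
trend = rng.standard_normal((C, T))
detail = rng.standard_normal((C, T))
W = rng.standard_normal((C, C)) / np.sqrt(C)  # stand-in for learned weights
out = mix_channels(trend, detail, W)
assert out.shape == (C, T)
```

Whatever the mixing map learns, the low-frequency trend reaches the output untouched, which is the claimed stability mechanism.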
Where Pith is reading between the lines
- The same decomposition-plus-neighbor-mixing pattern could be tested on univariate series or on non-temporal sequence tasks such as text or audio where local structure matters.
- Because the model relies on fixed non-overlapping patches, its behavior on irregularly sampled or streaming time series would reveal whether online adaptation of the wavelet filters remains stable.
- Combining the channel-mixing encoder with explicit trend modeling from classical time-series methods might further reduce error on datasets dominated by seasonality.
Load-bearing premise
The learnable wavelet decomposition and hierarchical neighboring patch mixing capture the relevant local temporal dynamics and channel correlations more effectively than prior architectures without introducing new overfitting risks or requiring dataset-specific tuning.
What would settle it
Running the full set of 28 experimental setups on the seven benchmarks after ablating either the learnable wavelet filters or the neighboring patch mixing structure and observing whether performance drops below the reported state-of-the-art levels.
Original abstract
Multivariate time series forecasting remains a challenge due to the complexity of local temporal dynamics and global dependencies across multiple variables. In this paper, we propose Neighboring Patching Mixer (NPMixer), a hierarchical architecture featuring a Learnable Stationary Wavelet Transform that adaptively learns filter coefficients to decompose signals into trend and detail components in a data-dependent manner. Our framework introduces a Neighboring Mixer Block that captures local temporal dynamics through a series of hierarchical MLP layers operating on non-overlapping patches. Specifically, the mixer block utilizes MLPs to learn temporal patterns within and across these patches, expanding the receptive field to capture multi-scale dependencies. A Channel-Mixing Encoder is applied to high-frequency components to learn channel correlations while preserving the stability of the underlying global trend. Extensive experiments on seven benchmark datasets demonstrate that NPMixer consistently outperforms state-of-the-art models, achieving better performance in 20 out of 28 (71.4%) evaluated experimental setups for MSE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes NPMixer, a hierarchical architecture for multivariate time series forecasting. It uses a Learnable Stationary Wavelet Transform to adaptively decompose signals into trend and detail components via data-dependent filter coefficients, a Neighboring Mixer Block with hierarchical MLPs on non-overlapping patches to capture local temporal dynamics and multi-scale dependencies, and a Channel-Mixing Encoder for high-frequency channel correlations. The central empirical claim is consistent outperformance over state-of-the-art models on seven benchmark datasets, with better MSE in 20 out of 28 setups (71.4%).
Significance. If the performance gains prove robust and attributable to the proposed components, NPMixer would offer a useful advance in time series forecasting by showing how learnable wavelet decomposition combined with hierarchical neighboring patch mixing can better handle local dynamics and channel correlations than prior architectures.
Major comments (2)
- [Experiments] The results consist solely of single-run point estimates of MSE across the 28 setups (7 datasets × 4 horizons), with no standard deviations, confidence intervals, multiple random seeds, or statistical significance tests. This directly undermines the central claim of consistent superiority in 20/28 cases, as the reported wins cannot be distinguished from run-to-run variability or differences in hyperparameter search effort.
- [§3 (Methodology)] The description of the Neighboring Mixer Block does not include an ablation isolating the contribution of the hierarchical neighboring patch mixing versus the learnable wavelet or the channel-mixing encoder, leaving open whether the reported gains stem from the full architecture or from one component.
Minor comments (1)
- [Abstract] The abstract could specify the exact forecasting horizons and datasets used in the 28 setups for greater precision.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive view of NPMixer's potential contribution. We address each major comment below and commit to revisions that strengthen the empirical support and component analysis.
Point-by-point responses
- Referee: [Experiments] The results consist solely of single-run point estimates of MSE across the 28 setups (7 datasets × 4 horizons), with no standard deviations, confidence intervals, multiple random seeds, or statistical significance tests. This directly undermines the central claim of consistent superiority in 20/28 cases, as the reported wins cannot be distinguished from run-to-run variability or differences in hyperparameter search effort.
Authors: We agree that single-run point estimates limit the ability to quantify variability and statistical significance, even though consistent gains across seven diverse datasets and four horizons provide supporting evidence. In the revised manuscript we will rerun all experiments using multiple random seeds (at least five), report mean MSE with standard deviations, and include paired statistical significance tests against the strongest baselines to substantiate the 20/28 superiority claim. Revision: yes.
- Referee: [§3 (Methodology)] The description of the Neighboring Mixer Block does not include an ablation isolating the contribution of the hierarchical neighboring patch mixing versus the learnable wavelet or the channel-mixing encoder, leaving open whether the reported gains stem from the full architecture or from one component.
Authors: The referee correctly observes that the current manuscript lacks an ablation that isolates the hierarchical neighboring patch mixing from the learnable wavelet decomposition and channel-mixing encoder. While each component is motivated in the text, an explicit ablation is needed to attribute performance. We will add a dedicated ablation study in the revised version, evaluating variants that remove or replace each module while keeping the others fixed, and report the resulting MSE changes on the benchmark datasets. Revision: yes.
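The protocol the authors commit to in their first response, per-seed means, standard deviations, and a paired test against the strongest baseline, can be sketched as follows. The MSE values are made up for illustration and are not taken from the paper:

```python
import numpy as np

def paired_t(a, b):
    """Paired t-statistic over per-seed errors; negative values favour model a."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# Hypothetical per-seed MSE for five seeds (illustration only).
npmixer_mse = [0.371, 0.368, 0.374, 0.370, 0.372]
baseline_mse = [0.382, 0.379, 0.385, 0.380, 0.381]

print(f"NPMixer : {np.mean(npmixer_mse):.4f} +/- {np.std(npmixer_mse, ddof=1):.4f}")
print(f"baseline: {np.mean(baseline_mse):.4f} +/- {np.std(baseline_mse, ddof=1):.4f}")
t = paired_t(npmixer_mse, baseline_mse)
# |t| above t_{0.975, df=4} = 2.776 would indicate a significant gap at the 5% level.
```

Pairing by seed removes the shared seed-to-seed variance, which is why it is stricter than comparing the two means directly.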
Circularity Check
No circularity: empirical architecture with no derivation chain
Full rationale
The paper presents NPMixer as a neural architecture for multivariate time series forecasting, built from a learnable stationary wavelet transform, neighboring mixer blocks with MLPs on patches, and a channel-mixing encoder. No first-principles derivation, theorem, or prediction is claimed; performance is asserted solely via empirical MSE comparisons on seven benchmarks. No equations reduce to fitted inputs by construction, no self-citations are invoked as load-bearing uniqueness results, and no ansatz or renaming of known results occurs in a circular manner. The model definitions are independent of the reported outcomes.
Axiom & Free-Parameter Ledger
Free parameters (1)
- learnable filter coefficients of the stationary wavelet transform
Axioms (1)
- Domain assumption: standard deep-learning assumptions that gradient-based optimization of the MSE objective will discover useful temporal and cross-channel patterns when the architecture is sufficiently expressive.