pith. machine review for the scientific record.

arxiv: 2605.09498 · v1 · submitted 2026-05-10 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Spectral Transformer Neural Processes

Hao Chen, Xianhe Chen, Yingzhen Li

Pith reviewed 2026-05-12 03:59 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords neural processes · spectral transformer · periodicity · time series · spectral mixture · frequency aware · quasi-periodicity · transformer neural processes

The pith

Spectral Transformer Neural Processes capture periodicity by injecting spectral mixture features into transformer embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out to show that Neural Processes can be made frequency-aware to handle periodic and quasi-periodic data more effectively than current translation-equivariant approaches. The proposed Spectral Aggregator estimates the context spectrum, compresses it, and adds sampled features to the time-domain inputs. If true, this would mean better generalization and less underfitting on datasets with repeating structures, which are prevalent in time series, spatial data, and images. The change works by altering how the model measures similarity between points that are phase-aligned but far apart in time.

Core claim

STNPs extend TNPs by adding a Spectral Aggregator that estimates an empirical context spectrum, compresses it into a spectral mixture, samples task-adaptive spectral features, and concatenates them with time-domain embeddings. This injects a spectral-mixture-kernel bias that reshapes the similarity geometry, allowing inputs distant in Euclidean space to be close in a periodic manifold and enhancing time-frequency interactions.
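To make that pipeline concrete, here is a minimal sketch in PyTorch of a spectral aggregator of this kind. It is an illustration assembled from the summary above, not the authors' code: the FFT-based spectrum estimate, the top-Q mixture fit, the fixed bandwidth, and the random-Fourier-style feature encoding are all assumptions.

import math
import torch

def spectral_aggregator(t_ctx, y_ctx, t_all, Q=4, n_feat=16):
    """Sketch of a Spectral Aggregator: estimate the context spectrum, compress it
    into a Q-component spectral mixture, sample task-adaptive frequencies, and
    return spectral features to concatenate with time-domain embeddings.
    Every design choice here is an assumption made for illustration."""
    # (1) Empirical context spectrum from the (roughly regular) context outputs.
    power = torch.fft.rfft(y_ctx - y_ctx.mean()).abs() ** 2            # [M//2 + 1]
    dt = (t_ctx[1] - t_ctx[0]).item()
    freqs = torch.fft.rfftfreq(y_ctx.shape[0], d=dt)                   # [M//2 + 1]

    # (2) Compress to a Q-component mixture: the Q most energetic bins become
    #     component means, their normalised energies the weights, plus a fixed bandwidth.
    top = torch.topk(power[1:], k=min(Q, power.numel() - 1))
    means = freqs[1:][top.indices]                                     # [Q]
    weights = top.values / top.values.sum()                            # [Q]
    scales = torch.full_like(means, 0.05 * freqs.max().item())         # assumed bandwidth

    # (3) Sample task-adaptive frequencies from the mixture.
    comp = torch.multinomial(weights, n_feat, replacement=True)        # [n_feat]
    omega = means[comp] + scales[comp] * torch.randn(n_feat)           # [n_feat]

    # (4) Spectral features for every token (context and target alike).
    phase = 2 * math.pi * t_all[:, None] * omega[None, :]              # [N, n_feat]
    return torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)     # [N, 2 * n_feat]

A TNP-style backbone would then attend over torch.cat([time_embedding, spectral_features], dim=-1) for each token; whether the paper samples frequencies per task, per batch, or per layer is not specified in the material above.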

What carries the argument

A Spectral Aggregator that estimates the context spectrum, compresses it into a mixture, samples spectral features, and concatenates them to the time embeddings, reshaping similarity for periodicity.

Load-bearing premise

That concatenating sampled spectral features reliably captures periodicity and quasi-periodicity without introducing new overfitting or heavy tuning requirements.

What would settle it

Running STNP and TNP on the same periodic time series test set and finding no statistically significant improvement in log likelihood or error metrics for STNP.
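A hedged sketch of that settling experiment, assuming per-task predictive log-likelihoods for both models on the same held-out periodic tasks are available as arrays; the Wilcoxon signed-rank test and the one-sided alternative are illustrative choices, not the paper's protocol.

import numpy as np
from scipy.stats import wilcoxon

def stnp_gain_is_significant(ll_stnp, ll_tnp, alpha=0.05):
    """Paired, per-task comparison of predictive log-likelihoods on a shared
    periodic test set; returns True only if STNP shows a significant gain."""
    diff = np.asarray(ll_stnp) - np.asarray(ll_tnp)    # one entry per test task
    stat, p = wilcoxon(diff, alternative="greater")    # H1: STNP > TNP
    print(f"mean log-likelihood gain {diff.mean():+.3f} "
          f"(std {diff.std(ddof=1):.3f}), p = {p:.4f}")
    return p < alpha

Failing this check on periodic tasks (or matching it with comparable error metrics) would count against the core claim; passing it would not by itself resolve the tuning and ablation concerns raised in the referee report below.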

Figures

Figures reproduced from arXiv: 2605.09498 by Hao Chen, Xianhe Chen, Yingzhen Li.

Figure 1: Overview of STNPs. (a) An empirical spectrum is estimated from the context set on a …

Figure 2: Example predictions on synthetic sawtooth (top) and periodic (bottom) tasks. Columns …

Figure 3: Illustrative model outputs for image completion on the DTD dataset. Grey pixels indicate query regions, while the remaining pixels serve as context observations. Following SConvCNP, we formulate image completion as spatial regression from 2D pixel coordinates to intensity values. We use images from the Describable Textures Dataset (DTD) [9], and construct each task from a processed 64 × 64 subsampled c…

Figure 4: Representative forecasting windows on the Electricity and Traffic datasets. The red curve …

Figure 5: Representative California traffic flow imputation episodes. The red curve denotes the …

Figure 6: Representative short-horizon forecasts on the Chimet climate dataset for tidal depth and …

Figure 7: Effect of the minimum context budget on predictive log-likelihood for the periodic …

Figure 8: Attention recovery on a representative periodic task with …

Figure 9: Chimet climate observations for June 2025 at the raw 5-minute resolution. The five panels …

Figure 10: Chimet climate observations for June 2025 after resampling to a 1-hour resolution. The …
read the original abstract

Time series, spatial data, and images are natural applications of Neural Processes. However, when such data exhibit strong periodicity and quasi-periodicity, existing methods often suffer from underfitting and generalise poorly beyond the training distribution. In this work, we propose Spectral Transformer Neural Processes (STNPs), a frequency-aware extension of Transformer Neural Processes (TNPs). STNPs introduce a Spectral Aggregator that estimates an empirical context spectrum, compresses it into a spectral mixture, samples task-adaptive spectral features, and concatenates them with time-domain embeddings, thereby injecting a spectral-mixture-kernel bias into TNPs. This design reshapes the similarity geometry, allowing inputs that are distant in Euclidean space to remain close in an induced periodic manifold while enhancing time-frequency interactions. Extensive experiments on synthetic regression tasks, real-world time-series datasets, and an image dataset demonstrate that STNPs consistently improve predictive performance over existing baselines, extending Neural Processes beyond translation equivariance towards effective modelling of periodicity and quasi-periodicity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Spectral Transformer Neural Processes (STNPs) as a frequency-aware extension of Transformer Neural Processes. A Spectral Aggregator estimates an empirical context spectrum, compresses it into a spectral mixture, samples task-adaptive spectral features, and concatenates them with time-domain embeddings to inject a spectral-mixture-kernel bias. This is intended to reshape similarity geometry so that Euclidean-distant inputs remain close on a periodic manifold, improving modeling of periodicity and quasi-periodicity. Experiments on synthetic regression, real-world time-series, and image tasks are reported to show consistent predictive gains over baselines.

Significance. If the gains are robust, the work would usefully extend Neural Processes beyond translation equivariance to periodic data, with applications in time-series and spatial modeling. The multi-domain evaluation (synthetic, time-series, images) and explicit focus on a known failure mode of existing NPs are strengths that could make the contribution substantive if the mechanism is shown to be reliable.

major comments (2)
  1. §3 (Spectral Aggregator): The compression of the empirical context spectrum into a spectral mixture and the subsequent sampling of task-adaptive features are described at a high level only. No equations specify the compression objective, choice of mixture components, or handling of phase/multi-frequency information. This is load-bearing for the central claim: high-variance or biased spectrum estimates from small context sets could prevent the intended periodic manifold from forming.
  2. §4 (Experiments): Reported improvements lack error bars across runs, statistical significance tests, and ablations isolating the spectral-mixture concatenation from other architectural choices or hyperparameter effects. Without these, it is impossible to confirm that the performance gains arise from the claimed reshaping of similarity geometry rather than from post-hoc tuning or dataset selection.
minor comments (2)
  1. Abstract: The phrase 'consistently improve predictive performance' would be strengthened by including at least one quantitative example (e.g., average RMSE reduction) rather than remaining purely qualitative.
  2. Notation: The interaction between the sampled spectral features and the Transformer attention layers should be formalized with an explicit equation showing how the concatenated embeddings modify the similarity kernel.
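Purely as an illustration of the kind of equation requested here, one plausible formalization uses random Fourier features in the sense of Rahimi and Recht together with the spectral mixture kernel of Wilson and Adams; this is not the authors' stated construction. If each token embedding concatenates a time-domain embedding e(x_i) with sampled spectral features, the pre-projection dot-product similarity splits into a time-domain term plus a Monte Carlo estimate of a spectral-mixture kernel:

\[
h_i = \bigl[\, e(x_i) \,;\, \phi(x_i) \,\bigr], \qquad
\phi(x_i) = \tfrac{1}{\sqrt{D}} \bigl[ \cos(2\pi \omega_d^{\top} x_i),\; \sin(2\pi \omega_d^{\top} x_i) \bigr]_{d=1}^{D}, \qquad
\omega_d \sim \textstyle\sum_{q=1}^{Q} w_q\, \mathcal{N}(\mu_q, \Sigma_q),
\]
\[
\langle h_i, h_j \rangle
= \langle e(x_i), e(x_j) \rangle
+ \tfrac{1}{D} \sum_{d=1}^{D} \cos\!\bigl( 2\pi \omega_d^{\top} (x_i - x_j) \bigr)
\;\approx\;
\langle e(x_i), e(x_j) \rangle
+ \sum_{q=1}^{Q} w_q\, e^{-2\pi^{2} \tau^{\top} \Sigma_q \tau} \cos\bigl( 2\pi \mu_q^{\top} \tau \bigr),
\qquad \tau = x_i - x_j .
\]

In the actual attention layers the concatenated embeddings pass through learned query/key projections and a softmax, so this identity describes the raw similarity of the concatenated features rather than the full attention weight; the paper would need to state its own normalisation and sampling scheme.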

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We appreciate the recognition of the potential contribution in extending Neural Processes to handle periodicity. We address each major comment below and commit to revising the manuscript to incorporate the suggested improvements, including additional mathematical details and enhanced experimental analysis.

read point-by-point responses
  1. Referee: §3 (Spectral Aggregator): The compression of the empirical context spectrum into a spectral mixture and the subsequent sampling of task-adaptive features are described at a high level only. No equations specify the compression objective, choice of mixture components, or handling of phase/multi-frequency information. This is load-bearing for the central claim: high-variance or biased spectrum estimates from small context sets could prevent the intended periodic manifold from forming.

    Authors: We agree with the referee that the description of the Spectral Aggregator in §3 is at a high level. In the revised manuscript, we will expand this section with detailed equations specifying the compression objective for the empirical context spectrum into a spectral mixture, the choice and fitting of mixture components, and the handling of phase and multi-frequency information through the sampling of task-adaptive spectral features. We will also include additional analysis addressing the potential issues with high-variance or biased spectrum estimates from small context sets, such as empirical evaluations on varying context sizes to demonstrate the robustness of the periodic manifold formation. revision: yes

  2. Referee: §4 (Experiments): Reported improvements lack error bars across runs, statistical significance tests, and ablations isolating the spectral-mixture concatenation from other architectural choices or hyperparameter effects. Without these, it is impossible to confirm that the performance gains arise from the claimed reshaping of similarity geometry rather than from post-hoc tuning or dataset selection.

    Authors: We acknowledge the importance of rigorous statistical validation in the experiments. In the revised version, we will include error bars (standard deviations across multiple random seeds/runs), perform statistical significance tests (e.g., paired t-tests or Wilcoxon tests) to compare STNPs against baselines, and add ablation studies that isolate the effect of the spectral-mixture concatenation by comparing variants with and without it, while controlling for other hyperparameters. These additions will help confirm that the gains stem from the proposed mechanism. revision: yes

Circularity Check

0 steps flagged

No significant circularity; architectural extension validated empirically

full rationale

The paper proposes STNPs as a frequency-aware extension of TNPs via a new Spectral Aggregator module that estimates an empirical context spectrum, compresses it, samples features, and concatenates them with time-domain embeddings. This is an explicit architectural design choice whose claimed benefits (improved modeling of periodicity) are demonstrated through experiments on synthetic, time-series, and image data rather than derived from prior equations. No load-bearing steps reduce by construction to fitted parameters, self-definitions, or self-citation chains; the central claim remains independent of its inputs and is tested against external baselines.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that the spectral aggregator can be trained end-to-end to produce useful frequency features without explicit regularization on the spectrum estimate. No new physical entities are postulated.

free parameters (1)
  • spectral mixture components
    Number and parameters of the mixture used to compress the empirical spectrum; these are learned or chosen per task.
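To make the ledger entry concrete, a small sketch of what these per-task free parameters amount to as a configuration object; the field names, defaults, and example values are assumptions for illustration, not values from the paper.

from dataclasses import dataclass, field
from typing import List

@dataclass
class SpectralMixtureParams:
    """Illustrative container for the free parameters of the compressed spectrum."""
    n_components: int = 4                                   # Q: number of mixture components
    weights: List[float] = field(default_factory=list)      # per-component energy share, sums to 1
    mean_freqs: List[float] = field(default_factory=list)   # per-component centre frequency
    bandwidths: List[float] = field(default_factory=list)   # per-component spread

# e.g. a daily plus weekly pattern in hourly data, compressed into two components
params = SpectralMixtureParams(
    n_components=2,
    weights=[0.7, 0.3],
    mean_freqs=[1 / 24.0, 1 / 168.0],                       # cycles per hour
    bandwidths=[0.002, 0.0005],
)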
axioms (1)
  • domain assumption: Transformer attention can effectively integrate concatenated time-domain and spectral features
    Invoked when the paper states that concatenation injects a spectral-mixture-kernel bias into TNPs.
invented entities (1)
  • Spectral Aggregator (no independent evidence)
    purpose: Estimates empirical context spectrum, compresses to mixture, samples task-adaptive features
    New module introduced in the architecture; no independent evidence outside the paper's experiments.

pith-pipeline@v0.9.0 · 5459 in / 1342 out tokens · 28083 ms · 2026-05-12T03:59:14.765861+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 3 internal anchors

  1. [1]

    Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, and Suvrit Sra. Transformers learn to implement preconditioned gradient descent for in-context learning. ArXiv, abs/2306.00297.

  2. [2]

    URL https://api.semanticscholar.org/CorpusID:258999480

  3. [3]

    Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? Investigations with linear models. ArXiv, abs/2211.15661.

  4. [4]

    URL https://api.semanticscholar.org/CorpusID:254043800

  5. [5]

    Matthew Ashman, Cristiana-Diana Diaconu, Junhyuck Kim, Lakee Sivaraya, Stratis Markou, James Requeima, Wessel P. Bruinsma, and Richard E. Turner. Translation equivariant transformer neural processes. ArXiv, abs/2406.12409, 2024. URL https://api.semanticscholar.org/CorpusID:270562561

  6. [6]

    Matthew Ashman, Cristiana-Diana Diaconu, Eric Langezaal, Adrian Weller, and Richard E. Turner. Gridded transformer neural processes for spatio-temporal data. In International Conference on Machine Learning, 2025. URL https://api.semanticscholar.org/CorpusID:283567166

  7. [7]

    Yu Bai, Fan Chen, Haiquan Wang, Caiming Xiong, and Song Mei. Transformers as statisticians: Provable in-context learning with in-context algorithm selection. ArXiv, abs/2306.04637, 2023. URL https://api.semanticscholar.org/CorpusID:259095794

  8. [8]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, T. J. Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeff Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz L...

  9. [9]

    Wessel P. Bruinsma, James Requeima, Andrew Y. K. Foong, Jonathan Gordon, and Richard E. Turner. The Gaussian neural process. ArXiv, abs/2101.03606, 2021. URL https://api.semanticscholar.org/CorpusID:228102666

  10. [10]

    Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., Kaiser, L., Belanger, D., Colwell, L., and Weller, A.

    Xiang Cheng, Yuxin Chen, and Suvrit Sra. Transformers implement functional gradient descent to learn non-linear functions in context. ArXiv, abs/2312.06528, 2023. URL https://api.semanticscholar.org/CorpusID:266162320

  11. [11]

    Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 3606–3613, 2013. URL https://api.semanticscholar.org/CorpusID:4309276

  12. [12]

    J. L. B. Cooper and Salomon Bochner. Harmonic analysis and the theory of probability. The Mathematical Gazette, 41:154–155, 1957. URL https://api.semanticscholar.org/CorpusID:121662867

  13. [13]

    Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, and Furu Wei. Why can GPT learn in-context? Language models implicitly perform gradient descent as meta-optimizers.

  14. [14]

    URL https://api.semanticscholar.org/CorpusID:258686544

  15. [15]

    Yihong Dong, Ge Li, Yongding Tao, Xue Jiang, Kechi Zhang, Jia Li, Jing Su, Jun Zhang, and Jingjing Xu. FAN: Fourier analysis networks. ArXiv, abs/2410.02675, 2024. URL https://api.semanticscholar.org/CorpusID:273098297

  16. [16]

    Shivam Garg, Dimitris Tsipras, Percy Liang, and Gregory Valiant. What can transformers learn in-context? A case study of simple function classes. ArXiv, abs/2208.01066, 2022. URL https://api.semanticscholar.org/CorpusID:251253368

  17. [17]

    Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo Jimenez Rezende, and S. M. Ali Eslami. Conditional neural processes. ArXiv, abs/1807.01613, 2018. URL https://api.semanticscholar.org/CorpusID:49574993

  18. [18]

    Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo Jimenez Rezende, S. M. Ali Eslami, and Yee Whye Teh. Neural processes. ArXiv, abs/1807.01622, 2018. URL https://api.semanticscholar.org/CorpusID:49568863

  19. [19]

    Jonathan Gordon, Wessel P. Bruinsma, Andrew Y. K. Foong, James Requeima, Yann Dubois, and Richard E. Turner. Convolutional conditional neural processes. ArXiv, abs/1910.13556.

  20. [20]

    URL https://api.semanticscholar.org/CorpusID:204960684

  21. [21]

    Daniel Jenson, Jhonathan Navott, Piotr Grynfelder, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, and Seth Flaxman. Scalable spatiotemporal inference with biased scan attention transformer neural processes. ArXiv, abs/2506.09163, 2025. URL https://api.semanticscholar.org/CorpusID:279306065

  22. [22]

    Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, S. M. Ali Eslami, Dan Rosenbaum, Oriol Vinyals, and Yee Whye Teh. Attentive neural processes. ArXiv, abs/1901.05761.

  23. [23]

    URL https://api.semanticscholar.org/CorpusID:58014184

  24. [24]

    Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya. Reformer: The efficient transformer. ArXiv, abs/2001.04451, 2020. URL https://api.semanticscholar.org/CorpusID:209315300

  25. [25]

    Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2017. URL https://api.semanticscholar.org/CorpusID:4922476

  26. [26]

    Juho Lee, Yoonho Lee, Jungtaek Kim, Eunho Yang, Sung Ju Hwang, and Yee Whye Teh. Bootstrapping neural processes. ArXiv, abs/2008.02956, 2020. URL https://api.semanticscholar.org/CorpusID:221083236

  27. [27]

    Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. ArXiv, abs/1907.00235, 2019. URL https://api.semanticscholar.org/CorpusID:195766887

  28. [28]

    Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chaoqin Huang, Zhenguang Liu, Bryan Hooi, and Roger Zimmermann. LargeST: A benchmark dataset for large-scale traffic forecasting. ArXiv, abs/2306.08259, 2023. URL https://api.semanticscholar.org/CorpusID:259165246

  29. [29]

    Peiman Mohseni and Nick Duffield. Spectral convolutional conditional neural processes. ArXiv, abs/2404.13182, 2024. URL https://api.semanticscholar.org/CorpusID:269292913

  30. [30]

    Benedetta L. Mussati, Helen McKay, and Stephen Roberts. Neural processes for short-term forecasting of weather attributes.

  31. [31]

    Elizbar A. Nadaraya. On estimating regression. Theory of Probability & Its Applications, 9(1):141–142, 1964.

  32. [32]

    Tung Nguyen and Aditya Grover. Transformer neural processes: Uncertainty-aware meta learning via sequence modeling. ArXiv, abs/2207.04179, 2022. URL https://api.semanticscholar.org/CorpusID:250340974

  33. [33]

    Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, 2007.

  34. [34]

    Gautam Singh, Jaesik Yoon, Youngsung Son, and Sungjin Ahn. Sequential neural processes. In Neural Information Processing Systems, 2019. URL https://api.semanticscholar.org/CorpusID:195584118

  35. [35]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Neural Information Processing Systems, 2017. URL https://api.semanticscholar.org/CorpusID:13756489

  36. [36]

    Aishwarya Venkataramanan and Joachim Denzler. Distance-informed neural processes. ArXiv, abs/2508.18903, 2025. URL https://api.semanticscholar.org/CorpusID:280870623

  37. [37]

    Johannes von Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, 2022. URL https://api.semanticscholar.org/CorpusID:254685643

  38. [38]

    Xuesong Wang, He Zhao, and Edwin V. Bonilla. Rényi neural processes. ArXiv, abs/2405.15991.

  39. [39]

    URL https://api.semanticscholar.org/CorpusID:270063827

  40. [40]

    Andrew Gordon Wilson and Ryan P. Adams. Gaussian process kernels for pattern discovery and extrapolation. In International Conference on Machine Learning, 2013. URL https://api.semanticscholar.org/CorpusID:279814

  41. [41]

    Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In Neural Information Processing Systems, 2021. URL https://api.semanticscholar.org/CorpusID:235623791

  42. [42]

    Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabás Póczos, Ruslan Salakhutdinov, and Alex Smola. Deep sets. 2017. URL https://api.semanticscholar.org/CorpusID:4870287

  43. [43]

    Yitian Zhang, Liheng Ma, Soumyasundar Pal, Yingxue Zhang, and Mark Coates. Multi-resolution time-series transformer for long-term forecasting. In International Conference on Artificial Intelligence and Statistics, 2023. URL https://api.semanticscholar.org/CorpusID:265043382

  44. [44]

    Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wan Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. ArXiv, abs/2012.07436, 2020. URL https://api.semanticscholar.org/CorpusID:229156802

  45. [45]

    Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, 2022. URL https://api.semanticscholar.org/CorpusID:246430171

  46. [46]

    Synthetic regression and California traffic flow. These are scalar-input, scalar-output settings with d_x = d_y = 1 and P = {1}.

  47. [47]

    DTD image completion. The spectral branch uses the full two-dimensional pixel coordinate, P = {1, 2}, and the shared selected-channel spectrum with all RGB channels, A = {1, 2, 3}.

  48. [48]

    Six-dataset forecasting benchmark. The spectral branch is applied to the temporal coordinate, so P = {1}. For multivariate datasets, the spectral summary uses the shared all-channel aggregation, A = {1, ..., d_y}.

  49. [49]

    Chimet. The input side also uses the temporal coordinate, so P = {1}. The prediction head outputs all five variables {WSPD, ATMP, DEPTH, AVWHT, BARO}, but the spectral branch uses channel-wise selected spectra with A = {ATMP, DEPTH}. Specifically, one scalar spectral branch is built from ATMP and one from DEPTH, using the period ranges reported in Table 18; their...

  50. [50]

    Context-side spectral parameter estimation. This stage operates only on the context set and consists of three operations. (a) Irregular spectral energy estimation. Let K be the number of grid frequencies and let P ⊆ {1, ..., d_x} be the input-coordinate subset used by the spectral branch, with d_p = |P|. For each of the M context points and each of the K ve...

  51. [51]

    Token-wise embedding construction. Once the spectral parameters are estimated from the context set, they are used to construct token embeddings for all N tokens. This stage consists of two parts. (a) Spectral features. For each of the Q mixture components, the model samples D_0 vector frequencies in R^{d_p}. Sampling with a full covariance requires O(Q d_p^3) oper...

  52. [52]

    Total embedding complexity. By combining the context-side estimation and the token-wise embedding construction, we obtain T^SMK_embed(N, M) = O( M K(d_p + α) + K κ C_in C + (L_c - 2) κ C^2 + C Q + Q K d_p^2 + Q d_p^3 + Q D_0 d_p^2 + N Q D_0 d_p + N ). For channel-wise selected spectra, such as the Chimet configuration, the same expression is applied independently to each selected chan...

  53. [53]

    Overall complexity. Substituting this into the backbone-plus-embedding decomposition yields T_ours(N, M) ∈ O( N^2 + M K(d_p + α) + K κ C_in C + (L_c - 2) κ C^2 + C Q + Q K d_p^2 + Q d_p^3 + Q D_0 d_p^2 + N Q D_0 d_p + N ). When K, Q, D_0, d_p, α, κ, C, and L_c (or the corresponding channel-wise sets {K_c, Q_c, D_0,c}_{c ∈ A}) are treated as fixed hyperparameters, all terms except those dependin...

  54. [54]

    (2) Electricity contains the hourly electricity consumption of 321 clients from 2012 to

  55. [55]

    (3) Exchange Rate [20] records the daily exchange rates of eight countries ranging from 1990 to 2016. (4) Traffic is a collection of hourly road occupancy measurements from the California Department of Transportation, capturing sensor readings on San Francisco Bay Area freeways. (5) Weather is recorded every 10 minutes over the year 2020 and contains 21...