pith.

arxiv: 2502.12370 · v3 · submitted 2025-02-17 · 💻 cs.LG

Positional Encoding in Transformer-Based Time Series Models: A Survey

Pith reviewed 2026-05-23 02:22 UTC · model grok-4.3

classification 💻 cs.LG
keywords: positional encoding, transformer, time series, classification, survey, forecasting, anomaly detection

The pith

Data characteristics like sequence length and complexity determine which positional encoding performs best in transformer time series models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Transformer models for time series data require positional encodings to respect sequence order. This survey reviews fixed, learnable, relative, and hybrid encoding approaches and benchmarks them on classification tasks. It concludes that sequence length, signal complexity, and dimensionality strongly shape which method works well. Advanced encodings deliver accuracy gains but raise computational demands. The work also flags open challenges and research directions for improved encodings.
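As a reference point for the "fixed" family, the sketch below computes the sinusoidal absolute encoding of Vaswani et al. [10] and adds it to a sequence of per-time-step embeddings. It is a minimal plain-Python illustration. The function names and the sizes (96 time steps, 16-dimensional embeddings) are hypothetical and not taken from the surveyed paper.

```python
import math

def sinusoidal_encoding(seq_len, d_model, base=10000.0):
    """Fixed absolute encoding: PE[pos][2k] = sin(pos / base**(2k/d)),
    PE[pos][2k+1] = cos(pos / base**(2k/d))."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sin/cos come in pairs")
    table = []
    for pos in range(seq_len):
        row = []
        for two_k in range(0, d_model, 2):
            angle = pos / base ** (two_k / d_model)
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table

def add_positions(x, pe):
    """Additive injection: each time step's embedding plus its position vector."""
    return [[v + p for v, p in zip(x_row, pe_row)] for x_row, pe_row in zip(x, pe)]

# Hypothetical sizes: 96 time steps, 16-dimensional embeddings.
pe = sinusoidal_encoding(96, 16)
x = [[0.0] * 16 for _ in range(96)]  # stand-in for projected input features
z = add_positions(x, pe)
```

Nothing here is trained. The same table is reused for every input, which is what makes the method cheap but also insensitive to the data.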

Core claim

The survey establishes that in transformer-based time series models, the effectiveness of different positional encoding approaches—fixed, learnable, relative, and hybrid—varies significantly based on data characteristics including sequence length, signal complexity, and dimensionality, with advanced methods providing accuracy improvements but incurring greater computational demands.

What carries the argument

Categorization of positional encoding methods (fixed, learnable, relative, hybrid) and their quantitative evaluation on time series classification tasks.
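Relative methods make attention depend on the offset between positions rather than on absolute indices. The following minimal sketch shows that property using rotary position embedding (RoFormer [66]) as one well-known example. This page does not say which family the survey assigns RoPE to. The helper names and vector values are hypothetical.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotary position embedding: rotate each coordinate pair (2k, 2k+1)
    by angle pos * base**(-2k/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for two_k in range(0, d, 2):
        theta = pos * base ** (-two_k / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[two_k], vec[two_k + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Arbitrary example query/key vectors (hypothetical values).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (m - n = 2) at different absolute positions.
s1 = dot(rope(q, 3), rope(k, 1))
s2 = dot(rope(q, 10), rope(k, 8))
# A different offset (m - n = 5).
s3 = dot(rope(q, 6), rope(k, 1))
```

Each coordinate pair is rotated by an angle proportional to its position. The query-key dot product therefore depends only on m - n, so s1 and s2 agree up to floating-point error, while s3 differs.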

If this is right

  • Encoding selection must account for the specific sequence length and signal traits of the target data.
  • Advanced encodings improve prediction accuracy on classification tasks.
  • Gains in accuracy from advanced methods come with higher computational cost.
  • Challenges remain in balancing performance and efficiency for varied time series data.
  • Future work should target encodings that adapt to data characteristics without added complexity.
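The extra cost of richer encodings can come from a simple source. An additive absolute table stores one vector per position. A relative scheme that materializes a separate bias for every (query, key) pair instead grows quadratically with sequence length. The arithmetic below assumes a full L×L bias per head, d_model = 64, and 8 heads; all of these are hypothetical settings. Some methods, such as the linear-complexity relative encoding of Liutkus et al. [53], are designed to avoid this.

```python
def absolute_table_floats(seq_len, d_model):
    """Additive absolute encoding: one d_model-vector per position."""
    return seq_len * d_model

def pairwise_bias_floats(seq_len, n_heads):
    """Relative term materialized as a full seq_len x seq_len bias per head."""
    return n_heads * seq_len * seq_len

# Hypothetical model settings.
D_MODEL, N_HEADS = 64, 8
for L in (128, 1024, 8192):
    a = absolute_table_floats(L, D_MODEL)
    b = pairwise_bias_floats(L, N_HEADS)
    print(f"L={L:>5}  absolute={a:>11,}  pairwise={b:>13,}  ratio={b // a}x")
```

The ratio equals n_heads · L / d_model, so under these assumptions it grows linearly with sequence length.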

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same trade-offs likely extend to forecasting and anomaly detection tasks even though the benchmarks focused on classification.
  • Hybrid encodings may offer a practical middle ground for very long sequences where pure advanced methods become too expensive.
  • Practitioners could build simple decision rules based on data statistics to pick encodings before training.
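A hybrid encoding can be as simple as a fixed table plus a learned correction, which keeps memory linear in sequence length. The class below is a hypothetical sketch of that one variant: a fixed sinusoid plus a trainable per-position offset, with a plain SGD update standing in for a deep-learning framework. It is not the method of any specific surveyed paper. Other hybrids combine absolute and relative terms instead.

```python
import math

class HybridPositionalEncoding:
    """Hypothetical hybrid: fixed sinusoidal table + trainable per-position offset.

    The offset starts at zero, so an untrained model sees exactly the fixed
    encoding; training moves it away only where gradients push it.
    """

    def __init__(self, max_len, d_model, base=10000.0):
        if d_model % 2 != 0:
            raise ValueError("d_model must be even")
        self.fixed = []
        for pos in range(max_len):
            row = []
            for two_k in range(0, d_model, 2):
                angle = pos / base ** (two_k / d_model)
                row.extend([math.sin(angle), math.cos(angle)])
            self.fixed.append(row)
        self.offset = [[0.0] * d_model for _ in range(max_len)]  # learnable part

    def __call__(self, x):
        """Add fixed + learned position vectors to a (seq_len x d_model) input."""
        if len(x) > len(self.fixed):
            raise ValueError("sequence longer than max_len")
        return [
            [v + f + o for v, f, o in zip(row, self.fixed[p], self.offset[p])]
            for p, row in enumerate(x)
        ]

    def sgd_step(self, grad_out, lr=0.01):
        """The offset is added directly, so dLoss/dOffset equals the gradient
        w.r.t. the output at that position; only the offset is updated."""
        for p, g_row in enumerate(grad_out):
            for i, g in enumerate(g_row):
                self.offset[p][i] -= lr * g

enc = HybridPositionalEncoding(max_len=512, d_model=8)
x = [[0.0] * 8 for _ in range(3)]
z = enc(x)
```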

Load-bearing premise

The chosen time series classification tasks and benchmarks are representative enough to support general claims about method effectiveness.

What would settle it

A new set of benchmarks on diverse time series datasets that shows no measurable dependence of encoding performance on sequence length, complexity, or dimensionality.
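Running such a test requires a quantitative notion of "signal complexity". The authors' rebuttal mentions signal entropy or dominant frequency. Below is a minimal sketch of both measures. It assumes a uniformly sampled univariate series and uses spectral entropy (the Shannon entropy of the normalized power spectrum) as the entropy variant. The function names are hypothetical.

```python
import cmath
import math

def power_spectrum(x):
    """One-sided power spectrum via a naive O(n^2) DFT; the mean is removed first."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [
        abs(sum(xc[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]

def spectral_entropy(x):
    """Shannon entropy of the normalized non-DC power spectrum, scaled to [0, 1].
    Near 0 for a pure tone, higher for broadband signals."""
    p = power_spectrum(x)[1:]  # skip the DC bin (zero after mean removal)
    total = sum(p)
    if total == 0 or len(p) < 2:
        return 0.0
    h = -sum((v / total) * math.log(v / total) for v in p if v > 0)
    return h / math.log(len(p))

def dominant_frequency(x, sample_rate=1.0):
    """Frequency of the largest non-DC spectral peak, in cycles per unit time."""
    p = power_spectrum(x)
    k = max(range(1, len(p)), key=lambda i: p[i])
    return k * sample_rate / len(x)

# Example: a pure tone at 4 cycles per 64 samples.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
```

For long series, an FFT routine would replace the naive DFT. The O(n²) loop only keeps the sketch dependency-free.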

Original abstract

Recent advancements in transformer-based models have greatly improved time series analysis, providing robust solutions for tasks such as forecasting, anomaly detection, and classification. A crucial element of these models is positional encoding, which allows transformers to capture the intrinsic sequential nature of time series data. This survey systematically examines existing techniques for positional encoding in transformer-based time series models. We investigate a variety of methods, including fixed, learnable, relative, and hybrid approaches, and evaluate their effectiveness in different time series classification tasks. Our findings indicate that data characteristics like sequence length, signal complexity, and dimensionality significantly influence method effectiveness. Advanced positional encoding methods exhibit performance gains in terms of prediction accuracy; however, they come at the cost of increased computational complexity. Furthermore, we outline key challenges and suggest potential research directions to enhance positional encoding strategies. By delivering a comprehensive overview and quantitative benchmarking, this survey intends to assist researchers and practitioners in selecting and designing effective positional encoding methods for transformer-based time series models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper surveys positional encoding methods (fixed, learnable, relative, and hybrid) for transformer-based time series models. It reviews existing techniques, performs quantitative benchmarking on selected time series classification tasks, and reports that data characteristics such as sequence length, signal complexity, and dimensionality significantly affect method performance, with advanced encodings improving accuracy at the expense of computational cost. It also discusses challenges and future directions.

Significance. If the benchmarking results hold and generalize, the survey would offer practical guidance for selecting positional encodings in time series transformers by clarifying performance-complexity trade-offs and data-dependent behavior. The quantitative component distinguishes it from purely qualitative surveys, but its value hinges on the diversity and representativeness of the evaluated tasks.

major comments (2)
  1. [Benchmarking section] The central empirical claim that sequence length, signal complexity, and dimensionality 'significantly influence' positional encoding effectiveness rests on the chosen classification tasks. The manuscript must explicitly list all datasets used, report their key statistics (length ranges, dimensionality, complexity measures), and demonstrate that they span sufficiently diverse regimes; otherwise the generalization to 'data characteristics' cannot be supported and the performance-vs-complexity trade-off remains conditional on the sampled distribution.
  2. [Results and discussion] The abstract states that advanced methods exhibit 'performance gains in terms of prediction accuracy' yet incur higher complexity. The paper should provide concrete quantitative evidence (e.g., accuracy deltas and runtime/memory measurements) for each method across the datasets, including statistical significance tests and controls for confounding factors such as model size or training protocol, to substantiate the trade-off claim.
minor comments (2)
  1. [Abstract and Introduction] The abstract and introduction should clarify the exact scope (e.g., whether only classification or also forecasting/anomaly detection) and the criteria used to select the reviewed methods and datasets.
  2. [Review of Methods] Notation for the different encoding families (fixed, learnable, relative, hybrid) should be standardized and introduced early to improve readability when comparing methods.
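Below is one possible standardized notation for the four families. It is an illustrative choice, not the paper's own. Here $x_i$ is the embedding of time step $i$, $d$ is the model width, $z_i$ is the position-aware input, and $\alpha_{ij}$ is the pre-softmax attention logit. The term $b_{j-i}$ stands for any offset-dependent term, for example a learned scalar per clipped offset or the fixed linear penalty of ALiBi [67].

$$
\begin{aligned}
\text{fixed:}\quad & z_i = x_i + p_i, \qquad p_{i,2k} = \sin\!\big(i / 10000^{2k/d}\big),\ \ p_{i,2k+1} = \cos\!\big(i / 10000^{2k/d}\big) \\
\text{learnable:}\quad & z_i = x_i + P_i, \qquad P \in \mathbb{R}^{L_{\max} \times d} \text{ trained with the model} \\
\text{relative:}\quad & \alpha_{ij} = \frac{(W_Q x_i)^{\top} (W_K x_j)}{\sqrt{d}} + b_{j-i} \\
\text{hybrid (e.g.):}\quad & z_i = x_i + p_i + P_i, \ \text{ or an absolute } z_i \text{ inside } \alpha_{ij} \text{ together with } b_{j-i}
\end{aligned}
$$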

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below and will incorporate the requested details into the revised manuscript.

Point-by-point responses
  1. Referee: [Benchmarking section] The central empirical claim that sequence length, signal complexity, and dimensionality 'significantly influence' positional encoding effectiveness rests on the chosen classification tasks. The manuscript must explicitly list all datasets used, report their key statistics (length ranges, dimensionality, complexity measures), and demonstrate that they span sufficiently diverse regimes; otherwise the generalization to 'data characteristics' cannot be supported and the performance-vs-complexity trade-off remains conditional on the sampled distribution.

    Authors: We agree that the current presentation of the benchmarking datasets is insufficient to fully support the generalization claims. In the revision we will add an explicit table in the Benchmarking section that enumerates every dataset together with its sequence-length range, dimensionality, and at least one quantitative complexity measure (e.g., signal entropy or dominant frequency). We will also insert a short paragraph discussing how the chosen collection covers short/long, low/high-dimensional, and simple/complex regimes. revision: yes

  2. Referee: [Results and discussion] The abstract states that advanced methods exhibit 'performance gains in terms of prediction accuracy' yet incur higher complexity. The paper should provide concrete quantitative evidence (e.g., accuracy deltas and runtime/memory measurements) for each method across the datasets, including statistical significance tests and controls for confounding factors such as model size or training protocol, to substantiate the trade-off claim.

    Authors: We accept that the present results section lacks the granularity needed to substantiate the stated trade-offs. The revised manuscript will include expanded tables reporting per-dataset accuracy, accuracy deltas relative to a fixed baseline, wall-clock runtime, and peak memory for every positional-encoding variant. We will add paired statistical significance tests (Wilcoxon signed-rank) and will explicitly state that all experiments used identical model sizes and training protocols so that differences can be attributed to the encoding method. revision: yes
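The per-dataset comparison promised in the second response can be sketched in two steps. First compute accuracy deltas against the fixed baseline, then run an exact two-sided Wilcoxon signed-rank test on those paired differences. This plain-Python sketch makes three assumptions: zero differences are dropped, ties get average ranks, and the p-value comes from exact enumeration, which is only practical for about 20 datasets or fewer. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used. The function names are hypothetical.

```python
from itertools import product

def accuracy_deltas(method_acc, baseline_acc):
    """Per-dataset accuracy difference of a method relative to the fixed baseline."""
    return [m - b for m, b in zip(method_acc, baseline_acc)]

def wilcoxon_signed_rank(deltas):
    """Exact two-sided Wilcoxon signed-rank test. Returns (W_plus, p_value)."""
    d = [x for x in deltas if x != 0]  # zero differences carry no sign information
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    # Rank |d| ascending, giving tied magnitudes their average 1-based rank.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = sum(ranks) / 2  # W_plus is symmetric around this under the null
    observed = abs(w_plus - mean)
    # Under the null every rank is equally likely to carry a + or - sign.
    extreme = sum(
        1
        for signs in product((0, 1), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - mean) >= observed - 1e-9
    )
    return w_plus, extreme / 2 ** n
```

For instance, if a method beat the baseline on all six datasets with distinct margins, W+ = 21 and the exact two-sided p-value is 2/64 = 0.03125. That is the smallest value attainable with six pairs, so a handful of datasets limits how much significance such a test can show.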

Circularity Check

0 steps flagged

No circularity: survey contains no derivations, equations, or fitted predictions

Full rationale

This is a survey paper reviewing existing positional encoding techniques for transformer-based time series models, with some quantitative benchmarking on classification tasks. No mathematical derivations, first-principles results, or 'predictions' appear that could reduce to inputs by construction. The abstract and structure indicate empirical observations about data characteristics influencing effectiveness, but these rest on external benchmarking rather than self-referential fitting or self-citation chains. No self-definitional steps, fitted inputs called predictions, or ansatz smuggling are present. The reader's assessment of score 0.0 is consistent with the absence of any load-bearing derivation chain. Representativeness concerns are evidential, not circularity issues.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work introduces no free parameters, axioms, or invented entities; it reviews prior literature.

pith-pipeline@v0.9.0 · 5693 in / 963 out tokens · 36571 ms · 2026-05-23T02:22:29.862149+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers

    cs.LG 2025-09 unverdicted novelty 6.0

    DyWPE generates positional embeddings for time series transformers from the input signal via Discrete Wavelet Transform and outperforms standard positional encodings on ten datasets, especially longer sequences and bi...
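DyWPE is summarized above as deriving positional embeddings from the input signal via the Discrete Wavelet Transform. The block below shows only the decomposition step such a method could build on: a multi-level Haar DWT, assumed here for simplicity. This page does not describe how DyWPE turns coefficients into embeddings or which wavelet it uses, so this is not its algorithm. The function names are hypothetical.

```python
import math

def haar_dwt(x):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(x) % 2 != 0:
        raise ValueError("length must be even")
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def haar_multilevel(x, levels):
    """Repeatedly split the approximation band: [detail_1, ..., detail_J, approx_J].
    len(x) must be divisible by 2**levels."""
    coeffs = []
    a = list(x)
    for _ in range(levels):
        a, d = haar_dwt(a)
        coeffs.append(d)
    coeffs.append(a)
    return coeffs

# Example: a short ramp decomposed into three detail scales plus a coarse trend.
signal = [float(v) for v in range(8)]
coeffs = haar_multilevel(signal, 3)
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the signal. Each level halves the time resolution, which is what lets wavelet features describe position at several scales.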

Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · cited by 1 Pith paper · 7 internal anchors

  1. Box, G.E., Jenkins, G.M., Reinsel, G.C., Ljung, G.M.: Time Series Analysis: Forecasting and Control. John Wiley & Sons (2015)

  2. Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice. OTexts (2018)

  3. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)

  4. Yamak, P.T., Yujian, L., Gadosey, P.K.: A comparison between ARIMA, LSTM, and GRU for time series forecasting. In: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, pp. 49–55 (2019)

  5. Song, H., Rajan, D., Thiagarajan, J.J., Spanias, A.: Attend and diagnose: Clinical time series analysis using attention models. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (2018). AAAI Press

  6. LeCun, Y., Bengio, Y., et al.: Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361(10), 1995 (1995)

  7. Wang, Z., Yan, W., Oates, T.: Time series classification from scratch with deep neural networks: A strong baseline. In: International Joint Conference on Neural Networks (IJCNN), pp. 1578–1585 (2017). IEEE

  8. Shih, S.-C., Sun, F.-K., Lee, H.-Y.: Temporal pattern attention for multivariate time series forecasting. In: Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS) (2019)

  9. Zhao, B., Lu, H., Chen, S., Liu, J., Wu, D.: Convolutional neural networks for time series classification. Journal of Systems Engineering and Electronics 28(1), 162–169 (2017)

  10. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)

  11. Li, H., Jin, X., Xuan, Y., Zhou, X., Chen, W., Wang, Y.X., Yan, X.: Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In: Advances in Neural Information Processing Systems, pp. 5243–5253 (2019)

  12. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)

  13. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in Neural Information Processing Systems 33, 1877–1901 (2020)

  14. Zeng, A., Fu, Y., Shang, C., Cheng, J.: Are transformers effective for time series forecasting? arXiv preprint arXiv:2303.16640 (2023)

  15. Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., Sun, L.: Transformers in time series: A survey. International Journal of Machine Learning and Cybernetics (2023)

  16. Torres, J.F., Hadjout, D., Sebaa, A., Martínez-Álvarez, F., Troncoso, A.: Deep learning for time series forecasting: A survey. Big Data 9(1), 3–21 (2021)

  17. Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.: Informer: Beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11106–11115 (2021)

  18. Wu, H., Xu, J., Wang, J., Long, M.: Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems 34, 22419–22430 (2021)

  19. Liu, S., Yu, H., Liao, C., Li, J., Lin, W., Liu, A.X., Dustdar, S.: Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In: International Conference on Learning Representations (2022)

  20. Zerveas, G., Jayaraman, S., Patel, D., Bhamidipaty, A., Eickhoff, C.: A transformer-based framework for multivariate time series representation learning. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2114–2124 (2021)

  21. Foumani, N.M., Tan, C.W., Webb, G.I., Salehi, M.: Improving position encoding of transformers for multivariate time series classification. Data Mining and Knowledge Discovery 38(1), 22–48 (2024)

  22. Xu, J., Wu, H., Wang, J., Long, M.: Anomaly transformer: Time series anomaly detection with association discrepancy. In: International Conference on Learning Representations (2022)

  23. Tuli, S., Casale, G., Jennings, N.R.: TranAD: Deep transformer networks for anomaly detection in multivariate time series data. Proceedings of the VLDB Endowment 15(6), 1201–1214 (2022)

  24. Li, S., Jin, X., Xuan, Y., Zhou, X., Chen, W., Wang, Y.-X., Yan, X.: Time series forecasting with transformers: A survey. arXiv preprint arXiv:2307.08701 (2023)

    Canonical work page: Alpagasus: Training a better alpaca with fewer data. arXiv preprint arXiv:2307.08701

  25. Liu, J., Chen, M., Wang, Y.: Dynamic positional encoding for transformer-based time series analysis. Pattern Recognition 138, 109394 (2023)

  26. Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., Sun, L.: Transformers in time series: A survey. arXiv preprint arXiv:2202.07125 (2022)

  27. Zhao, L., Qi, Y., Zhang, S., Ma, Y., Liu, S., Zhou, T., et al.: Length extrapolation of transformers: A survey from the perspective of positional encoding. arXiv preprint arXiv:2312.17044 (2024)

  28. Kazemnejad, A., Padhi, I., Rish, I., Reddy, S., Cheung, J.C.K.: The impact of positional encoding on length generalization in transformers. arXiv preprint arXiv:2305.19466 (2023)

  29. Zhang, J., Wang, J., Qiang, W., Xu, F., Zheng, C., Sun, F., Xiong, H.: Intriguing properties of positional encoding in time series forecasting. arXiv preprint arXiv:2404.10337 (2024)

  30. Tay, Y., Dehghani, M., Bahri, D., Metzler, D.: Efficient transformers: A survey. ACM Computing Surveys 55(6), 1–28 (2022)

  31. Ke, G., He, D., Liu, T.-Y.: Rethinking positional encoding in language pre-training. arXiv preprint arXiv:2006.15595 (2020)

  32. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  33. Luong, M.-T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)

  34. Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G., Cottrell, G.: A dual-stage attention-based recurrent neural network for time series prediction. arXiv preprint arXiv:1704.02971 (2017)

  35. Fan, C., Zhang, Y., Pan, Y., Li, X.: A dual attention-based coupling network for multivariate time series forecasting. Knowledge-Based Systems 191, 105239 (2020)

  36. Song, H., Rajan, D., Thiagarajan, J.J., Spanias, A.: Attend and diagnose: Clinical time series analysis using attention models. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

  37. Chorowski, J.K., Bahdanau, D., Serdyuk, D., Cho, K., Bengio, Y.: Attention-based models for speech recognition. Advances in Neural Information Processing Systems 28 (2015)

  38. Hao, Y., Cao, H.: A new attention mechanism to classify multivariate time series. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (2020)

  39. Liu, M., Ren, S., Ma, S., Jiao, J., Chen, Y., Wang, Z., Song, W.: Gated transformer networks for multivariate time series classification. arXiv preprint arXiv:2103.14438 (2021)

  40. Beltagy, I., Peters, M.E., Cohan, A.: Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020)

  41. Kitaev, N., Kaiser, L., Levskaya, A.: Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020)

  42. Chu, X., Yang, W., Zhang, L.: Contextual position encoding for time series classification. IEEE Transactions on Neural Networks and Learning Systems 32(8), 3425–3437 (2021)

  43. Anderson, L., Thompson, M., White, J.: Embedding strategies for temporal data in deep learning. Applied Intelligence 53(12), 15234–15251 (2023)

  44. Taylor, M., Davis, S., Wilson, J.: Transformer architectures for time series: Positional encoding considerations. Machine Learning 111(8), 2943–2967 (2022)

  45. Lim, B., Arık, S.Ö., Loeff, N., Pfister, T.: Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting 37(4), 1748–1764 (2021)

  46. Zhang, L., Wang, Q., Chen, R.: Hierarchical positional encoding for multi-scale time series analysis. Information Sciences 625, 789–805 (2023)

  47. Schmidt, H., Mueller, A., Weber, K.: Multi-scale positional encoding for hierarchical time series analysis. Machine Learning 112(4), 1345–1372 (2023)

  48. Kumar, R., Sharma, P., Gupta, A.: Positional encoding methods for long-term time series prediction. In: Advances in Neural Information Processing Systems, vol. 36, pp. 28456–28471 (2023)

  49. Cordonnier, J.-B., Mahendran, A., Dosovitskiy, A., Weissenborn, D., Uszkoreit, J., Unterthiner, T.: Differentiable patch selection for image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2351–2360 (2021)

  50. Huang, Z., Liang, D., Xu, P., Xiang, B.: Improve transformer models with better relative position embeddings. arXiv preprint arXiv:2009.13658 (2020)

  51. Ke, G., He, D., Liu, T.-Y.: Rethinking positional encoding in language pre-training. International Conference on Learning Representations (ICLR) (2021)

  52. Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position representations. arXiv preprint arXiv:1803.02155 (2018)

  53. Liutkus, A., Çıfka, O., Wu, S.-L., Simsekli, U., Yang, Y.-H., Richard, G.: Relative positional encoding for transformers with linear complexity. In: International Conference on Machine Learning, pp. 7067–7079 (2021). PMLR

  54. Alioghli, A.A., Yıldırım Okay, F.: Enhancing multivariate time-series anomaly detection with positional encoding mechanisms in transformers. The Journal of Supercomputing 81(1), 282 (2025)

  55. Martinez, C., Garcia, A., Lopez, J.: Positional encoding strategies for multivariate time series forecasting. International Journal of Forecasting 39(3), 1234–1248 (2023)

  56. Wang, H., Li, J., Zhang, X.: Frequency-based positional encoding for time series transformers. Neural Computing and Applications 35(12), 8765–8778 (2023)

  57. Nguyen, L., Tran, D., Pham, M.: Spectral analysis of positional encodings in time series models. Digital Signal Processing 134, 103921 (2023)

  58. Li, X., Yang, J., Zhou, F.: Learnable positional encoding for time series transformers. Expert Systems with Applications 213, 118912 (2023)

  59. Foster, J., Adams, R., Collins, S.: Learning optimal positional encodings for time series tasks. In: International Conference on Machine Learning, pp. 10123–10138 (2023)

  60. Miller, S., Green, L., Hall, R.: Hybrid positional encoding approaches for time series transformers. Data Mining and Knowledge Discovery 37(3), 1089–1115 (2023)

  61. Bell, A., Gray, M., King, R.: Adaptive positional encoding mechanisms for dynamic time series. Expert Systems with Applications 220, 119678 (2023)

  62. Campbell, D., Reed, E., Bailey, M.: Transformer models for irregular time series: Position encoding challenges. Knowledge-Based Systems 272, 110567 (2023)

  63. Lee, C., Kim, M., Park, S.: Adaptive position encoding for variable-length time series. Pattern Recognition Letters 156, 87–94 (2022)

  64. Patel, R., Kumar, A., Singh, N.: Efficient positional encoding for long time series. In: AAAI Conference on Artificial Intelligence, vol. 37, pp. 9456–9464 (2023)

  65. Chen, Y., Zhou, H., Liu, S.: Adaptive positional encoding for long sequence time series forecasting. Knowledge-Based Systems 268, 110456 (2023)

  66. Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., Liu, Y.: RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024)

  67. Press, O., Smith, N.A., Lewis, M.: Train short, test long: Attention with linear biases enables input length extrapolation. arXiv preprint arXiv:2108.12409 (2021)

  68. Yun, C., Bhojanapalli, S., Rawat, A.S., Reddi, S.J., Kumar, S.: Are transformers universal approximators of sequence-to-sequence functions? In: International Conference on Machine Learning, pp. 10994–11004 (2020). PMLR

  69. Tsai, Y.-H.H., Bai, S., Yamada, M., Morency, L.-P., Salakhutdinov, R.: Transformer dissection: An unified understanding for transformer's attention via the lens of kernel. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), ...

  70. Clark, K., Khandelwal, U., Levy, O., Manning, C.D.: What does BERT look at? An analysis of BERT's attention. In: Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 276–286 (2019)

  71. Sun, W., Chen, L., Zhao, M.: Convolutional positional encoding for time series transformers. IEEE Transactions on Signal Processing 71, 1892–1904 (2023)

  72. Kim, H., Park, J., Lee, S.: Temporal positional encoding with attention mechanisms. Signal Processing 205, 108871 (2023)

  73. Peters, F., Hoffman, C., Young, T.: Deep learning approaches to time series with positional awareness. Artificial Intelligence Review 57(8), 6789–6821 (2022)

  74. Thompson, S., Harris, B., Turner, M.: Cybernetic principles in modern time series analysis. International Journal of Machine Learning and Cybernetics 14(6), 2134–2149 (2023)

  75. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21(140), 1–67 (2020)

  76. Dau, H.A., Keogh, E., Kamgar, K., Yeh, C.-C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, C.A., Yanping, C., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G.: The UCR Time Series Classification Archive (2018)

  77. Micucci, D., Mobilio, M., Napoletano, P.: UniMiB SHAR: A dataset for human activity recognition using acceleration data from smartphones. Applied Sciences 7(10), 1101 (2017)

  78. N, K., I, K., Makarov, K.V., S, L.: EMG Data for Gestures. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5ZP5C (2018)

  79. Singh, A.P., Chaudhari, S.: Room Occupancy Estimation. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5P605 (2018)

  80. Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.: The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery 31(3), 606–660 (2017)

Showing first 80 references.