Positional Encoding in Transformer-Based Time Series Models: A Survey

Habib Irani; Vangelis Metsis

arxiv: 2502.12370 · v3 · submitted 2025-02-17 · 💻 cs.LG

Pith reviewed 2026-05-23 02:22 UTC · model grok-4.3

keywords positional encoding · transformer · time series · classification · survey · forecasting · anomaly detection

The pith

Data characteristics like sequence length and complexity determine which positional encoding performs best in transformer time series models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Transformer models for time series data require positional encodings to respect sequence order. This survey reviews fixed, learnable, relative, and hybrid encoding approaches and benchmarks them on classification tasks. It concludes that sequence length, signal complexity, and dimensionality strongly shape which method works well. Advanced encodings deliver accuracy gains but raise computational demands. The work also flags open challenges and research directions for improved encodings.

Core claim

The survey establishes that in transformer-based time series models, the effectiveness of different positional encoding approaches—fixed, learnable, relative, and hybrid—varies significantly based on data characteristics including sequence length, signal complexity, and dimensionality, with advanced methods providing accuracy improvements but incurring greater computational demands.

What carries the argument

Categorization of positional encoding methods (fixed, learnable, relative, hybrid) and their quantitative evaluation on time series classification tasks.
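
To make two of the four families concrete, the following is a minimal sketch in plain Python of a fixed sinusoidal encoding (the standard formulation from "Attention is all you need") next to a learnable table. The function names, the initialization scale, and the seed are illustrative assumptions and are not taken from the survey; relative and hybrid schemes are omitted for brevity.

```python
import math
import random


def sinusoidal_encoding(seq_len, d_model):
    """Fixed encoding: even columns hold sin(pos / 10000^(i/d_model)), odd columns the matching cos."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i is the even column index
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def learnable_encoding(seq_len, d_model, scale=0.02, seed=0):
    """Learnable encoding: a table of free parameters (hypothetical init); training would update it."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(d_model)] for _ in range(seq_len)]


def add_positions(x, pe):
    """Add an encoding to a (seq_len x d_model) input, element-wise."""
    return [[xv + pv for xv, pv in zip(x_row, pe_row)] for x_row, pe_row in zip(x, pe)]
```

The fixed table costs nothing to train but cannot adapt to the data, while the learnable table adds seq_len × d_model parameters. That difference is one simple instance of the accuracy-versus-cost trade-off the survey describes.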

If this is right

Encoding selection must account for the specific sequence length and signal traits of the target data.
Advanced encodings improve prediction accuracy on classification tasks.
Gains in accuracy from advanced methods come with higher computational cost.
Challenges remain in balancing performance and efficiency for varied time series data.
Future work should target encodings that adapt to data characteristics without added complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same trade-offs likely extend to forecasting and anomaly detection tasks even though the benchmarks focused on classification.
Hybrid encodings may offer a practical middle ground for very long sequences where pure advanced methods become too expensive.
Practitioners could build simple decision rules based on data statistics to pick encodings before training.

Load-bearing premise

The chosen time series classification tasks and benchmarks are representative enough to support general claims about method effectiveness.

What would settle it

A new set of benchmarks on diverse time series datasets that shows no measurable dependence of encoding performance on sequence length, complexity, or dimensionality.

Original abstract

Recent advancements in transformer-based models have greatly improved time series analysis, providing robust solutions for tasks such as forecasting, anomaly detection, and classification. A crucial element of these models is positional encoding, which allows transformers to capture the intrinsic sequential nature of time series data. This survey systematically examines existing techniques for positional encoding in transformer-based time series models. We investigate a variety of methods, including fixed, learnable, relative, and hybrid approaches, and evaluate their effectiveness in different time series classification tasks. Our findings indicate that data characteristics like sequence length, signal complexity, and dimensionality significantly influence method effectiveness. Advanced positional encoding methods exhibit performance gains in terms of prediction accuracy, however, they come at the cost of increased computational complexity. Furthermore, we outline key challenges and suggest potential research directions to enhance positional encoding strategies. By delivering a comprehensive overview and quantitative benchmarking, this survey intends to assist researchers and practitioners in selecting and designing effective positional encoding methods for transformer-based time series models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

A straightforward survey that organizes positional encoding methods and runs limited benchmarks, but the general claims about data characteristics rest on unexamined task selection.

This survey gathers existing positional encoding approaches for transformer time series models and benchmarks them on classification tasks. It groups methods into fixed, learnable, relative, and hybrid types and reports that sequence length, complexity, and dimensionality affect which ones perform better, with advanced options buying accuracy at higher compute cost. That organization is the main service it provides; someone entering the area can use it as a map without reading every original paper.

Referee Report

2 major / 2 minor

Summary. The paper surveys positional encoding methods (fixed, learnable, relative, and hybrid) for transformer-based time series models. It reviews existing techniques, performs quantitative benchmarking on selected time series classification tasks, and reports that data characteristics such as sequence length, signal complexity, and dimensionality significantly affect method performance, with advanced encodings improving accuracy at the expense of computational cost. It also discusses challenges and future directions.

Significance. If the benchmarking results hold and generalize, the survey would offer practical guidance for selecting positional encodings in time series transformers by clarifying performance-complexity trade-offs and data-dependent behavior. The quantitative component distinguishes it from purely qualitative surveys, but its value hinges on the diversity and representativeness of the evaluated tasks.

major comments (2)

[Benchmarking section] The central empirical claim that sequence length, signal complexity, and dimensionality 'significantly influence' positional encoding effectiveness rests on the chosen classification tasks. The manuscript must explicitly list all datasets used, report their key statistics (length ranges, dimensionality, complexity measures), and demonstrate that they span sufficiently diverse regimes; otherwise the generalization to 'data characteristics' cannot be supported and the performance-vs-complexity trade-off remains conditional on the sampled distribution.
[Results and discussion] The abstract states that advanced methods exhibit 'performance gains in terms of prediction accuracy' yet incur higher complexity. The paper should provide concrete quantitative evidence (e.g., accuracy deltas and runtime/memory measurements) for each method across the datasets, including statistical significance tests and controls for confounding factors such as model size or training protocol, to substantiate the trade-off claim.
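
The dataset statistics requested in the first major comment can be computed with very little code. The sketch below is a hedged illustration, not the paper's procedure: it assumes each sample is a list of channels of equal length, and it picks normalized spectral entropy and the dominant frequency bin as the complexity measures. All function names are hypothetical, and the naive O(n²) DFT is only suitable for short sequences.

```python
import cmath
import math


def power_spectrum(signal):
    """Naive DFT power for frequency bins 1..n//2, after removing the mean."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        powers.append(abs(coeff) ** 2)
    return powers


def dominant_frequency(signal):
    """Frequency bin (cycles per sequence) that carries the most power."""
    powers = power_spectrum(signal)
    return 1 + max(range(len(powers)), key=powers.__getitem__)


def spectral_entropy(signal):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    powers = power_spectrum(signal)
    total = sum(powers)
    if total == 0 or len(powers) < 2:
        return 0.0
    probs = [p / total for p in powers if p > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(powers))


def describe_dataset(samples):
    """samples: list of samples, each a list of channels, each a list of floats."""
    lengths = [len(sample[0]) for sample in samples]
    entropies = [spectral_entropy(channel) for sample in samples for channel in sample]
    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "dimensionality": len(samples[0]),
        "mean_spectral_entropy": sum(entropies) / len(entropies),
    }
```

One row of such output per dataset would let a reader check whether the benchmark actually spans short and long, low- and high-dimensional, and simple and complex regimes.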

minor comments (2)

[Abstract and Introduction] The abstract and introduction should clarify the exact scope (e.g., whether only classification or also forecasting/anomaly detection) and the criteria used to select the reviewed methods and datasets.
[Review of Methods] Notation for the different encoding families (fixed, learnable, relative, hybrid) should be standardized and introduced early to improve readability when comparing methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below and will incorporate the requested details into the revised manuscript.

Referee: [Benchmarking section] The central empirical claim that sequence length, signal complexity, and dimensionality 'significantly influence' positional encoding effectiveness rests on the chosen classification tasks. The manuscript must explicitly list all datasets used, report their key statistics (length ranges, dimensionality, complexity measures), and demonstrate that they span sufficiently diverse regimes; otherwise the generalization to 'data characteristics' cannot be supported and the performance-vs-complexity trade-off remains conditional on the sampled distribution.

Authors: We agree that the current presentation of the benchmarking datasets is insufficient to fully support the generalization claims. In the revision we will add an explicit table in the Benchmarking section that enumerates every dataset together with its sequence-length range, dimensionality, and at least one quantitative complexity measure (e.g., signal entropy or dominant frequency). We will also insert a short paragraph discussing how the chosen collection covers short/long, low/high-dimensional, and simple/complex regimes. revision: yes
Referee: [Results and discussion] The abstract states that advanced methods exhibit 'performance gains in terms of prediction accuracy' yet incur higher complexity. The paper should provide concrete quantitative evidence (e.g., accuracy deltas and runtime/memory measurements) for each method across the datasets, including statistical significance tests and controls for confounding factors such as model size or training protocol, to substantiate the trade-off claim.

Authors: We accept that the present results section lacks the granularity needed to substantiate the stated trade-offs. The revised manuscript will include expanded tables reporting per-dataset accuracy, accuracy deltas relative to a fixed baseline, wall-clock runtime, and peak memory for every positional-encoding variant. We will add paired statistical significance tests (Wilcoxon signed-rank) and will explicitly state that all experiments used identical model sizes and training protocols so that differences can be attributed to the encoding method. revision: yes
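
The paired Wilcoxon signed-rank test named in the rebuttal compares two encoding variants using their per-dataset accuracies. The sketch below is an illustrative, stdlib-only version under stated assumptions: it drops zero differences, uses average ranks for ties, and computes an exact two-sided p-value by enumerating all sign patterns, which is only practical for a small number of datasets. The function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.wilcoxon` would normally be used.

```python
from itertools import product


def signed_ranks(a, b):
    """Rank the nonzero paired differences by absolute value (average ranks for ties)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return diffs, ranks


def wilcoxon_exact(a, b):
    """Return (W, exact two-sided p-value), where W = min(W+, W-)."""
    diffs, ranks = signed_ranks(a, b)
    if not diffs:
        raise ValueError("all paired differences are zero")
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = min(w_plus, w_minus)
    total = sum(ranks)
    hits = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            hits += 1
    return w, hits / 2 ** len(ranks)
```

A practical consequence of the exact distribution: with only six datasets, even a variant that wins on every one cannot reach a two-sided p-value below 2/64 ≈ 0.031, so the number of benchmark datasets bounds how strong any significance claim can be.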

Circularity Check

0 steps flagged

No circularity: survey contains no derivations, equations, or fitted predictions

This is a survey paper reviewing existing positional encoding techniques for transformer-based time series models, with some quantitative benchmarking on classification tasks. No mathematical derivations, first-principles results, or 'predictions' appear that could reduce to inputs by construction. The abstract and structure indicate empirical observations about data characteristics influencing effectiveness, but these rest on external benchmarking rather than self-referential fitting or self-citation chains. No self-definitional steps, fitted inputs called predictions, or ansatz smuggling are present. The reader's assessment of score 0.0 is consistent with the absence of any load-bearing derivation chain. Representativeness concerns are evidential, not circularity issues.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work introduces no free parameters, axioms, or invented entities; it reviews prior literature.

pith-pipeline@v0.9.0 · 5693 in / 963 out tokens · 36571 ms · 2026-05-23T02:22:29.862149+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers
cs.LG 2025-09 unverdicted novelty 6.0

DyWPE generates positional embeddings for time series transformers from the input signal via Discrete Wavelet Transform and outperforms standard positional encodings on ten datasets, especially longer sequences and bi...

Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · cited by 1 Pith paper · 7 internal anchors

[1]

John Wiley & Sons, ??? (2015)

Box, G.E., Jenkins, G.M., Reinsel, G.C., Ljung, G.M.: Time Series Analysis: Forecast- ing and Control. John Wiley & Sons, ??? (2015)

work page 2015
[2]

OTexts, ??? (2018)

Hyndman, R.J., Athanasopoulos, G.: Fore- casting: Principles and Practice. OTexts, ??? (2018)

work page 2018
[3]

Neural computation9(8), 1735–1780 (1997)

Hochreiter, S., Schmidhuber, J.: Long short- term memory. Neural computation9(8), 1735–1780 (1997)

work page 1997
[4]

In: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelli- gence, pp

Yamak, P.T., Yujian, L., Gadosey, P.K.: A comparison between arima, lstm, and gru for time series forecasting. In: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelli- gence, pp. 49–55 (2019)

work page 2019
[5]

In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (2018)

Song, H., Rajan, D., Thiagarajan, J.J., Spanias, A.: Attend and diagnose: Clinical time series analysis using attention models. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (2018). AAAI Press

work page 2018
[6]

16 The handbook of brain theory and neural networks3361(10), 1995 (1995)

LeCun, Y., Bengio, Y.,et al.: Convolutional networks for images, speech, and time series. 16 The handbook of brain theory and neural networks3361(10), 1995 (1995)

work page 1995
[7]

In: Interna- tional Joint Conference on Neural Networks (IJCNN), pp

Wang, Z., Yan, W., Oates, T.: Time series classification from scratch with deep neu- ral networks: A strong baseline. In: Interna- tional Joint Conference on Neural Networks (IJCNN), pp. 1578–1585 (2017). IEEE

work page 2017
[8]

In: Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS) (2019)

Shih, S.-C., Sun, F.-K., Lee, H.-Y.: Tempo- ral pattern attention for multivariate time series forecasting. In: Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS) (2019)

work page 2019
[9]

Journal of systems engineering and electronics28(1), 162–169 (2017)

Zhao, B., Lu, H., Chen, S., Liu, J., Wu, D.: Convolutional neural networks for time series classification. Journal of systems engineering and electronics28(1), 162–169 (2017)

work page 2017
[10]

Advances in neural information processing systems30(2017)

Vaswani, A., Shazeer, N., Parmar, N., Uszko- reit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems30(2017)

work page 2017
[11]

In: Advances in Neural Information Processing Systems, pp

Li, H., Jin, X., Xuan, Y., Zhou, X., Chen, W., Wang, Y.X., Yan, X.: Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In: Advances in Neural Information Processing Systems, pp. 5243–5253 (2019)

work page 2019
[12]

Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chap- ter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (long and Short Papers), pp. 4171–4186 (2019)

work page 2019
[13]

Advances in Neural Information Processing Systems33, 1877–1901 (2020)

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakan- tan, A., Shyam, P., Sastry, G., Askell, A.,et al.: Language models are few-shot learners. Advances in Neural Information Processing Systems33, 1877–1901 (2020)

work page 1901
[14]

Zeng, A., Fu, Y., Shang, C., Cheng, J.: Are transformers effective for time series forecast- ing? arXiv preprint arXiv:2303.16640 (2023)

work page arXiv 2023
[15]

International Journal of Machine Learning and Cybernetics (2023)

Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., Sun, L.: Transformers in time series: A survey. International Journal of Machine Learning and Cybernetics (2023)

work page 2023
[16]

Big Data9(1), 3–21 (2021)

Torres, J.F., Hadjout, D., Sebaa, A., Mart´ ınez-´Alvarez, F., Troncoso, A.: Deep learning for time series forecasting: A survey. Big Data9(1), 3–21 (2021)

work page 2021
[17]

In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.: Informer: Beyond efficient transformer for long sequence time- series forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11106–11115 (2021)

work page 2021
[18]

Advances in Neural Information Process- ing Systems34, 22419–22430 (2021)

Wu, H., Xu, J., Wang, J., Long, M.: Aut- oformer: Decomposition transformers with auto-correlation for long-term series forecast- ing. Advances in Neural Information Process- ing Systems34, 22419–22430 (2021)

work page 2021
[19]

In: International Conference on Learning Representations (2022)

Liu, S., Yu, H., Liao, C., Li, J., Lin, W., Liu, A.X., Dustdar, S.: Pyraformer: Low- complexity pyramidal attention for long- range time series modeling and forecasting. In: International Conference on Learning Representations (2022)

work page 2022
[20]

In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp

Zerveas, G., Jayaraman, S., Patel, D., Bhamidipaty, A., Eickhoff, C.: A transformer- based framework for multivariate time series representation learning. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2114–2124 (2021)

work page 2021
[21]

Data Mining and Knowledge Dis- covery38(1), 22–48 (2024)

Foumani, N.M., Tan, C.W., Webb, G.I., Salehi, M.: Improving position encoding of transformers for multivariate time series clas- sification. Data Mining and Knowledge Dis- covery38(1), 22–48 (2024)

work page 2024
[22]

In: Interna- tional Conference on Learning Representa- tions (2022) 17

Xu, J., Wu, H., Wang, J., Long, M.: Anomaly transformer: Time series anomaly detection with association discrepancy. In: Interna- tional Conference on Learning Representa- tions (2022) 17

work page 2022
[23]

Proceedings of the VLDB Endowment15(6), 1201–1214 (2022)

Tuli, S., Casale, G., Jennings, N.R.: Tranad: Deep transformer networks for anomaly detection in multivariate time series data. Proceedings of the VLDB Endowment15(6), 1201–1214 (2022)

work page 2022
[24]

Alpagasus: Training a better alpaca with fewer data.arXiv preprint arXiv:2307.08701,

Li, S., Jin, X., Xuan, Y., Zhou, X., Chen, W., Wang, Y.-X., Yan, X.: Time series forecasting with transformers: A survey. arXiv preprint arXiv:2307.08701 (2023)

work page arXiv 2023
[25]

Pattern Recognition138, 109394 (2023)

Liu, J., Chen, M., Wang, Y.: Dynamic posi- tional encoding for transformer-based time series analysis. Pattern Recognition138, 109394 (2023)

work page 2023
[26]

Transformers in time series: A survey,

Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., Sun, L.: Transformers in time series: A survey. arXiv preprint arXiv:2202.07125 (2022)

work page arXiv 2022
[27]

arXiv preprint arXiv:2312.17044 (2024)

Zhao, L., Qi, Y., Zhang, S., Ma, Y., Liu, S., Zhou, T., et al.: Length extrapolation of transformers: A survey from the perspec- tive of positional encoding. arXiv preprint arXiv:2312.17044 (2024)

work page arXiv 2024
[28]

arXiv preprint arXiv:2305.19466 (2023)

Kazemnejad, A., Padhi, I., Rish, I., Reddy, S., Cheung, J.C.K.: The impact of positional encoding on length generalization in trans- formers. arXiv preprint arXiv:2305.19466 (2023)

work page arXiv 2023
[29]

arXiv preprint arXiv:2404.10337 (2024)

Zhang, J., Wang, J., Qiang, W., Xu, F., Zheng, C., Sun, F., Xiong, H.: Intrigu- ing properties of positional encoding in time series forecasting. arXiv preprint arXiv:2404.10337 (2024)

work page arXiv 2024
[30]

ACM Computing Surveys55(6), 1–28 (2022)

Tay, Y., Dehghani, M., Bahri, D., Metzler, D.: Efficient transformers: A survey. ACM Computing Surveys55(6), 1–28 (2022)

work page 2022
[31]

arXiv preprint arXiv:2006.15595 (2020)

Ke, G., He, D., Liu, T.-Y.: Rethinking posi- tional encoding in language pre-training. arXiv preprint arXiv:2006.15595 (2020)

work page arXiv 2006
[32]

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau, D., Cho, K., Bengio, Y.: Neu- ral machine translation by jointly learn- ing to align and translate. arXiv preprint arXiv:1409.0473 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[33]

Effective Approaches to Attention-based Neural Machine Translation

Luong, M.-T., Pham, H., Manning, C.D.: Effective approaches to attention-based neu- ral machine translation. arXiv preprint arXiv:1508.04025 (2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015
[34]

A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G., Cottrell, G.: A dual-stage attention-based recurrent neural network for time series prediction. arXiv preprint arXiv:1704.02971 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[35]

Knowledge- Based Systems191, 105239 (2020)

Fan, C., Zhang, Y., Pan, Y., Li, X.: A dual attention-based coupling network for multi- variate time series forecasting. Knowledge- Based Systems191, 105239 (2020)

work page 2020
[36]

In: Thirty-second AAAI Conference on Artificial Intelligence (2018)

Song, H., Rajan, D., Thiagarajan, J.J., Spanias, A.: Attend and diagnose: Clinical time series analysis using attention mod- els. In: Thirty-second AAAI Conference on Artificial Intelligence (2018)

work page 2018
[37]

Advances in neural information processing systems28(2015)

Chorowski, J.K., Bahdanau, D., Serdyuk, D., Cho, K., Bengio, Y.: Attention-based mod- els for speech recognition. Advances in neural information processing systems28(2015)

work page 2015
[38]

In: Pro- ceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (2020)

Hao, Y., Cao, H.: A new attention mechanism to classify multivariate time series. In: Pro- ceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (2020)

work page 2020
[39]

arXiv preprint arXiv:2103.14438 (2021)

Liu, M., Ren, S., Ma, S., Jiao, J., Chen, Y., Wang, Z., Song, W.: Gated transformer networks for multivariate time series clas- sification. arXiv preprint arXiv:2103.14438 (2021)

work page arXiv 2021
[40]

Longformer: The Long-Document Transformer

Beltagy, I., Peters, M.E., Cohan, A.: Long- former: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2004
[41]

Reformer: The Efficient Transformer

Kitaev, N., Kaiser, L., Levskaya, A.: Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2001
[42]

IEEE Transactions on Neural Net- works and Learning Systems32(8), 3425– 3437 (2021) 18

Chu, X., Yang, W., Zhang, L.: Contextual position encoding for time series classifi- cation. IEEE Transactions on Neural Net- works and Learning Systems32(8), 3425– 3437 (2021) 18

work page 2021
[43]

Applied Intelligence53(12), 15234–15251 (2023)

Anderson, L., Thompson, M., White, J.: Embedding strategies for temporal data in deep learning. Applied Intelligence53(12), 15234–15251 (2023)

work page 2023
[44]

Machine Learning 111(8), 2943–2967 (2022)

Taylor, M., Davis, S., Wilson, J.: Transformer architectures for time series: Positional encoding considerations. Machine Learning 111(8), 2943–2967 (2022)

work page 2022
[45]

¨O., Loeff, N., Pfister, T.: Temporal fusion transformers for inter- pretable multi-horizon time series forecast- ing

Lim, B., Arık, S. ¨O., Loeff, N., Pfister, T.: Temporal fusion transformers for inter- pretable multi-horizon time series forecast- ing. International Journal of Forecasting 37(4), 1748–1764 (2021)

work page 2021
[46]

Information Sciences625, 789–805 (2023)

Zhang, L., Wang, Q., Chen, R.: Hierarchical positional encoding for multi-scale time series analysis. Information Sciences625, 789–805 (2023)

work page 2023
[47]

Machine Learning112(4), 1345–1372 (2023)

Schmidt, H., Mueller, A., Weber, K.: Multi- scale positional encoding for hierarchical time series analysis. Machine Learning112(4), 1345–1372 (2023)

work page 2023
[48]

In: Advances in Neural Information Processing Systems, vol

Kumar, R., Sharma, P., Gupta, A.: Posi- tional encoding methods for long-term time series prediction. In: Advances in Neural Information Processing Systems, vol. 36, pp. 28456–28471 (2023)

work page 2023
[49]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Cordonnier, J.-B., Mahendran, A., Doso- vitskiy, A., Weissenborn, D., Uszkoreit, J., Unterthiner, T.: Differentiable patch selec- tion for image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2351– 2360 (2021)

work page 2021
[50]

arXiv preprint arXiv:2009.13658 (2020)

Huang, Z., Liang, D., Xu, P., Xiang, B.: Improve transformer models with better rel- ative position embeddings. arXiv preprint arXiv:2009.13658 (2020)

work page arXiv 2009
[51]

International Conference on Learning Repre- sentations (ICLR) (2021)

Ke, G., He, D., Liu, T.-Y.: Rethinking posi- tional encoding in language pre-training. International Conference on Learning Repre- sentations (ICLR) (2021)

work page 2021
[52]

Self-Attention with Relative Position Representations

Shaw, P., Uszkoreit, J., Vaswani, A.: Self- attention with relative position represen- tations. arXiv preprint arXiv:1803.02155 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[53]

In: International Conference on Machine Learning, pp

Liutkus, A., C ¸ ıfka, O., Wu, S.-L., Simsekli, U., Yang, Y.-H., Richard, G.: Relative posi- tional encoding for transformers with lin- ear complexity. In: International Conference on Machine Learning, pp. 7067–7079 (2021). PMLR

work page 2021
[54]

The Journal of Supercomput- ing81(1), 282 (2025)

Alioghli, A.A., Yıldırım Okay, F.: Enhanc- ing multivariate time-series anomaly detec- tion with positional encoding mechanisms in transformers. The Journal of Supercomput- ing81(1), 282 (2025)

work page 2025
[55]

International Journal of Forecasting39(3), 1234–1248 (2023)

Martinez, C., Garcia, A., Lopez, J.: Posi- tional encoding strategies for multivariate time series forecasting. International Journal of Forecasting39(3), 1234–1248 (2023)

work page 2023
[56]

Neural Computing and Applications 35(12), 8765–8778 (2023)

Wang, H., Li, J., Zhang, X.: Frequency-based positional encoding for time series trans- formers. Neural Computing and Applications 35(12), 8765–8778 (2023)

work page 2023
[57]

Digital Signal Processing134, 103921 (2023)

Nguyen, L., Tran, D., Pham, M.: Spec- tral analysis of positional encodings in time series models. Digital Signal Processing134, 103921 (2023)

work page 2023
[58]

Expert Systems with Applications213, 118912 (2023)

Li, X., Yang, J., Zhou, F.: Learnable posi- tional encoding for time series transform- ers. Expert Systems with Applications213, 118912 (2023)

work page 2023
[59]

In: International Conference on Machine Learning, pp

Foster, J., Adams, R., Collins, S.: Learn- ing optimal positional encodings for time series tasks. In: International Conference on Machine Learning, pp. 10123–10138 (2023)

work page 2023
[60]

Data Mining and Knowledge Discovery37(3), 1089–1115 (2023)

Miller, S., Green, L., Hall, R.: Hybrid posi- tional encoding approaches for time series transformers. Data Mining and Knowledge Discovery37(3), 1089–1115 (2023)

work page 2023
[61]

Expert Systems with Applications 220, 119678 (2023) 19

Bell, A., Gray, M., King, R.: Adaptive posi- tional encoding mechanisms for dynamic time series. Expert Systems with Applications 220, 119678 (2023) 19

work page 2023
[62]

Knowledge-Based Systems272, 110567 (2023)

Campbell, D., Reed, E., Bailey, M.: Trans- former models for irregular time series: Posi- tion encoding challenges. Knowledge-Based Systems272, 110567 (2023)

work page 2023
[63]

Pat- tern Recognition Letters156, 87–94 (2022)

Lee, C., Kim, M., Park, S.: Adaptive position encoding for variable-length time series. Pat- tern Recognition Letters156, 87–94 (2022)

work page 2022
[64]

In: AAAI Conference on Artificial Intelligence, vol

Patel, R., Kumar, A., Singh, N.: Efficient positional encoding for long time series. In: AAAI Conference on Artificial Intelligence, vol. 37, pp. 9456–9464 (2023)

work page 2023
[65]

Knowledge-Based Systems268, 110456 (2023)

Chen, Y., Zhou, H., Liu, S.: Adaptive posi- tional encoding for long sequence time series forecasting. Knowledge-Based Systems268, 110456 (2023)

work page 2023
[66]

Neurocom- puting568, 127063 (2024)

Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., Liu, Y.: Roformer: Enhanced transformer with rotary position embedding. Neurocom- puting568, 127063 (2024)

work page 2024
[67]

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

Press, O., Smith, N.A., Lewis, M.: Train short, test long: Attention with linear biases enables input length extrapolation. arXiv preprint arXiv:2108.12409 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[68]

10994–11004 (2020)

Yun, C., Bhojanapalli, S., Rawat, A.S., Reddi, S.J., Kumar, S.: Are transformers uni- versal approximators of sequence-to-sequence functions? In: International Conference on Machine Learning, pp. 10994–11004 (2020). PMLR

work page 2020
[69]

Tsai, Y.-H.H., Bai, S., Yamada, M., Morency, L.-P., Salakhutdinov, R.: Transformer dis- section: An unified understanding for trans- former’s attention via the lens of kernel. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), ...

work page 2019
[70]

In: Proceedings of the 2019 ACL Workshop BlackboxNLP: Ana- lyzing and Interpreting Neural Networks for NLP, pp

Clark, K., Khandelwal, U., Levy, O., Man- ning, C.D.: What does bert look at? an analysis of bert’s attention. In: Proceedings of the 2019 ACL Workshop BlackboxNLP: Ana- lyzing and Interpreting Neural Networks for NLP, pp. 276–286 (2019)

work page 2019
[71]

IEEE Transactions on Signal Processing 71, 1892–1904 (2023)

Sun, W., Chen, L., Zhao, M.: Convolutional positional encoding for time series transform- ers. IEEE Transactions on Signal Processing 71, 1892–1904 (2023)

work page 1904
[72]

Signal Processing205, 108871 (2023)

Kim, H., Park, J., Lee, S.: Temporal posi- tional encoding with attention mechanisms. Signal Processing205, 108871 (2023)

work page 2023
[73]

Artificial Intelligence Review57(8), 6789–6821 (2022)

Peters, F., Hoffman, C., Young, T.: Deep learning approaches to time series with positional awareness. Artificial Intelligence Review57(8), 6789–6821 (2022)

work page 2022
[74]

International Journal of Machine Learn- ing and Cybernetics14(6), 2134–2149 (2023)

Thompson, S., Harris, B., Turner, M.: Cyber- netic principles in modern time series analy- sis. International Journal of Machine Learn- ing and Cybernetics14(6), 2134–2149 (2023)

work page 2023
[75]

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21(140), 1–67 (2020)

[76]

Dau, H.A., Keogh, E., Kamgar, K., Yeh, C.-C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, C.A., Yanping, C., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G.: The UCR Time Series Classification Archive (2018)

[77]

Micucci, D., Mobilio, M., Napoletano, P.: Unimib shar: A dataset for human activity recognition using acceleration data from smartphones. Applied Sciences 7(10), 1101 (2017)

[78]

N, K., I, K., Makarov, K.V., S, L.: EMG Data for Gestures. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5ZP5C (2018)

[79]

Singh, A.P., Chaudhari, S.: Room Occupancy Estimation. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5P605 (2018)

[80]

Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.: The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery 31(3), 606–660 (2017)
