Extreme Adaptive Transformer for Time Series Forecasting

Hui Liu; Sanjeev Shrestha; Yifan Zhang

arxiv: 2607.02437 · v1 · pith:YQTBDV67new · submitted 2026-07-02 · 💻 cs.LG

Pith reviewed 2026-07-03 16:29 UTC · model grok-4.3

classification 💻 cs.LG

keywords time series forecasting · transformer · extreme events · hydrologic forecasting · attention mechanism · streamflow prediction · imbalanced data

The pith

Exformer adds an extreme-specific attention component to better forecast rare peaks in imbalanced time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Exformer, a Transformer model for time series forecasting that adds an extreme-adaptive attention mechanism to handle rare but high-impact events. This mechanism splits attention into Local, Stride, and Extreme components, with the last one linking normal and extreme patterns in the data. The approach targets problems like hydrologic streamflow prediction where skewed distributions make standard models miss critical peaks that affect flood warnings and resource planning. Experiments on four real-world datasets show the model produces more accurate 3-day forecasts than existing baselines.

Core claim

Exformer introduces an extreme-adaptive attention mechanism composed of Local, Stride, and Extreme sparse components. The Extreme component selectively models event-aware dependencies between normal and extreme streamflow patterns, enabling superior performance in forecasting imbalanced time series with rare consequential events.

What carries the argument

An extreme-adaptive attention mechanism with Local, Stride, and Extreme sparse components that capture short-term, periodic, and event-aware dependencies, respectively.
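
The page does not give the mask definitions, so the following is only a minimal sketch of how three sparse patterns of this kind could be built and merged. Everything in it is an assumption made for illustration: the function names, the `window` and `stride` parameters, the rule that every query may attend to keys flagged as extreme, and the logical OR used to combine the masks. It is not Exformer's actual implementation.

```python
from typing import List

Mask = List[List[bool]]  # mask[i][j] is True when query i may attend to key j


def local_mask(n: int, window: int) -> Mask:
    """Short-term pattern: each step sees neighbours within `window` steps."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


def stride_mask(n: int, stride: int) -> Mask:
    """Periodic pattern: each step sees steps a multiple of `stride` away."""
    return [[(i - j) % stride == 0 for j in range(n)] for i in range(n)]


def extreme_mask(is_extreme: List[bool]) -> Mask:
    """Event-aware pattern (assumed): every step sees the steps flagged extreme."""
    n = len(is_extreme)
    return [[is_extreme[j] for j in range(n)] for _ in range(n)]


def union(*masks: Mask) -> Mask:
    """Combine masks by logical OR (an assumed merge rule)."""
    n = len(masks[0])
    return [[any(m[i][j] for m in masks) for j in range(n)] for i in range(n)]


flags = [False, False, True, False, False, False]  # hypothetical extreme labels
combined = union(local_mask(6, window=1), stride_mask(6, stride=3), extreme_mask(flags))
for row in combined:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row marks with `x` the keys that one query may attend to. With longer sequences and few flagged steps, the allowed pairs form a small share of the full grid, which is the usual motivation for sparse attention.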

Load-bearing premise

The extreme component selectively models event-aware dependencies between normal and extreme patterns in a way that holds beyond the four tested datasets and the specific definition of extremes used.

What would settle it

If a new hydrologic streamflow dataset with a different extreme threshold shows Exformer no longer outperforming baselines on 3-day forecasts, the central performance claim would be refuted.
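
To make the role of the threshold concrete, here is a minimal sketch of one way extremes might be labeled, assuming a simple empirical-quantile rule. The page does not state the paper's labeling rule, so `label_extremes`, the parameter `q`, and the synthetic series are all hypothetical.

```python
from typing import List


def label_extremes(series: List[float], q: float) -> List[bool]:
    """Flag values at or above the empirical q-quantile (assumed labelling rule)."""
    ordered = sorted(series)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    threshold = ordered[index]
    return [value >= threshold for value in series]


# Synthetic values for illustration only, not taken from the paper.
flow = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
for q in (0.5, 0.9):
    flags = label_extremes(flow, q)
    print(q, sum(flags), "of", len(flow), "steps flagged")
```

Changing `q` changes which time steps count as extreme, and therefore which positions an extreme-aware component would single out. That is why a result obtained under one threshold may not carry over to another.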

Figures

Figures reproduced from arXiv: 2607.02437 by Hui Liu, Sanjeev Shrestha, Yifan Zhang.

**Figure 2.** Illustration of attention masks used in Dozer, Extreme, and the proposed Extreme-adaptive

**Figure 3.** Visualization of forecasting results on the Saragota dataset at a 3-day forecast horizon.

**Figure 4.** Sensitivity analysis of 3-day prediction across different threshold values

Original abstract

Time series forecasting remains challenging when the underlying data contain rare but critical extreme events. This issue is particularly important in hydrologic forecasting, where streamflow distributions are often highly skewed and extreme peaks can have substantial impacts on flood monitoring, water resource management, and early warning systems. Although Transformer-based forecasting models have achieved strong performance by modeling long-range temporal dependencies, they typically treat all time points uniformly and may therefore underrepresent rare extreme patterns. In this paper, we propose the Extreme-Adaptive Transformer (Exformer), a forecasting framework designed to explicitly model temporal dependencies involving both normal and extreme events. Exformer introduces an extreme-adaptive attention mechanism composed of three sparse components: Local, Stride, and Extreme. The Local and Stride components capture short-term and periodic temporal dependencies, respectively, while the Extreme component selectively models event-aware dependencies between normal and extreme streamflow patterns. Experiments on four real-world hydrologic streamflow datasets show that Exformer achieves superior 3-day forecasting performance compared with state-of-the-art baselines. Our findings demonstrate that explicitly incorporating extreme-aware attention improves the forecasting capacity of Transformer models on imbalanced time series with rare but consequential events.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Exformer adds an extreme attention component to sparse Transformers for skewed streamflow data and reports better 3-day forecasts on four hydrologic datasets, but the methods lack detail on labeling and combination.

Exformer modifies Transformer attention with three sparse parts—Local, Stride, and Extreme—to better handle rare high-impact events in imbalanced time series like streamflow. The core addition is the Extreme component, which tries to link normal and peak patterns explicitly.

The work targets a practical issue in hydrology where standard models underperform on tails of the distribution. Running experiments on four real streamflow datasets and claiming gains at the 3-day horizon is a concrete step. The framing around flood monitoring and water management gives the motivation some grounding.

The abstract supplies no numbers, error bars, or statistical tests, and it does not explain how extreme events were identified or how the three attention pieces are merged in practice. Without those details the performance edge is hard to assess. Generalization beyond the chosen datasets and the specific extreme threshold also remains open; the paper does not test whether the same gains appear on other skewed series or with different definitions.

This paper suits readers who build forecasting tools for domains with rare but costly events, such as hydrology or energy load spikes. Someone already working on sparse attention variants might pick up the extreme head idea as a small extension.

The central empirical claim is narrow enough to be checked, so the paper deserves a serious referee. The methods section will need expansion on labeling and architecture before it can be evaluated properly.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes the Extreme-Adaptive Transformer (Exformer) for time series forecasting on imbalanced data containing rare extreme events, with a focus on hydrologic streamflow. It introduces an extreme-adaptive attention mechanism consisting of three sparse components (Local for short-term dependencies, Stride for periodic dependencies, and Extreme for event-aware dependencies between normal and extreme patterns). The central empirical claim is that Exformer achieves superior 3-day forecasting performance on four real-world hydrologic datasets relative to state-of-the-art baselines.

Significance. If the performance gains are shown to be statistically significant with appropriate controls and error bars, the work offers a targeted extension of Transformer attention for skewed time series, which could improve applications in flood monitoring and water resource management. The bounded claim on four specific datasets is directly testable and does not rely on parameter-free derivations or machine-checked proofs.

minor comments (3)

[Abstract] The claim of 'superior 3-day forecasting performance' is stated without quantitative metrics (e.g., MAE, RMSE), error bars, or statistical tests, and without details on how extreme events were labeled or how the three attention components are combined. This makes the central empirical result hard to evaluate.
[Methods] The manuscript would benefit from a dedicated methods subsection clarifying how the Extreme component is integrated with the Local and Stride components, and giving a precise definition of 'event-aware dependencies.'
[Experiments] Include a table reporting per-dataset metrics with standard deviations across multiple runs, alongside baseline comparisons, to support the superiority claim.
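
The kind of reporting requested in these comments can be sketched as follows. This is a minimal illustration, assuming MAE and RMSE as the metrics and the sample standard deviation across repeated runs as the error bar. The helper names and the per-run values are hypothetical placeholders, not results from the paper.

```python
import math
import statistics
from typing import Dict, List


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def summarize(scores: List[float]) -> Dict[str, float]:
    """Mean and sample standard deviation of one metric across runs."""
    return {"mean": statistics.mean(scores), "std": statistics.stdev(scores)}


# Placeholder per-run RMSE values for one model on one dataset (not real results).
runs = [1.0, 2.0, 3.0]
summary = summarize(runs)
print(f"RMSE {summary['mean']:.2f} ± {summary['std']:.2f}")
```

One row of this form per model and dataset would let a reader judge whether a reported gap between models is larger than the run-to-run variation.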

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation of minor revision. We appreciate the recognition that the bounded empirical claim on four hydrologic datasets is directly testable. We will incorporate statistical significance testing and error bars as noted in the significance assessment.

Circularity Check

0 steps flagged

No significant circularity; empirical claim only

The paper advances an empirical claim: on four specific hydrologic streamflow datasets, Exformer outperforms listed baselines at the 3-day horizon. No derivation chain, equations, or first-principles predictions appear in the abstract or description. The model description (extreme-adaptive attention with Local/Stride/Extreme components) is presented as an architectural choice, not as a result derived from prior fitted quantities or self-citations that reduce to the target performance numbers. The weakest assumption concerns generalization beyond the tested datasets, which is an external-validity issue rather than an internal reduction of the reported results to their own inputs. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unstated assumption that the four hydrologic datasets contain representative extreme events and that the Extreme attention component can be trained without additional supervision on event labels. No free parameters are named in the abstract.

axioms (1)

Domain assumption: Standard multi-head attention can be sparsified into independent Local, Stride, and Extreme heads without destroying gradient flow or expressivity.
Implicit in the design of the extreme-adaptive attention mechanism.
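
For readers less familiar with sparse attention, this is a minimal sketch of how a boolean mask restricts the softmax over attention scores for a single query. It illustrates the general masking technique only. The function name and the example numbers are assumptions, and nothing here is taken from the paper.

```python
import math
from typing import List


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over one row of scores, restricted to positions where mask is True."""
    allowed = [s for s, m in zip(scores, mask) if m]
    if not allowed:
        raise ValueError("each query needs at least one allowed key")
    peak = max(allowed)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for one query over three keys; the middle key is masked out.
weights = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
print(weights)
```

Masked positions receive exactly zero weight, so they contribute nothing to the output for that query, while the remaining weights still sum to one.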

invented entities (1)

Extreme attention component (no independent evidence)
Purpose: selectively model dependencies between normal and extreme streamflow patterns.
New postulated attention head introduced to handle rare events; no independent evidence (e.g., external validation) is supplied in the abstract.
