ADMFormer: An Adaptive-Decomposition Transformer with Time-Varying Masked Spatial Attention for Traffic Forecasting

Qitai Tan; Ruiwen Gu; Xiao-Ping Zhang; Yahao Liu

arxiv: 2605.25543 · v1 · pith:DHOFI7D7new · submitted 2026-05-25 · 💻 cs.AI

Pith reviewed 2026-06-29 21:56 UTC · model grok-4.3

classification 💻 cs.AI

keywords traffic forecasting · transformer · adaptive decomposition · spatial attention · time series · intelligent transportation systems · temporal patterns

The pith

ADMFormer decouples traffic time series into regular and fluctuating components using adaptive gating to improve forecasting accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that traffic data's mix of stable patterns and sudden changes can be handled by first using a gating mechanism to separate the two at each time step and node. It then processes the two parts in separate branches, one for periodic and one for irregular behavior, and uses masked attention to focus only on the spatial connections that are relevant at the current time. If this works, forecasts would better reflect real traffic dynamics instead of being swamped by noise or by uniform treatment of all interactions. This matters because accurate traffic predictions support better traffic management and planning in cities.

Core claim

The central claim is that an adaptive-decomposition transformer can separate traffic signals into dominant regularities and residual fluctuations via time-node adaptive gating. A dual-branch temporal module then models global periodic dependencies in one branch and high-frequency variations in the other. Time-varying masked spatial attention sparsifies the spatial graph based on current traffic states to preserve informative dependencies, and the authors report state-of-the-art results on four real-world datasets.

What carries the argument

The time-node adaptive gating mechanism for signal decomposition combined with time-varying masked spatial attention for dynamic dependency modeling.
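To make the gating idea concrete, here is a minimal NumPy sketch of one way such a decomposition could look. It is an illustration under assumptions, not the authors' implementation: the abstract gives no equations, so the sigmoid gate, the linear gate parameters `w` and `b`, and the function name `adaptive_gate_decompose` are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def adaptive_gate_decompose(x, w, b):
    """Split x of shape (T, N, D) into a 'regular' and a 'residual' part.

    w (D, D) and b (D,) are hypothetical gate parameters; in a trained
    model they would be learned. The gate is computed separately for every
    time step and node, which is what 'time-node adaptive' is assumed to
    mean here.
    """
    gate = sigmoid(x @ w + b)   # (T, N, D), every value lies in (0, 1)
    regular = gate * x          # share of the signal routed to the regular branch
    residual = x - regular      # remainder, equal to (1 - gate) * x
    return regular, residual, gate


rng = np.random.default_rng(0)
T, N, D = 12, 5, 8              # time steps, nodes, feature channels
x = rng.normal(size=(T, N, D))
w = rng.normal(scale=0.1, size=(D, D))
b = np.zeros(D)

regular, residual, gate = adaptive_gate_decompose(x, w, b)
```

By construction the two parts sum back to the input, so nothing is discarded at this stage; the gate only decides which downstream branch sees which share of the signal.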

If this is right

Traffic series are modeled with separate handling for stable periodic regularities and event-driven fluctuations.
Spatial dependencies are made dynamic and sparse to avoid redundant interactions and noise.
State-of-the-art performance is achieved on four real-world traffic forecasting datasets.
Dual-branch processing captures both global periodic and high-frequency irregular variations effectively.
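The sparsified spatial dependencies in the list above can be illustrated with a small NumPy sketch. The abstract does not say how the mask is computed, so the top-k rule used here is an assumption chosen for simplicity, as are the function name `masked_spatial_attention`, the parameter `k`, and the use of the raw node states as queries, keys, and values without learned projections.

```python
import numpy as np


def masked_spatial_attention(h, k):
    """Sparse attention over nodes, recomputed at every time step.

    h: (T, N, D) node states. Each node attends only to its k
    highest-scoring nodes at that time step (a hypothetical masking rule,
    not the one from the paper).
    """
    T, N, D = h.shape
    scores = h @ h.transpose(0, 2, 1) / np.sqrt(D)          # (T, N, N)
    # k-th largest score in each row acts as the keep threshold.
    kth = np.sort(scores, axis=-1)[..., N - k][..., None]   # (T, N, 1)
    masked = np.where(scores >= kth, scores, -np.inf)
    # Numerically stable softmax; masked entries get weight exactly 0.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ h, weights


rng = np.random.default_rng(0)
h = rng.normal(size=(6, 5, 4))   # 6 time steps, 5 nodes, 4 channels
out, weights = masked_spatial_attention(h, k=2)
```

Because the scores are computed from the states at each time step, the set of retained neighbours can differ from one step to the next, which is the sense in which the mask is time-varying in this sketch.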

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The decomposition approach might extend to other domains with mixed regular and irregular time series such as stock prices or weather data.
Masked attention based on real-time states could apply to other networked forecasting problems like power grid load prediction.
Separate analysis of the decomposed components could provide new insights into what drives traffic fluctuations versus regular flows.

Load-bearing premise

The time-node adaptive gating mechanism can effectively decouple traffic signals into dominant regularities and residual fluctuations that vary across time and nodes.

What would settle it

Demonstrating that ADMFormer does not outperform baseline methods on the four real-world datasets when the adaptive gating or the masked attention components are ablated.

Figures

Figures reproduced from arXiv: 2605.25543 by Qitai Tan, Ruiwen Gu, Xiao-Ping Zhang, Yahao Liu.

**Figure 2.** Ablation study on PEMS07 and PeMS08. To further investigate the contribution of each component in our model, we conduct an ablation study on the following variants of ADMFormer: 1) w/o DA: it removes the adaptive decomposition module and uses Fourier Attention for temporal modeling. 2) w/o DM: it removes the adaptive decomposition module and uses FreMLP for temporal modeling. 3) w/o NE: it removes node e…

**Figure 3.** Parameter sensitivity analysis on PEMS04 and PeMS08.

**Figure 4.** Analysis of the Frequency-aware Mask: Heatmap of the learned mask (left) and traffic flow visualization between node pairs (right).

Original abstract

Accurate traffic forecasting is essential for intelligent transportation systems, supporting a wide range of real-world applications. However, it remains challenging due to two key factors: (1) Traffic series contain heterogeneous temporal patterns, where stable periodic regularities coexist with event-driven fluctuations. Existing methods often treat them within a unified representation, limiting their ability to capture fine-grained temporal dynamics. (2) Spatial dependencies among nodes are inherently dynamic and sparse, while dense all-pairs attention often introduces redundant interactions and amplifies noise. To address these issues, we propose ADMFormer, an Adaptive-Decomposition Transformer with Time-Varying Masked Spatial Attention. Specifically, ADMFormer first employs a time-node adaptive gating mechanism to decouple traffic signals into dominant regularities and residual fluctuations that vary across time and nodes. A dual-branch temporal module is then designed to separately capture global periodic dependencies and high-frequency irregular variations from these two decomposed components. Furthermore, ADMFormer introduces a time-varying masked spatial attention that sparsifies spatial interactions based on real-time traffic states, thereby effectively preserving dynamic and informative dependencies. Extensive experiments on four real-world datasets demonstrate that ADMFormer achieves state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ADMFormer adds time-node adaptive gating for signal decomposition plus time-varying masked spatial attention to a transformer and reports SOTA on four traffic datasets.

The main takeaway is that this paper proposes ADMFormer, which uses a time-node adaptive gating step to split traffic series into dominant regularities and residual fluctuations, feeds those into a dual-branch temporal module, and applies time-varying masked spatial attention to handle dynamic sparse dependencies. It claims this combination reaches state-of-the-art on four real-world datasets.

What is actually new is the concrete pairing of the adaptive gating (which varies across both time and nodes) with the masked attention that sparsifies based on real-time states. Separate pieces like decomposition and masked attention appear in prior time-series work, but the full architecture here is presented as a fresh proposal.

The paper does a clear job naming the two practical problems in traffic data—heterogeneous temporal patterns and noisy dense spatial interactions—and then matching a component to each. The dual-branch temporal design and the state-dependent masking are direct responses rather than generic additions.

The soft spots are in the evidence. The abstract gives no equations, no ablation numbers, no error bars, and no dataset details, so the SOTA claim cannot be checked from the given text. If the full paper supplies those results and shows the gains are stable across baselines, the contribution strengthens; without them the work stays at the level of a motivated architecture sketch. No internal contradictions appear in the argument structure itself.

This paper is for researchers working on spatial-temporal forecasting in transportation or similar applied domains. A reader already following transformer variants for irregular series would find the design choices worth examining.

It deserves peer review because the proposal is coherent, the motivation is explicit, and the evaluation plan uses multiple real datasets. I would send it out.

Referee Report

1 major / 1 minor

Summary. The paper proposes ADMFormer, an Adaptive-Decomposition Transformer for traffic forecasting. It introduces a time-node adaptive gating mechanism to decouple traffic signals into dominant regularities and residual fluctuations varying across time and nodes, a dual-branch temporal module to separately model global periodic dependencies and high-frequency irregular variations, and time-varying masked spatial attention to sparsify dynamic spatial interactions based on real-time states. The central claim is that this architecture achieves state-of-the-art performance on four real-world datasets.

Significance. If the reported gains are reproducible and the ablations confirm the contribution of each component, the adaptive decomposition and state-dependent masking could meaningfully improve modeling of heterogeneous temporal patterns and sparse dynamic spatial dependencies in traffic data, offering a practical advance for intelligent transportation systems applications.

major comments (1)

[Abstract] The central SOTA claim cannot be evaluated because the provided text contains no quantitative results, dataset names, baseline comparisons, error metrics, ablation studies, or statistical significance tests; without these the performance assertion remains unverified.

minor comments (1)

[Abstract, paragraph 2] The phrase 'time-node adaptive gating mechanism' is introduced without even a high-level equation or pseudocode sketch, which reduces immediate clarity for readers familiar with gating mechanisms in time-series models.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the observation on the abstract. We address the single major comment below.

Referee: [Abstract] The central SOTA claim cannot be evaluated because the provided text contains no quantitative results, dataset names, baseline comparisons, error metrics, ablation studies, or statistical significance tests; without these the performance assertion remains unverified.

Authors: We agree that the abstract, in its current form, presents the SOTA claim at a high level without supporting numbers, dataset names, or metrics, which prevents direct evaluation from the abstract alone. The full manuscript contains the required details in the Experiments section, including the four dataset names, baseline comparisons, MAE/RMSE/MAPE results, ablation studies, and statistical comparisons. We will revise the abstract to incorporate concise quantitative highlights (e.g., specific relative improvements and dataset references) while preserving its brevity. revision: yes
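For readers unfamiliar with the three error metrics named in the response, the following NumPy sketch shows their standard definitions on a toy example. The numbers are made up for illustration and are not results from the paper. Skipping zero targets in MAPE is a common convention in traffic forecasting, but whether the authors do so is an assumption here.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error, in percent.

    Entries whose target is (near) zero are skipped to avoid division by
    zero; this masking is an assumed convention, not taken from the paper.
    """
    keep = np.abs(y_true) > eps
    ratio = np.abs((y_true[keep] - y_pred[keep]) / y_true[keep])
    return float(np.mean(ratio) * 100.0)


# Toy values only, chosen so the arithmetic is easy to check by hand.
y_true = np.array([100.0, 200.0, 0.0, 400.0])
y_pred = np.array([110.0, 180.0, 5.0, 400.0])

mae_value = mae(y_true, y_pred)     # (10 + 20 + 5 + 0) / 4 = 8.75
rmse_value = rmse(y_true, y_pred)   # sqrt((100 + 400 + 25 + 0) / 4)
mape_value = mape(y_true, y_pred)   # (10% + 10% + 0%) / 3, zero target skipped
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE normalizes by the target and is therefore sensitive to low-traffic periods.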

Circularity Check

0 steps flagged

No significant circularity

The paper presents ADMFormer as an architectural proposal combining adaptive gating, dual-branch temporal modeling, and time-varying masked spatial attention to address traffic forecasting challenges. All load-bearing claims reduce to empirical SOTA results on four external datasets rather than any internal derivation, self-referential fitting, or self-citation chain. No equations are shown that equate a 'prediction' to a fitted parameter by construction, and the method is described as a direct response to the two stated challenges without importing uniqueness theorems or ansatzes from prior self-work. The argument is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are described or can be extracted.

pith-pipeline@v0.9.1-grok · 5742 in / 897 out tokens · 25677 ms · 2026-06-29T21:56:04.819434+00:00 · methodology
