MP3: Multi-Period Pattern Pre-training for Spatio-Temporal Forecasting

Chongshou Li; Lilan Peng; Qingren Yao; Tianrui Li; Yandi Liu

arXiv: 2606.13119 · v1 · pith:AOYVBU67 · submitted 2026-06-11 · 💻 cs.LG · cs.AI · cs.NE

Lilan Peng, Yandi Liu, Qingren Yao, Chongshou Li, Tianrui Li

Pith reviewed 2026-06-27 07:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NE

keywords: spatio-temporal forecasting · multi-period patterns · pre-training · graph neural networks · plug-and-play module · temporal cycles · causality modeling


The pith

A plug-in pre-trains models on multi-period patterns from long series to resolve cases where similar short inputs produce divergent forecasts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that spatio-temporal forecasting models often fail when short input windows hide the longer repeating cycles that shape future behavior. It introduces a pre-training step that learns those cycles separately in time, in space, and across cycles, then attaches the learned patterns to existing graph-based forecasters. If the approach works, the same base models achieve lower error on five real-world datasets, including the large-scale CA dataset, without changes to their core architecture. The claim matters because many real systems in transportation, climate, and energy rely on forecasts that can misread repeating weekly or daily structure as noise or coincidence.
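To make the "temporal mirage" concrete, here is a toy sketch on synthetic data, not on any of the paper's datasets. It assumes an hourly signal made of a daily sine wave plus a +1 offset on weekdays (days 0 to 4 read as Monday to Friday); the series, window lengths, and helper names are all hypothetical. Two 12-hour inputs ending on Thursday night and Friday night are identical, yet the next 12 hours differ because Saturday drops the weekday offset, while one-week inputs are distinguishable.

```python
import numpy as np

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY

def toy_series(n_hours: int) -> np.ndarray:
    """Hypothetical hourly signal: daily sine wave plus a +1 offset on weekdays (days 0-4)."""
    t = np.arange(n_hours)
    day_of_week = (t // HOURS_PER_DAY) % 7
    daily = np.sin(2 * np.pi * t / HOURS_PER_DAY)
    weekday_offset = (day_of_week < 5).astype(float)
    return daily + weekday_offset

def split(x: np.ndarray, end: int, lookback: int, horizon: int = 12):
    """Return (input window, future window) on either side of index `end`."""
    return x[end - lookback:end], x[end:end + horizon]

x = toy_series(4 * HOURS_PER_WEEK)
fri_end = HOURS_PER_WEEK + 5 * HOURS_PER_DAY  # input ends at the Friday/Saturday boundary (week 2)
thu_end = HOURS_PER_WEEK + 4 * HOURS_PER_DAY  # input ends at the Thursday/Friday boundary

# Short 12-hour windows: identical inputs, divergent futures.
fri_in, fri_future = split(x, fri_end, lookback=12)
thu_in, thu_future = split(x, thu_end, lookback=12)
short_inputs_match = np.allclose(fri_in, thu_in)
future_gap = np.abs(fri_future - thu_future).mean()

# One-week windows: the inputs now differ, so the two cases can be told apart.
fri_long, _ = split(x, fri_end, lookback=HOURS_PER_WEEK)
thu_long, _ = split(x, thu_end, lookback=HOURS_PER_WEEK)
long_inputs_match = np.allclose(fri_long, thu_long)

print(short_inputs_match, round(future_gap, 3), long_inputs_match)  # True 1.0 False
```

The point of the toy is only that the ambiguity is a property of the lookback length, not of the forecaster: no model can separate the two 12-hour inputs, while any model that sees the week can.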

Core claim

MP3 learns multi-period patterns in three stages: it first applies edge convolution across long series to separate distinct temporal cycles, then uses a bottleneck projection plus a global memory bank to capture varying spatial relations at each cycle length, and finally runs a causality-enhanced Transformer to model how one cycle pattern influences another. Once pre-trained, the resulting representations are inserted into existing spatio-temporal graph networks as a plug-in module. Experiments across five different base models and five datasets show that this insertion produces consistent error reductions.
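Separating distinct cycles presupposes some way of choosing candidate period lengths. The summary does not say how MP3 does this; the sketch below assumes a TimesNet-style heuristic (strongest FFT frequencies, averaged over nodes) followed by folding each series into one row per cycle. Function names and the synthetic input are illustrative only.

```python
import numpy as np

def top_k_periods(x: np.ndarray, k: int = 2) -> list:
    """Return k dominant period lengths of a (T, N) array from its FFT amplitude spectrum."""
    T = x.shape[0]
    amplitude = np.abs(np.fft.rfft(x, axis=0)).mean(axis=1)  # (T//2 + 1,), averaged over nodes
    amplitude[0] = 0.0                                      # drop the DC (mean) component
    top_bins = np.argsort(amplitude)[::-1][:k]              # strongest frequency bins first
    return [T // int(b) for b in top_bins]                  # bin b completes b cycles in T steps

def fold(series: np.ndarray, period: int) -> np.ndarray:
    """Reshape a 1-D series into (num_cycles, period), dropping any trailing partial cycle."""
    n_cycles = len(series) // period
    return series[: n_cycles * period].reshape(n_cycles, period)

# Synthetic example: four weeks of hourly data for 3 nodes with daily and weekly cycles.
t = np.arange(4 * 168)
signal = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 168)
x = np.stack([signal, signal, signal], axis=1)

periods = top_k_periods(x, k=2)
daily_cycles = fold(x[:, 0], periods[0])
print(periods, daily_cycles.shape)  # [24, 168] (28, 24)
```

Folding turns "which hour of which day" into a row/column position, which is what lets a later stage compare whole cycles with each other rather than adjacent time steps.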

What carries the argument

The MP3 plug-in, whose three components (edge-convolution temporal modeling, bottleneck-plus-memory-bank spatial modeling, and causality-enhanced Transformer for cross-period interaction) together extract and store repeating cycle patterns from long input series.
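The summary names edge convolution as the temporal component but not its exact form. A minimal sketch, assuming it follows EdgeConv from DGCNN (Wang et al.) applied to folded cycles: each cycle of one period is treated as a point, linked to its k most similar cycles, and updated by max-pooling a shared ReLU layer over the edge features [x_i, x_j - x_i]. The single-layer form, the exclusion of self-loops, and the random inputs are assumptions.

```python
import numpy as np

def edge_conv(points: np.ndarray, W: np.ndarray, k: int) -> np.ndarray:
    """EdgeConv over a point set.

    points: (n, d), assumed to be n folded cycles of length d for one period.
    W:      (2 * d, h), a single linear layer shared by all edges.
    Returns (n, h) features, max-aggregated over each point's k nearest neighbours.
    """
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)  # (n, n)
    np.fill_diagonal(sq_dist, np.inf)                    # never pick a cycle as its own neighbour
    nbrs = np.argsort(sq_dist, axis=1)[:, :k]            # (n, k) indices of most similar cycles
    x_i = np.repeat(points[:, None, :], k, axis=1)       # (n, k, d) centre features
    x_j = points[nbrs]                                   # (n, k, d) neighbour features
    edge_in = np.concatenate([x_i, x_j - x_i], axis=-1)  # (n, k, 2d)
    edge_out = np.maximum(edge_in @ W, 0.0)              # ReLU(linear) on every edge
    return edge_out.max(axis=1)                          # max over neighbours

rng = np.random.default_rng(0)
cycles = rng.normal(size=(28, 24))    # e.g. 28 daily cycles of 24 hourly values
W = 0.1 * rng.normal(size=(48, 16))
features = edge_conv(cycles, W, k=4)
print(features.shape)  # (28, 16)
```

Because neighbours are chosen by similarity rather than by position in time, cycles with the same shape (say, all Mondays) could pool information even when they are weeks apart; whether MP3 exploits exactly this property is not stated in the summary.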

If this is right

Existing graph forecasters gain 4.7 percent lower MAE and 5.0 percent lower RMSE on average when the MP3 plug-in is added.
The same plug-in works without retraining the base model from scratch and scales to a large urban dataset.
The learned cycle patterns remain useful across different base architectures, showing the pre-training is not tied to one specific network design.
Cross-period dependencies captured by the Transformer component improve handling of superimposed cycle effects that short windows alone cannot resolve.
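The headline numbers are relative reductions. A minimal sketch of how such averages could be computed, assuming they are unweighted means of per-(backbone, dataset) percentage reductions; the paper may aggregate differently, and the error values below are made up purely to exercise the code.

```python
import numpy as np

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def relative_reduction(base_error: float, plugin_error: float) -> float:
    """Percentage by which the plugin-augmented model lowers its base model's error."""
    return 100.0 * (base_error - plugin_error) / base_error

def average_reduction(pairs) -> float:
    """Unweighted mean of per-(backbone, dataset) percentage reductions (an assumption)."""
    return float(np.mean([relative_reduction(b, p) for b, p in pairs]))

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])
print(mae(y, y_hat), rmse(y, y_hat))

# Hypothetical (base MAE, MAE with plugin) pairs, not values from the paper.
pairs = [(20.0, 19.0), (10.0, 9.6)]
print(average_reduction(pairs))  # about 4.5: the mean of 5% and 4%
```

Averaging percentages gives every backbone-dataset pair equal weight; pooling absolute errors first would let datasets with large errors dominate, so the two conventions can report different averages.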

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the cycle-pattern representations prove stable, they could be reused across cities or time periods without full retraining.
The same separation of temporal, spatial, and cross-cycle stages might apply to other sequence tasks where short contexts hide longer rhythms, such as energy load or epidemic curves.
A natural next test would be whether the memory bank can be updated online as new long series arrive rather than requiring a separate pre-training phase.
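As a concrete version of the online-update question, one hypothetical option is an exponential-moving-average update, similar in spirit to EMA codebook updates in vector-quantised models: each new embedding is assigned to its nearest memory slot, and that slot drifts toward the mean of its assignments. Nothing here is part of MP3 as described; the function name, decay value, and nearest-slot assignment are illustrative assumptions.

```python
import numpy as np

def ema_update_memory(memory: np.ndarray, batch: np.ndarray, decay: float = 0.99) -> np.ndarray:
    """Return a copy of `memory` (K, d) with slots nudged toward new embeddings `batch` (B, d)."""
    sq_dist = ((batch[:, None, :] - memory[None, :, :]) ** 2).sum(axis=-1)  # (B, K)
    nearest = sq_dist.argmin(axis=1)              # closest slot for each new embedding
    updated = memory.copy()
    for slot in np.unique(nearest):
        slot_mean = batch[nearest == slot].mean(axis=0)
        updated[slot] = decay * memory[slot] + (1.0 - decay) * slot_mean
    return updated                                # slots with no assignments stay unchanged

memory = np.array([[0.0, 0.0], [10.0, 10.0]])
batch = np.array([[1.0, 1.0], [1.0, 1.0]])
new_memory = ema_update_memory(memory, batch)
print(new_memory)
```

A high decay keeps the bank close to what was learned in pre-training, which is the conservative choice if the new series come from the same city; a lower decay adapts faster but risks overwriting cycles that the new data happen not to cover.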

Load-bearing premise

That failures on similar short inputs arise mainly from missing longer cycle information rather than from other model limitations, and that the three new components supply exactly the missing information.

What would settle it

Attaching the pre-trained MP3 module to the five tested base models on the five datasets and observing no average error reduction or seeing gains disappear on the large-scale CA dataset.

Original abstract

Spatio-temporal forecasting is crucial in diverse fields, such as transportation, climate, and energy. Urban spatio-temporal data exhibits temporal mirages: similar short-window inputs have divergent future trends, and vice versa. Existing spatio-temporal graph neural networks (STGNNs) cannot effectively identify such mirages. We argue that the core reason lies in short-window inputs, which suffer from incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality. To bridge this gap, we develop a novel Multi-Period Pattern Pre-training (MP3) method, a plug-and-play pre-training plugin for distinguishing temporal mirages. MP3 presents two core innovations. (1) Multi-period pattern learning learns multi-period patterns from long time series. Specifically, multi-period temporal modeling leverages edge convolution to identify different multi-period patterns; multi-period spatial modeling uses a bottleneck projection and a global memory bank to capture heterogeneous global spatial relations efficiently; and cross-period pattern interaction employs a causality-enhanced Transformer to capture dependencies across different period patterns. (2) The plugin can be seamlessly integrated into existing STGNN backbones to strengthen their forecasting performance. Experiments with five STGNN baselines on five real-world datasets (including the large-scale dataset CA) verify the effectiveness, superior scalability, and strong adaptability of MP3, which brings consistent and robust performance improvements across all evaluated baselines. On average, MP3 reduces MAE by 4.7% and RMSE by 5.0%. The code is available at https://github.com/YAN-outlook/MP3.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

MP3 adds a plug-in pre-training stage to STGNNs, combining edge-convolution temporal modeling, bottleneck-plus-memory-bank spatial modeling, and causality-enhanced Transformer cross-period interaction, and reports gains on five baselines across five datasets. It provides no ablations or diagnostics that tie those gains to the claimed mechanisms.


MP3 is a pre-training plugin meant to help STGNNs handle temporal mirages by learning multi-period patterns from longer series. It uses edge convolution for temporal patterns, a bottleneck plus global memory bank for heterogeneous spatial relations, and a causality-enhanced transformer for cross-period dependencies. The claim is that these three pieces directly address incomplete period observation, heterogeneous global correlations, and cross-period superposition, and that the whole thing drops into existing backbones.
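The letter does not specify what makes the Transformer "causality-enhanced". One plausible reading, sketched here purely as an assumption, is attention over one token per period pattern with a lower-triangular mask, so that each period can only draw on periods earlier in a fixed order (here, shorter periods before longer ones). The ordering, the single head, and all names are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def causal_cross_period_attention(tokens, Wq, Wk, Wv):
    """Single-head attention over P period tokens with a lower-triangular mask.

    tokens: (P, d), one embedding per period pattern, assumed ordered from the
    shortest to the longest period; token i may attend only to tokens 0..i.
    """
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    P = tokens.shape[0]
    later = np.triu(np.ones((P, P), dtype=bool), k=1)  # True strictly above the diagonal
    scores = np.where(later, -np.inf, scores)          # block attention to later periods
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
P, d = 3, 8  # e.g. daily, weekly, and monthly period tokens
tokens = rng.normal(size=(P, d))
Wq, Wk, Wv = (0.3 * rng.normal(size=(d, d)) for _ in range(3))
out, weights = causal_cross_period_attention(tokens, Wq, Wk, Wv)
print(out.shape, np.round(weights, 2))
```

A directional mask encodes a modelling choice (short cycles inform long ones, not the reverse); an unmasked Transformer would treat cross-period influence as symmetric, which is exactly the kind of difference a component ablation could test.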

The paper does test the plugin on five different STGNN baselines across five datasets, including a large-scale one, with average MAE down 4.7% and RMSE down 5.0%. The code release is a plus for anyone who wants to check the implementation.

The soft spot is the missing verification that the components actually fix the stated problems. The abstract maps each module to one cause, but there are no ablations that keep capacity constant, no measurements of how much each cause shrinks after pre-training, and no checks on whether gains disappear when the cause is absent. Without those, the performance delta could just come from added parameters or longer context rather than the argued causal path. Experimental protocol details like splits, error bars, and statistical tests are also not visible in the provided summary.

This is for researchers already working on graph-based spatio-temporal forecasting who might want a modular pre-train step. It deserves peer review because the scope of testing is reasonable and the components are concrete, even if the mechanism story needs tightening.

Referee Report

3 major / 2 minor

Summary. The paper proposes MP3, a plug-and-play pre-training plugin for existing spatio-temporal graph neural networks (STGNNs) to address 'temporal mirages' where similar short-window inputs yield divergent forecasts. It identifies three causes—incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality—and introduces three corresponding components: multi-period temporal modeling via edge convolution, spatial modeling via bottleneck projection and global memory bank, and cross-period interaction via a causality-enhanced Transformer. The plugin integrates into STGNN backbones, and experiments across five baselines and five real-world datasets (including large-scale CA) report average MAE reductions of 4.7% and RMSE reductions of 5.0%, with claims of superior scalability and adaptability. Code is stated to be available.
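The summary says the plugin integrates into STGNN backbones but not how. A minimal sketch, assuming the pre-trained MP3 encoder is frozen and its per-node output is added to the backbone's hidden states through a zero-initialised linear projection; the class name, shapes, and fusion point are hypothetical.

```python
import numpy as np

class ResidualPluginFusion:
    """Adds projected, frozen plugin features to a backbone's per-node hidden states."""

    def __init__(self, d_plugin: int, d_backbone: int):
        # Zero initialisation: before any training, the fused model reproduces the base model.
        self.W = np.zeros((d_plugin, d_backbone))
        self.b = np.zeros(d_backbone)

    def __call__(self, h_backbone: np.ndarray, z_plugin: np.ndarray) -> np.ndarray:
        # h_backbone: (N, d_backbone) hidden states from the STGNN backbone.
        # z_plugin:   (N, d_plugin) representations from the frozen, pre-trained plugin.
        return h_backbone + z_plugin @ self.W + self.b

rng = np.random.default_rng(0)
h = rng.normal(size=(100, 64))  # illustrative sizes: 100 nodes, 64 hidden units
z = rng.normal(size=(100, 32))
fusion = ResidualPluginFusion(d_plugin=32, d_backbone=64)
print(np.allclose(fusion(h, z), h))  # True: the untrained fusion leaves the backbone unchanged
```

Starting from the identity is one way to make "base model versus base model plus plugin" a clean comparison at step zero; gated or concatenation-based fusion would be equally plausible readings of "plug-and-play".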

Significance. If the empirical gains prove robust and mechanistically linked to the proposed components, MP3 could provide a general, reusable enhancement for STGNNs in domains like transportation and climate forecasting. The plug-and-play design and public code are positive features that would aid adoption and reproducibility if the central performance claims hold under scrutiny.

major comments (3)

[Abstract / Experiments] Abstract and Experiments section: the central claim of consistent 4.7% MAE / 5.0% RMSE reductions across five baselines and five datasets is presented without any reported details on data splits, cross-validation protocol, statistical significance tests, error bars, or controls for post-hoc hyperparameter choices, leaving the empirical support for the performance delta only weakly grounded.
[Method] Method section (description of the three components): the manuscript states that incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality are the primary drivers of temporal mirages and maps each MP3 component directly to one driver, yet contains no targeted diagnostics, component-wise ablations holding capacity fixed, or tests showing that gains vanish when the claimed cause is absent; aggregate performance numbers alone cannot establish the causal mechanism.
[Experiments] Experiments section: no analysis is provided on whether the observed improvements scale with the added parameters (convolution kernels, memory bank size, Transformer layers) or simply with longer context, which is required to rule out capacity or context-length explanations for the reported deltas.
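To illustrate the kind of evidence the first major comment asks for, here is a sketch of error bars plus a paired sign-flip permutation test over runs that share a seed or split. The run values are invented for illustration. One consequence worth noting: with only five paired runs, the smallest achievable two-sided p-value under this exact test is 2/32 = 0.0625, so more runs would be needed to reach p < 0.05.

```python
import numpy as np

def paired_sign_flip_test(base_runs, plugin_runs, n_perm: int = 10_000, seed: int = 0):
    """Paired two-sided permutation test on per-run error differences.

    base_runs / plugin_runs: errors (e.g. MAE) of one backbone without and with the
    plugin, paired by seed or data split. Returns (mean diff, std of diffs, p-value).
    """
    d = np.asarray(base_runs, dtype=float) - np.asarray(plugin_runs, dtype=float)
    observed = abs(d.mean())
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))  # random relabelling of each pair
    null = np.abs((signs * d).mean(axis=1))
    # Small tolerance guards against floating-point ties with the observed statistic.
    p_value = (1 + np.sum(null >= observed - 1e-12)) / (1 + n_perm)
    return float(d.mean()), float(d.std(ddof=1)), float(p_value)

# Invented per-seed MAE values, for illustration only.
base = [20.1, 20.3, 19.9, 20.2, 20.0]
plugin = [19.1, 19.2, 19.0, 19.3, 19.1]
mean_diff, std_diff, p_value = paired_sign_flip_test(base, plugin)
print(f"MAE gain {mean_diff:.2f} ± {std_diff:.2f}, p ≈ {p_value:.3f}")
```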

minor comments (2)

[Abstract] The abstract presents 'multi-period pattern learning' as one of two core innovations, but the body text describes three components; clarifying the exact count and their grouping would improve readability.
[Method] Notation for the memory bank and bottleneck projection should be introduced with explicit equations or pseudocode in the method section to avoid ambiguity when integrating with different backbones.
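In the spirit of the second minor comment, here is one way the bottleneck projection and global memory bank could be written down. It is a hypothetical sketch, not the paper's formulation: N nodes are softly pooled into M bottleneck tokens, the tokens read from K shared memory slots by attention, and the result is scattered back to the nodes, so cost scales with N·M and M·K rather than N². All names and sizes are assumptions.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bottleneck_memory_mixing(X, assign_logits, memory, Wq):
    """Spatial mixing through M bottleneck tokens and a K-slot global memory.

    X:             (N, d) node features for one period pattern.
    assign_logits: (M, N) learned scores linking nodes to bottleneck tokens.
    memory:        (K, d) learned global memory slots.
    Wq:            (d, d) query projection for reading the memory.
    """
    pool = softmax(assign_logits, axis=1)        # (M, N): each token's weights over nodes
    tokens = pool @ X                            # (M, d) bottleneck tokens
    read = softmax(tokens @ Wq @ memory.T / np.sqrt(X.shape[1]), axis=1)  # (M, K)
    tokens = tokens + read @ memory              # tokens enriched with global memory
    scatter = softmax(assign_logits, axis=0).T   # (N, M): each node's weights over tokens
    return X + scatter @ tokens                  # residual update of every node

rng = np.random.default_rng(0)
N, M, K, d = 500, 8, 16, 32
out = bottleneck_memory_mixing(
    rng.normal(size=(N, d)), rng.normal(size=(M, N)),
    rng.normal(size=(K, d)), 0.1 * rng.normal(size=(d, d)))
print(out.shape)  # (500, 32)
```

With M and K fixed, doubling the number of nodes doubles the work instead of quadrupling it, which is the property that would matter on a large network such as the CA dataset.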

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and commit to revisions that will strengthen the empirical grounding and mechanistic analysis of MP3.


Referee: [Abstract / Experiments] Abstract and Experiments section: the central claim of consistent 4.7% MAE / 5.0% RMSE reductions across five baselines and five datasets is presented without any reported details on data splits, cross-validation protocol, statistical significance tests, error bars, or controls for post-hoc hyperparameter choices, leaving the empirical support for the performance delta only weakly grounded.

Authors: We agree that additional experimental details are needed. In the revised manuscript we will expand the Experiments section to specify the data splits, cross-validation protocol, statistical significance tests, error bars on all reported metrics, and the hyperparameter search procedure. These additions will be included in both the main text and supplementary material. revision: yes
Referee: [Method] Method section (description of the three components): the manuscript states that incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality are the primary drivers of temporal mirages and maps each MP3 component directly to one driver, yet contains no targeted diagnostics, component-wise ablations holding capacity fixed, or tests showing that gains vanish when the claimed cause is absent; aggregate performance numbers alone cannot establish the causal mechanism.

Authors: The three causes were identified through preliminary data analysis. We acknowledge that aggregate results alone are insufficient to establish causality. The revision will add component-wise ablations with matched parameter budgets and targeted diagnostics that isolate each driver, together with controls that remove the corresponding cause from the input data. revision: yes
Referee: [Experiments] Experiments section: no analysis is provided on whether the observed improvements scale with the added parameters (convolution kernels, memory bank size, Transformer layers) or simply with longer context, which is required to rule out capacity or context-length explanations for the reported deltas.

Authors: We will add experiments that compare MP3 against (i) baselines augmented with equivalent extra parameters and (ii) baselines given the same extended context length. These controls will be reported in the revised Experiments section to separate the contribution of MP3’s design from raw capacity or context effects. revision: yes
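The first control promised in the last response, a baseline "augmented with equivalent extra parameters", needs a concrete matching rule. A minimal sketch, assuming the extra capacity is added by widening a dense MLP block until its parameter count reaches the baseline's count plus the plugin's; the layer widths and the 5,000-parameter plugin budget are hypothetical.

```python
def mlp_params(widths: list) -> int:
    """Weights plus biases of a dense MLP with the given layer widths."""
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))

def matched_hidden_width(d_in: int, d_out: int, n_hidden: int, target: int) -> int:
    """Smallest uniform hidden width whose MLP has at least `target` parameters."""
    h = 1
    while mlp_params([d_in] + [h] * n_hidden + [d_out]) < target:
        h += 1
    return h

base_widths = [64, 64, 64, 12]         # hypothetical backbone output head
base_params = mlp_params(base_widths)
plugin_params = 5_000                  # hypothetical plugin size
h = matched_hidden_width(64, 12, n_hidden=2, target=base_params + plugin_params)
print(base_params, h, mlp_params([64, h, h, 12]))  # 9100 86 14116
```

The same scaffold covers the second control: keep the architecture fixed and extend only the lookback window to the long-series length the plugin sees, so that context and capacity are varied one at a time.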

Circularity Check

0 steps flagged

No circularity: claims rest on empirical validation of a proposed architecture


The paper introduces MP3 as a plug-and-play pre-training plugin with three explicitly designed components (edge-convolution temporal modeling, bottleneck+memory-bank spatial modeling, causality-enhanced Transformer) motivated by posited causes of temporal mirages. These are engineering choices and architectural decisions, not a derivation chain. No equations, predictions, or first-principles results are presented that reduce by construction to fitted inputs, self-definitions, or self-citation load-bearing uniqueness theorems. Performance improvements are reported via experiments on five baselines and five datasets; the central claim is therefore falsifiable by replication and does not collapse into tautology.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that temporal mirage arises from the three listed data properties and that the architectural components address them; the empirical results further depend on the choice of five baselines and five datasets plus numerous neural-network hyperparameters.

free parameters (1)

model hyperparameters including convolution kernels, memory bank size, and transformer layers
These are chosen or fitted during pre-training and fine-tuning to achieve the reported gains.

axioms (1)

domain assumption: urban spatio-temporal data exhibits temporal mirages, in which similar short-window inputs have divergent future trends and vice versa.
Invoked in the first paragraph of the abstract as the core motivation.

pith-pipeline@v0.9.1-grok · 5829 in / 1316 out tokens · 42669 ms · 2026-06-27T07:15:49.908831+00:00 · methodology



[1] [1]

S. A. Sayed, Y . Abdel-Hamid, H. A. Hefny, Artificial intelligence-based traffic flow prediction: a comprehensive review, Journal of Electrical Systems and Information Technology 10 (2023) 13

2023

[2] [2]

Z. Li, C. Huang, L. Xia, Y . Xu, J. Pei, Spatial-temporal hypergraph self-supervised learning for crime prediction, in: IEEE 38th International Conference on Data Engineering, 2022, pp. 2984–2996

2022

[3] [3]

K. H. Hettige, J. Ji, S. Xiang, C. Long, G. Cong, J. Wang, Airphynet: Harnessing physics-guided neural networks for air quality prediction, in: Proceedings of the 12th International Conference on Learning Representations, 2024, p. 1–17

2024

[4] [4]

B. L. Smith, M. J. Demetsky, Traffic flow forecasting: Comparison of modeling approaches, Journal of Transportation Engineering (1997) 261–266

1997

[5] [5]

O. D. Anderson, G. E. P. Box, G. M. Jenkins, Time series analysis: Forecasting and control, The Statistician (1978) 265

1978

[6] [6]

Lippi, M

M. Lippi, M. Bertini, P. Frasconi, Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learn- ing, IEEE Transactions on Intelligent Transportation Systems (2013) 871–882

2013

[7] [7]

L ¨utkepohl, New introduction to multiple time series analysis, Springer Berlin Heidelberg eBooks (Jan 2005)

H. L ¨utkepohl, New introduction to multiple time series analysis, Springer Berlin Heidelberg eBooks (Jan 2005)

2005

[8] [8]

Zivot, J

E. Zivot, J. Wang, Vector autoregressive models for multivariate time series (2003) 369–413

2003

[9] [9]

Zhang, Y

J. Zhang, Y . Zheng, D. Qi, Deep spatio-temporal residual networks for citywide crowd flows prediction, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, p. 1655–1661

2017

[10] [10]

X. Ma, Z. Tao, Y . Wang, H. Yu, Y . Wang, Long short-term memory neural network for traffic speed prediction using remote microwave sensor data, Transportation Research Part C: Emerging Technologies (2015) 187–197

2015

[11] [11]

Hochreiter, J

S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Com- putation (1997) 1735–1780

1997

[12] [12]

Zhang, Y

J. Zhang, Y . Zheng, D. Qi, R. Li, X. Yi, Dnn-based prediction model for spatio-temporal data, in: Proceedings of the 24th ACM SIGSPA- TIAL International Conference on Advances in Geographic Information Systems, 2016

2016

[13] [13]

Y . Lv, Y . Duan, W. Kang, Z. Li, F.-Y . Wang, Traffic flow prediction with big data: A deep learning approach, IEEE Transactions on Intelligent Transportation Systems (2014) 1–9

2014

[14] [14]

G. Jin, Y . Liang, Y . Fang, Z. Shao, J. Huang, J. Zhang, Y . Zheng, Spatio-temporal graph neural networks for predictive learning in urban computing: A survey, IEEE Transactions on Knowledge and Data Engineering (2023) 1–20

2023

[15] [15]

Y . Li, R. Yu, C. Shahabi, Y . Liu, Diffusion convolutional recurrent neural network: Data-driven traffic forecasting, in: International Conference on Learning Representations, 2018

2018

[16] [16]

L. Zhao, Y . Song, C. Zhang, Y . Liu, P. Wang, T. Lin, M. Deng, H. Li, T- gcn: A temporal graph convolutional network for traffic prediction, IEEE Transactions on Intelligent Transportation Systems (2020) 3848–3858

2020

[17] Z. Fang, Q. Long, G. Song, K. Xie, Spatial-temporal graph ODE networks for traffic flow forecasting, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

[18] J. Ye, L. Sun, B. Du, Y. Fu, H. Xiong, Coupled layer-wise graph convolution for transportation demand prediction, Proceedings of the AAAI Conference on Artificial Intelligence (2022) 4617–4625

[19] C. Wang, K. Zhang, H. Wang, B. Chen, Auto-STGCN: Autonomous spatial-temporal graph convolutional network search, ACM Transactions on Knowledge Discovery from Data (2023) 1–21

[20] Q. Zhang, J. Chang, G. Meng, S. Xiang, C. Pan, Spatio-temporal graph structure learning for traffic forecasting, Proceedings of the AAAI Conference on Artificial Intelligence (2020) 1177–1185

[21] J. Ye, Z. Liu, B. Du, L. Sun, W. Li, Y. Fu, H. Xiong, Learning the evolutionary and multi-scale graph structure for multivariate time series forecasting, in: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 2296–2306

[22] M. Ma, J. Hu, C. S. Jensen, F. Teng, P. Han, Z. Xu, T. Li, Learning time-aware graph structures for spatially correlated time series forecasting, in: 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024, pp. 4435–4448

[23] L. Bai, L. Yao, C. Li, X. Wang, C. Wang, Adaptive graph convolutional recurrent network for traffic forecasting, in: Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020

[24] Z. Wu, S. Pan, G. Long, J. Jiang, C. Zhang, Graph WaveNet for deep spatial-temporal graph modeling, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

[25] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, C. Zhang, Connecting the dots: Multivariate time series forecasting with graph neural networks, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2020, pp. 753–763

[26] R. Jiang, Z. Wang, J. Yong, P. Jeph, Q. Chen, Y. Kobayashi, X. Song, S. Fukushima, T. Suzumura, Spatio-temporal meta-graph learning for traffic forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, 2023, pp. 8078–8086

[27] Z. Dong, R. Jiang, H. Gao, H. Liu, J. Deng, Q. Wen, X. Song, Heterogeneity-informed meta-parameter learning for spatiotemporal time series forecasting, in: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 631–641

[28] C. Zheng, X. Fan, C. Wang, J. Qi, GMAN: A graph multi-attention network for traffic prediction, Proceedings of the AAAI Conference on Artificial Intelligence 34 (01) (2020) 1234–1241

[29] S. Guo, Y. Lin, L. Gong, C. Wang, Z. Zhou, Z. Shen, Y. Huang, H. Wan, Self-supervised spatial-temporal bottleneck attentive network for efficient long-term traffic forecasting, in: 2023 IEEE 39th International Conference on Data Engineering, 2023, pp. 1585–1596

[30] J. Jiang, C. Han, W. X. Zhao, J. Wang, PDFormer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction, in: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

[31] S. Guo, Y. Lin, H. Wan, X. Li, G. Cong, Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting, IEEE Transactions on Knowledge and Data Engineering 34 (11) (2022) 5415–5428

[32] Y. Liang, Y. Xia, S. Ke, Y. Wang, Q. Wen, J. Zhang, Y. Zheng, R. Zimmermann, AirFormer: Predicting nationwide air quality in China with transformers, Proceedings of the AAAI Conference on Artificial Intelligence 37 (12) (2023) 14329–14337

[33] L. Cao, B. Wang, G. Jiang, Y. Yu, J. Dong, Spatiotemporal-aware trend-seasonality decomposition network for traffic flow forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, 2025, pp. 11463–11471

[34] Z. Pan, Y. Liang, W. Wang, Y. Yu, Y. Zheng, J. Zhang, Urban traffic prediction from spatio-temporal data using deep meta learning, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019

[35] Z. Li, L. Xia, Y. Xu, C. Huang, FlashST: A simple and universal prompt-tuning framework for traffic prediction, in: Proceedings of the 41st International Conference on Machine Learning, ICML’24, 2024

[36] Z. Zhou, Q. Huang, K. Yang, K. Wang, X. Wang, Y. Zhang, Y. Liang, Y. Wang, Maintaining the status quo: Capturing invariant relations for OOD spatiotemporal learning, in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, 2023, pp. 3603–3614

[37] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186

[38] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick, Masked autoencoders are scalable vision learners, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16000–16009

[39] Z. Shao, Z. Zhang, F. Wang, Y. Xu, Pre-training enhanced spatial-temporal graph neural network for multivariate time series forecasting, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 1567–1577

[40] Z. Li, L. Xia, Y. Xu, C. Huang, GPT-ST: Generative pre-training of spatio-temporal graph neural networks, in: Advances in Neural Information Processing Systems, 2023, pp. 70229–70246

[41] H. Gao, R. Jiang, Z. Dong, J. Deng, Y. Ma, X. Song, Spatial-temporal-decoupled masked pre-training for spatiotemporal forecasting, in: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024, pp. 3998–4006

[42] J. Wang, J. Jiang, W. Jiang, C. Li, W. X. Zhao, LibCity: An open library for traffic prediction, in: Proceedings of the 29th International Conference on Advances in Geographic Information Systems, Association for Computing Machinery, 2021, pp. 145–148

[43] Y. Cai, J. Xu, S. Jiao, Intelligent prediction of urban road network carrying capacity and traffic flow based on deep learning, IEEE Transactions on Vehicular Technology 74 (2) (2025) 2067–2079

[44] B. Yu, H. Yin, Z. Zhu, Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting, in: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI’18, 2018, pp. 3634–3640

[45] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: International Conference on Learning Representations (ICLR), 2017

[46] J. Deng, R. Jiang, J. Zhang, X. Song, Multi-modality spatio-temporal forecasting via self-supervised learning, in: K. Larson (Ed.), Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, International Joint Conferences on Artificial Intelligence Organization, 2024, pp. 2018–2026

[47] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, M. Long, TimesNet: Temporal 2D-variation modeling for general time series analysis, in: International Conference on Learning Representations, 2023

[48] W. Cai, Y. Liang, X. Liu, J. Feng, Y. Wu, MSGNet: Learning multi-scale inter-series correlations for multivariate time series forecasting, in: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

[49] H. Wang, J. Peng, F. Huang, J. Wang, J. Chen, Y. Xiao, MICN: Multi-scale local and global context modeling for long-term series forecasting (2023)

[50] J. Han, W. Zhang, H. Liu, T. Tao, N. Tan, H. Xiong, BigST: Linear complexity spatio-temporal graph neural network for traffic forecasting on large-scale road networks, Proceedings of the VLDB Endowment 17 (5) (2024) 1081–1090

[51] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9

[52] C. Song, Y. Lin, S. Guo, H. Wan, Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting, Proceedings of the AAAI Conference on Artificial Intelligence 34 (01) (2020) 914–921

[53] Y. Liang, K. Ouyang, Y. Wang, Z. Pan, Y. Yin, H. Chen, J. Zhang, Y. Zheng, D. S. Rosenblum, R. Zimmermann, Mixed-order relation-aware recurrent neural networks for spatio-temporal forecasting, IEEE Transactions on Knowledge and Data Engineering 35 (9) (2023) 9254–9268

[54] R.-G. Cirstea, B. Yang, C. Guo, T. Kieu, S. Pan, Towards spatio-temporal aware traffic time series forecasting, in: 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022, pp. 2900–2913

[55] D. Liu, J. Wang, S. Shang, P. Han, MSDR: Multi-step dependency relation networks for spatial temporal forecasting, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 1042–1050

[56] J. Deng, X. Chen, R. Jiang, X. Song, I. W. Tsang, ST-Norm: Spatial and temporal normalization for multi-variate time series forecasting, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021, pp. 269–278