pith. machine review for the scientific record.

arxiv: 2604.16084 · v1 · submitted 2026-04-17 · 💻 cs.LG · cs.AI

Unveiling Stochasticity: Universal Multi-modal Probabilistic Modeling for Traffic Forecasting

Pith reviewed 2026-05-10 08:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: traffic forecasting · probabilistic modeling · Gaussian mixture model · uncertainty quantification · multi-modal prediction · stochastic traffic · spatio-temporal modeling

The pith

Replacing only the final layer with a Gaussian mixture model turns any traffic forecaster into a multi-modal probabilistic model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that traffic forecasting models can be converted to probabilistic predictors simply by swapping their output layer for a Gaussian Mixture Model, trained end-to-end with negative log-likelihood loss and no other changes. A reader would care because real traffic contains multiple possible futures and inherent uncertainty, yet most existing methods output only single deterministic values that ignore this variability. The approach is shown to preserve point-prediction accuracy while delivering better-calibrated uncertainty estimates across classic and modern architectures and real urban datasets.

Core claim

Existing traffic forecasting architectures can be transformed into universal multi-modal probabilistic models by replacing only the final output layer with a Gaussian Mixture Model layer that is trained using solely the Negative Log-Likelihood loss, without auxiliary terms, regularization, or alterations to the internal model structure or training pipeline.

What carries the argument

The Gaussian Mixture Model output layer, which models each prediction as a weighted sum of Gaussian distributions to represent multiple possible traffic outcomes.
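
A minimal sketch of the loss such an output layer would feed, for a single sensor and horizon. It assumes softmax mixing weights (an assumption here) and log-variance outputs (described in the paper's text near Figure 3). The names `log_softmax` and `gmm_nll` are illustrative and not taken from the authors' code, and the paper's re-parameterization of the means via relative offsets is omitted.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of K logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def gmm_nll(logits, means, log_vars, y):
    """Negative log-likelihood of a scalar y under a K-component Gaussian mixture.

    logits   : unnormalised mixing scores z_k (softmax turns them into weights)
    means    : component means mu_k
    log_vars : component log-variances log(sigma_k^2)
    """
    log_w = log_softmax(logits)
    # log( w_k * N(y | mu_k, sigma_k^2) ) for each component
    terms = [
        lw - 0.5 * (math.log(2.0 * math.pi) + lv + (y - mu) ** 2 / math.exp(lv))
        for lw, mu, lv in zip(log_w, means, log_vars)
    ]
    # log-sum-exp over components, then negate
    m = max(terms)
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))


# Hypothetical speeds in km/h: a congested mode near 15 and a free-flow mode near 50.
bimodal = gmm_nll([0.0, 0.0], [15.0, 50.0], [math.log(4.0)] * 2, 15.0)
unimodal = gmm_nll([0.0], [32.5], [math.log(4.0)], 15.0)
print(bimodal < unimodal)  # the two-mode fit is far less surprised by y = 15
```

With one component the loss reduces to the ordinary Gaussian NLL, which is why the unimodal (Norm) variant is a natural baseline for the mixture.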

If this is right

  • The modified models match or exceed the original deterministic performance on point predictions.
  • Systematic checks using cumulative distribution functions and confidence intervals show the probabilistic outputs are more accurate and informative than unimodal baselines.
  • The method remains effective on real-world dense urban networks even when input data quality is imperfect.
  • The same output-layer replacement works for both classic and modern forecasting architectures without further tuning.
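
One way a CDF-based interval check could be set up, sketched in plain Python: evaluate the mixture CDF in closed form and invert it by bisection to obtain a confidence interval. The central (equal-tailed) interval is an assumption; the paper may construct its intervals differently. `gmm_cdf`, `gmm_quantile`, and `central_interval` are hypothetical names.

```python
import math


def gmm_cdf(x, weights, means, stds):
    """CDF of a Gaussian mixture: weighted sum of the component normal CDFs."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))
        for w, mu, s in zip(weights, means, stds)
    )


def gmm_quantile(p, weights, means, stds, tol=1e-9):
    """Invert the mixture CDF by bisection (the CDF is monotone increasing)."""
    lo = min(mu - 10.0 * s for mu, s in zip(means, stds))
    hi = max(mu + 10.0 * s for mu, s in zip(means, stds))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gmm_cdf(mid, weights, means, stds) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def central_interval(level, weights, means, stds):
    """Equal-tailed interval holding `level` of the predictive mass."""
    tail = (1.0 - level) / 2.0
    return (
        gmm_quantile(tail, weights, means, stds),
        gmm_quantile(1.0 - tail, weights, means, stds),
    )


# Hypothetical two-regime forecast in km/h.
low, high = central_interval(0.9, [0.5, 0.5], [15.0, 50.0], [2.0, 2.0])
print(round(low, 2), round(high, 2))
```

For a strongly bimodal forecast like this one, the central interval spans the gap between the modes, so interval width alone can hide multi-modality. That is one reason to inspect the full CDF as well.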

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same minimal change might be tried on other spatio-temporal forecasting problems such as demand or weather to test whether internal model adjustments are generally unnecessary for capturing stochasticity.
  • In deployed traffic systems the resulting probability distributions could directly support risk-aware routing or signal control decisions.
  • If the GMM layer alone suffices, then future model design can focus on representation learning while treating output multimodality as a separable, lightweight addition.

Load-bearing premise

Traffic dynamics are adequately captured by a mixture of Gaussians attached only at the model's final output, without any need to change how the network learns its internal representations.

What would settle it

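On a held-out traffic dataset, the central claim would be refuted if the GMM-modified models lose point-prediction accuracy relative to the original deterministic models, or if they yield higher negative log-likelihood or miscalibrated confidence intervals (too narrow or too wide relative to observed traffic variations) compared with unimodal Gaussian variants. Deterministic models define neither a likelihood nor an interval, so they can only serve as the reference for point accuracy.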
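
A calibration test of this kind could be run on probability integral transform (PIT) values, u = F(y), where F is the predictive CDF and y the held-out observation. The sketch below is an assumed setup, not the paper's procedure, and `coverage_curve` is a hypothetical helper. An observation falls inside the central interval at a given level exactly when its PIT value lies between the two tail probabilities.

```python
def coverage_curve(pit_values, levels):
    """Empirical coverage of central intervals, computed from PIT values.

    pit_values : list of u = F(y) in [0, 1], one per held-out observation
    levels     : nominal confidence levels, e.g. [0.5, 0.8, 0.95]
    Returns one empirical coverage per level; a calibrated model gives
    coverage close to the nominal level.
    """
    curve = []
    for level in levels:
        tail = (1.0 - level) / 2.0
        hits = sum(1 for u in pit_values if tail <= u <= 1.0 - tail)
        curve.append(hits / len(pit_values))
    return curve


# Evenly spread PIT values stand in for a well-calibrated model (toy data).
pits = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
print(coverage_curve(pits, [0.6, 0.8]))
```

Coverage well below the nominal level indicates intervals that are too narrow; coverage well above it indicates intervals that are too wide.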

Figures

Figures reproduced from arXiv: 2604.16084 by Nikolas Geroliminis, Robert Fonod, Weijiang Xiong.

Figure 1: Multi-modal data distribution and various predictions
Figure 2: Transforming a deterministic model (left) to a probabilistic predictor (middle) …
Figure 3: Mean value formulation and data range
… where $z_k$ is the output of the linear layer in the mixing-coefficient branch for the $k$-th Gaussian component. Following Kendall and Gal (2017), the GMM layer predicts log-variance instead of the variance itself for better numerical stability, i.e., $\log(\sigma^2) \in \mathbb{R}^{N \times T \times K}$. The mean values $\mu \in \mathbb{R}^{N \times T \times K}$ are re-parameterized by the predicted relative offsets together with pr…
Figure 4: Example GMM confidence interval
… Distribution Function (CDF) $F$ and an observed value $y$:
$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \bigl(F(x) - H(x - y)\bigr)^2 \, dx, \tag{11}$$
where $H$ is the Heaviside step function, i.e., a step function jumping from 0 to 1 at the label value $y$. We interpret a deterministic prediction as a Dirac delta function. Therefore, $F(x)$ reduces to a step CDF at the predicted value, and the CRPS score becomes the absolute e…
Figure 5: Structure of the LGC model. For each of these models, three variants are considered: a deterministic variant (Det), a Normal variant (Norm) predicting a Gaussian distribution, and a GMM variant with the adaptation method in …
Figure 6: CRPS score of all methods at all prediction horizons on three datasets.
Figure 7: mAW score of all probabilistic methods at all prediction horizons on three …
Figure 8: Calibration curve of all probabilistic methods at different confidence interval …
Figure 9: RMSE of all methods at all prediction horizons on three datasets.
Figure 10: Test set data distribution in SimBarcaSpd. The data from 26 test set sessions …
Figure 11: The 30-minute-ahead predictions (10 steps) of LGC (GMM) for a high-traffic …
Figure 12: One, five, and ten step predictions of LGC (GMM) for road segment 806 in …
Figure 13: CRPS scores of LGC variants on SimBarca under different data quality settings.
Original abstract

Traffic forecasting is a challenging spatio-temporal modeling task and a critical component of urban transportation management. Current studies mainly focus on deterministic predictions, with limited considerations on the uncertainty and stochasticity in traffic dynamics. Therefore, this paper proposes an elegant yet universal approach that transforms existing models into probabilistic predictors by replacing only the final output layer with a novel Gaussian Mixture Model (GMM) layer. The modified model requires no changes to the training pipeline and can be trained using only the Negative Log-Likelihood (NLL) loss, without any auxiliary or regularization terms. Experiments on multiple traffic datasets show that our approach generalizes from classic to modern model architectures while preserving deterministic performance. Furthermore, we propose a systematic evaluation procedure based on cumulative distributions and confidence intervals, and demonstrate that our approach is considerably more accurate and informative than unimodal or deterministic baselines. Finally, a more detailed study on a real-world dense urban traffic network is presented to examine the impact of data quality on uncertainty quantification and to show the robustness of our approach under imperfect data conditions. Code available at https://github.com/Weijiang-Xiong/OpenSkyTraffic

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes replacing the final output layer of existing deterministic traffic forecasting models with a Gaussian Mixture Model (GMM) layer, enabling probabilistic multi-modal predictions. The modified models require no changes to the internal architecture or training pipeline and are trained end-to-end using only Negative Log-Likelihood (NLL) loss. Experiments across multiple traffic datasets and model architectures (classic to modern) are reported to preserve deterministic performance while improving accuracy and informativeness over unimodal or deterministic baselines. A new evaluation procedure based on cumulative distributions and confidence intervals is introduced, and a case study on a real-world dense urban network examines robustness under imperfect data conditions. Code is released publicly.

Significance. If the empirical results hold, the approach provides a simple, universal retrofit for adding multi-modal uncertainty quantification to traffic forecasting without retraining pipelines or auxiliary losses, which could be valuable for urban management applications. The public code release and focus on a systematic evaluation procedure beyond point estimates are strengths for reproducibility and field progress. However, the absence of theoretical justification for why backbone features suffice for mode capture without internal modifications or regularization limits the conceptual contribution.

major comments (3)
  1. [Method and Experiments] The central claim that traffic stochasticity is adequately captured by a GMM output layer alone (with NLL loss and no internal changes or regularization) rests on the unverified assumption that jointly trained backbone features already encode distinct modes. This is load-bearing for the 'universal' and 'no changes' assertions, yet the manuscript provides no ablations on feature collapse, mode separation, or variance calibration (e.g., in the method or experiments sections).
  2. [Experiments] The number of mixture components is listed as a free parameter, yet the paper emphasizes generalization 'without changes to the training pipeline.' Clarify the selection procedure across datasets and architectures, and report sensitivity analysis showing that results do not hinge on this choice (particularly in the real-world urban network study).
  3. [Evaluation Procedure] The proposed evaluation procedure using cumulative distributions and confidence intervals is presented as systematic, but without direct comparison to standard probabilistic metrics (e.g., CRPS, PICP, or NLL on held-out sets) in the results tables, it is unclear whether it demonstrates superiority beyond the reported accuracy gains.
minor comments (2)
  1. [Abstract] The abstract states that the approach 'preserves deterministic performance,' but without explicit side-by-side metrics (e.g., MAE/RMSE tables comparing GMM vs. original deterministic heads) this claim is difficult to assess quantitatively.
  2. [Method] Notation for the GMM layer (e.g., how mixture weights, means, and variances are parameterized and constrained to be positive) should be formalized with equations to aid reproducibility.
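
For reference, a standard mixture-density formalization consistent with the fragments quoted in the Figure 3 caption ($z_k$ from the mixing-coefficient branch, predicted log-variance) might read as follows. The softmax normalization and the symbol $s_k$ for the log-variance output are assumptions, and the paper's offset-based re-parameterization of $\mu_k$ is not reproduced.

```latex
\begin{aligned}
\pi_k &= \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)}, \qquad
\sigma_k^2 = \exp(s_k), \quad s_k = \log(\sigma_k^2), \\
p(y) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(y \mid \mu_k, \sigma_k^2\right), \\
\mathcal{L}_{\mathrm{NLL}} &= -\log \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(y \mid \mu_k, \sigma_k^2\right).
\end{aligned}
```

Under these assumptions the weights are positive and sum to one by construction, and the variances are positive for any real-valued $s_k$, so no explicit constraints are needed during training.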

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and the opportunity to clarify and strengthen our manuscript. We address each major comment point by point below, providing honest responses based on the current work and indicating planned revisions where appropriate.

  1. Referee: [Method and Experiments] The central claim that traffic stochasticity is adequately captured by a GMM output layer alone (with NLL loss and no internal changes or regularization) rests on the unverified assumption that jointly trained backbone features already encode distinct modes. This is load-bearing for the 'universal' and 'no changes' assertions, yet the manuscript provides no ablations on feature collapse, mode separation, or variance calibration (e.g., in the method or experiments sections).

    Authors: We acknowledge that the manuscript lacks explicit ablations on feature collapse, mode separation, or variance calibration, which would provide stronger support for the assumption that backbone features suffice. The consistent empirical gains across diverse architectures and datasets under joint NLL training provide indirect evidence that the features capture multi-modal structure in traffic data. To address this directly, we will add ablation studies in the revised experiments section, including comparisons of joint vs. frozen backbone training, feature visualizations (e.g., t-SNE) demonstrating mode separation, and calibration plots comparing predicted and empirical coverage. revision: yes

  2. Referee: [Experiments] The number of mixture components is listed as a free parameter, yet the paper emphasizes generalization 'without changes to the training pipeline.' Clarify the selection procedure across datasets and architectures, and report sensitivity analysis showing that results do not hinge on this choice (particularly in the real-world urban network study).

    Authors: K was fixed at 3 after preliminary validation on one dataset to ensure a uniform setting with no per-model or per-dataset tuning, preserving the minimal-change pipeline. We will clarify this selection procedure in the methods section. In the revision, we will include a sensitivity analysis table in the experiments section (and specifically for the urban network study) reporting performance for K=2 to 5, showing that results remain stable and do not critically depend on this choice within a practical range. revision: yes

  3. Referee: [Evaluation Procedure] The proposed evaluation procedure using cumulative distributions and confidence intervals is presented as systematic, but without direct comparison to standard probabilistic metrics (e.g., CRPS, PICP, or NLL on held-out sets) in the results tables, it is unclear whether it demonstrates superiority beyond the reported accuracy gains.

    Authors: The CDF- and CI-based procedure is intended to provide intuitive, visual insight into multi-modality and uncertainty specific to traffic regimes. We agree that direct comparisons to standard metrics would improve context. In the revised results tables, we will add CRPS and PICP values alongside our metrics, and report explicit held-out NLL (separate from training loss) for all methods to enable straightforward benchmarking against other probabilistic approaches. revision: yes
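
The CRPS definition quoted in the Figure 4 caption can be checked numerically. The sketch below approximates the integral with a midpoint rule; `crps_numeric`, its integration bounds, and the grid size are illustrative choices, not the paper's evaluation code.

```python
import math


def crps_numeric(cdf, y, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of (F(x) - H(x - y))^2 on [lo, hi].

    cdf    : callable returning the predictive CDF F(x)
    y      : observed value
    lo, hi : bounds wide enough that F is ~0 below lo and ~1 above hi
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        heaviside = 1.0 if x >= y else 0.0
        total += (cdf(x) - heaviside) ** 2
    return total * h


# A deterministic forecast of 40 is a step CDF at 40; against y = 43 the
# CRPS should equal the absolute error, 3.
step_cdf = lambda x: 1.0 if x >= 40.0 else 0.0
print(round(crps_numeric(step_cdf, 43.0, 0.0, 100.0), 6))  # 3.0

# A standard normal forecast scored at y = 0.
normal_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
print(round(crps_numeric(normal_cdf, 0.0, -10.0, 10.0), 4))  # 0.2337
```

Because the deterministic case collapses to absolute error, CRPS puts deterministic, unimodal, and mixture forecasts on one scale, which is what a side-by-side results table needs.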

Circularity Check

0 steps flagged

No circularity: empirical replacement of output layer validated on external datasets

The paper advances an empirical modeling technique—replacing only the final layer of existing architectures with a GMM and training end-to-end solely with NLL—without any claimed first-principles derivation or mathematical chain. Central claims rest on generalization experiments across multiple traffic datasets and comparison to baselines, which are independent of the method's internal definitions. No equations reduce a prediction to a fitted parameter by construction, no self-citation load-bears the core premise, and no ansatz or uniqueness theorem is invoked. The approach is self-contained against external benchmarks, yielding a normal non-finding of circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach assumes traffic observations can be represented as a mixture of Gaussians at the final layer and that negative log-likelihood alone suffices to learn useful parameters without auxiliary losses or internal model changes.

free parameters (1)
  • number of mixture components
    The GMM layer requires choosing K, the number of Gaussians; this hyperparameter is fitted or selected per dataset.
axioms (1)
  • domain assumption: Traffic dynamics admit a useful multi-modal Gaussian representation at the output of existing spatio-temporal encoders.
    Invoked by the decision to replace only the final layer with GMM and train with NLL.

pith-pipeline@v0.9.0 · 5498 in / 1309 out tokens · 54323 ms · 2026-05-10T08:11:10.098775+00:00 · methodology

Reference graph

Works this paper leans on

52 extracted references · 7 canonical work pages · 1 internal anchor

  1. An, Y., Li, Z., Li, X., Liu, W., Yang, X., Sun, H., Chen, M., Zheng, Y., Gong, Y. (2025). Spatio-temporal multivariate probabilistic modeling for traffic prediction. IEEE Transactions on Knowledge and Data Engineering 37, 2986-3000.
  2. Bessa, R.J., Trindade, A., Miranda, V. (2014). Spatial-temporal solar power forecasting for smart grids. IEEE Transactions on Industrial Informatics 11, 232-241.
  3. Candille, G., Talagrand, O. (2005). Evaluation of probabilistic prediction systems for a scalar variable. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography 131, 2131-2150.
  4. Chen, X., Chen, R. (2019). A review on traffic prediction methods for intelligent transportation system in smart cities. In: 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), IEEE, pp. 1-5.
  5. Coric, V., Wang, Z., Vucetic, S. (2011). Traffic speed forecasting by mixture of experts. In: 2011 14th International IEEE Conference on Intelligent Transportation Systems, IEEE.
  6. Ermagun, A., Levinson, D. (2018). Spatiotemporal traffic forecasting: review and proposed directions. Transport Reviews 38, 786-814.
  7. Ferreira, G.O., Ravazzi, C., Dabbene, F., Calafiore, G.C., Fiore, M. (2023). Forecasting network traffic: A survey and tutorial with open-source comparative evaluation. IEEE Access 11, 6018-6044.
  8. Fu, J., Zhou, W., Chen, Z. (2021). Bayesian spatio-temporal graph convolutional network for traffic forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, AAAI.
  9. Gal, Y., Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, PMLR, pp. 1050-1059.
  10. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B. (1995). Bayesian data analysis. Chapman and Hall/CRC.
  11. Ghosh, B., Basu, B., O'Mahony, M. (2010). Random process model for urban traffic flow using a wavelet-Bayesian hierarchical technique. Computer-Aided Civil and Infrastructure Engineering 25, 613-624.
  12. Gneiting, T., Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102, 359-378.
  13. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q. (2017). On calibration of modern neural networks. In: International Conference on Machine Learning, PMLR, pp. 1321-1330.
  14. Guo, S., Lin, Y., Feng, N., Song, C., Wan, H. (2019). Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 922-929.
  15. Hochreiter, S., Schmidhuber, J. (1997). Long short-term memory. Neural Computation 9, 1735-1780.
  16. Jiang, W., Luo, J. (2022). Graph neural network for traffic forecasting: A survey. Expert Systems with Applications 207, 117921.
  17. Kendall, A., Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems 30.
  18. Kipf, T.N., Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  19. Kwak, S., Geroliminis, N., Frossard, P. (2021). Traffic signal prediction on transportation networks using spatio-temporal correlations on graphs. IEEE Transactions on Signal and Information Processing over Networks 7, 648-659. doi:10.1109/TSIPN.2021.3118489.
  20. Li, R., Zhang, F., Li, T., Zhang, N., Zhang, T. (2022). Dmgan: Dynamic multi-hop graph attention network for traffic forecasting. IEEE Transactions on Knowledge and Data Engineering 35, 9088-9101.
  21. Li, Y., Yu, R., Shahabi, C., Liu, Y. (2018). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In: International Conference on Learning Representations (ICLR '18).
  22. Lim, B., Zohren, S. (2021). Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 379.
  23. Liu, H., Dong, Z., Jiang, R., Deng, J., Deng, J., Chen, Q., Song, X. (2023). Spatio-temporal adaptive embedding makes vanilla transformer sota for traffic forecasting. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 4125-4129.
  24. Liu, J., Wang, Y., Zhu, J., Bai, W., Zhang, H., Zuo, L., Zhou, T., Li, K. (2025). A multilayer spatiotemporal correlation-aware graph attention network for traffic flow prediction. IEEE Transactions on Neural Networks and Learning Systems.
  25. Loshchilov, I., Hutter, F. (2019). Decoupled weight decay regularization. In: International Conference on Learning Representations. https://openreview.net/forum?id=Bkg6RiCqY7
  26. Maclean, I.M. (2020). Predicting future climate at high spatial and temporal resolution. Global Change Biology 26, 1003-1011.
  27. Mauro, R., et al. (2015). Traffic and random processes. Trento: Springer International Publishing Switzerland. DOI: https://doi.org/10.1007/978-3-319-09324-6
  28. Qian, W., Nielsen, T.D., Zhao, Y., Larsen, K.G., Yu, J.J. (2024). Uncertainty-aware temporal graph convolutional network for traffic speed forecasting. IEEE Transactions on Intelligent Transportation Systems 25, 8578-8591.
  29. Qian, W., Zhao, Y., Zhang, D., Chen, B., Zheng, K., Zhou, X. (2023). Towards a unified understanding of uncertainty quantification in traffic flow forecasting. IEEE Transactions on Knowledge and Data Engineering 36, 2239-2256.
  30. Ramezani, M., Geroliminis, N. (2012). On the estimation of arterial route travel time distribution with Markov chains. Transportation Research Part B: Methodological 46, 1576-1590.
  31. Ravish, R., Swamy, S.R. (2021). Intelligent traffic management: A review of challenges, solutions, and future perspectives. Transport and Telecommunication 22, 163-182.
  32. Sengupta, A., Mondal, S., Das, A., Guler, S.I. (2024). A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models. Transportation Research Part C: Emerging Technologies 162, 104585.
  33. Shao, Z., Zhang, Z., Wang, F., Wei, W., Xu, Y. (2022). Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 4454-4458.
  34. Tan, Q., Ye, M., Ma, A.J., Yang, B., Yip, T.C.F., Wong, G.L.H., Yuen, P.C. (2020). Explainable uncertainty-aware convolutional recurrent neural network for irregular medical time series. IEEE Transactions on Neural Networks and Learning Systems 32, 4665-4679.
  35. Tang, B., Matteson, D.S. (2021). Probabilistic transformer for time series analysis. Advances in Neural Information Processing Systems 34, 23592-23608.
  36. Tedjopurnomo, D.A., Bao, Z., Zheng, B., Choudhury, F.M., Qin, A.K. (2022). A survey on modern deep neural network for traffic prediction: Trends, methods and challenges. IEEE Transactions on Knowledge and Data Engineering 34, 1544-1561. doi:10.1109/TKDE.2020.3001195.
  37. Tsitsokas, D., Kouvelas, A., Geroliminis, N. (2023). Two-layer adaptive signal control framework for large-scale dynamically-congested networks: Combining efficient max pressure with perimeter control. Transportation Research Part C: Emerging Technologies 152, 104128.
  38. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems 30.
  39. Vlahogianni, E.I., Karlaftis, M.G., Golias, J.C. (2014). Short-term traffic forecasting: Where we are and where we're going. Transportation Research Part C: Emerging Technologies 43, 3-19.
  40. Wang, Y., Yang, X., Liang, H., Liu, Y. (2018). A review of the self-adaptive traffic signal control system based on future traffic environment. Journal of Advanced Transportation 2018, 1096123.
  41. Wanke, C., Mulgund, S., Greenbaum, D., Song, L. (2004). Modeling traffic prediction uncertainty for traffic management decision support. In: AIAA Guidance, Navigation, and Control Conference and Exhibit, p. 5230.
  42. Wu, Y., James, J. (2021). A Bayesian learning network for traffic speed forecasting with uncertainty quantification. In: 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1-7.
  43. Wu, Y., Ye, Y., Zeb, A., Yu, J.J., Wang, Z. (2023). Adaptive modeling of uncertainties for traffic forecasting. arXiv preprint arXiv:2303.09273.
  44. Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., Zhang, C. (2020). Connecting the dots: Multivariate time series forecasting with graph neural networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 753-763.
  45. Wu, Z., Pan, S., Long, G., Jiang, J., Zhang, C. (2019). Graph wavenet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121.
  46. Xiao, X., Xiang, X., Yang, X., Jin, Z., Xu, J., Wang, S., Mao, G., Shao, W. (2026). Heuristic knowledge-driven spatio-temporal forecasting via multigraph. IEEE Transactions on Neural Networks and Learning Systems.
  47. Xiong, W., Fonod, R., Alahi, A., Geroliminis, N. (2025). Multi-source urban traffic flow forecasting with drone and loop detector data. IEEE Transactions on Intelligent Transportation Systems 26, 18637-18652. doi:10.1109/TITS.2025.3605014.
  48. Yu, B., Yin, H., Zhu, Z. (2018). Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, AAAI Press, pp. 3634-3640.
  49. Yuan, H., Li, G. (2021). A survey of traffic prediction: from spatio-temporal data to intelligent transportation. Data Science and Engineering 6, 63-85.
  50. Zheng, C., Fan, X., Wang, C., Qi, J. (2020). GMAN: A graph multi-attention network for traffic prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1234-1241.
  51. Zheng, Z., Zhang, Z. (2023). A stochastic recurrent encoder decoder network for multistep probabilistic wind power predictions. IEEE Transactions on Neural Networks and Learning Systems 35, 9565-9578.
  52. Zhou, Z., Gu, Z., Liu, P., Yu, W., Liu, Z. (2025). Leveraging semi-supervised learning and meta-learning for re-identification in few-shot spatiotemporal anomaly detection. IEEE Transactions on Neural Networks and Learning Systems.