pith. sign in

arxiv: 2606.01283 · v1 · pith:KCSHRPQSnew · submitted 2026-05-31 · 💻 cs.LG

AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks

Pith reviewed 2026-06-28 17:21 UTC · model grok-4.3

classification 💻 cs.LG
keywords spatiotemporal graph neural networksadaptive kernel parametersdistance-based kernelsgraph structure preservationkrigingimputationforecasting
0
0 comments X

The pith

Misspecified kernel parameters create unavoidable approximation errors in spatiotemporal GNNs that can be removed by learning adaptive parameters while preserving graph structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that fixed distance-based kernels with wrong parameters force GNNs to incur errors they cannot avoid through training alone. It introduces a structure-preserving alternative that tunes only the scale of those kernels inside the network rather than replacing the graph with a fully learned one. A sympathetic reader would care because the approach keeps the geometric prior that helps in data-sparse regimes while still gaining flexibility. Experiments across kriging, imputation, and forecasting tasks show consistent gains over both fixed kernels and generic adaptive baselines.

Core claim

Misspecified kernel parameters introduce unavoidable approximation errors in GNNs. AdaKernel learns adaptive kernel parameters inside the neural network by optimizing the scale of physical interactions instead of discarding the original distance kernel and learning a new graph from scratch.

What carries the argument

AdaKernel, a structure-preserving mechanism that optimizes the scale parameter of an existing distance kernel inside the GNN training loop.

If this is right

  • Standard GNN layers for spatiotemporal tasks can be upgraded by inserting AdaKernel without changing their architecture.
  • Performance improves most in data-sparse regimes where fully learned graphs lose the geometric signal.
  • Learned kernel parameters outperform both fixed priors and model-agnostic adaptive mechanisms on kriging, imputation, and forecasting.
  • The same adaptive-scale idea applies to any GNN that begins with a distance-derived adjacency matrix.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the supplied distance kernel misses the main spatial pattern, the structure-preserving choice may lose to fully latent alternatives.
  • The method could be tested on non-Euclidean domains where the initial kernel is only an approximation.
  • One could measure how much the learned scale deviates from the original parameter across different datasets to quantify reliance on the prior.

Load-bearing premise

The initial distance kernel already encodes the dominant geometric structure, so scaling it is sufficient.

What would settle it

A spatiotemporal dataset in which no scaling of the given distance kernel recovers the true interaction graph, where AdaKernel then underperforms a method that learns the graph structure from scratch.

Figures

Figures reproduced from arXiv: 2606.01283 by Guangyin Jin, Suwan Yin, Yuankai Wu, Yuxuan Liang, Zhongyue Zhang.

Figure 1
Figure 1. Figure 1: Impact of kernel parameter selection on GNN-based krig [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Visualization of graph structures and imputation performance. (a) The original graph structure with static kernel parameter where [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Visualization of kriging results on the METR-LA dataset with 30% missing nodes. From left to right: ground truth, fixed-kernel [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Performance comparison of different kernel functions. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Node degree comparison of adjacency matrices with fixed [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of kriging results on PeMS-Bay dataset with [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Visualization of kriging results on PeMS-Bay dataset with [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: IGNNK for kriging task on METR-LA. Center Node Neighbor Nodes (a) Fixed graph structure Center Node Neighbor Nodes (b) Learned graph structure 0 5 10 15 20 25 30 35 40 61.0 61.5 62.0 62.5 63.0 63.5 64.0 64.5 65.0 Speed Complete Missing Missing Value 0 5 10 15 20 25 30 35 40 Time Steps Static Kernel GroundTruth Imputation 0 5 10 15 20 25 30 35 40 AdaKernel GroundTruth Imputation (c) Kriging results [PITH_F… view at source ↗
Figure 9
Figure 9. Figure 9: KITS for kriging task on PeMS-BAY. Center Node Neighbor Nodes (a) Fixed graph structure Center Node Neighbor Nodes (b) Learned graph structure 0 10 20 30 40 66 67 68 69 70 71 Speed Complete Missing Missing Value 0 10 20 30 40 Time Steps Static Kernel GroundTruth Imputation 0 10 20 30 40 AdaKernel GroundTruth Imputation (c) Kriging results [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: IGNNK for kriging task on PeMS-BAY. Center Node Neighbor Nodes (a) Fixed graph structure Center Node Neighbor Nodes (b) Learned graph structure 0 5 10 15 20 25 30 10 20 30 40 50 60 Speed Block Missing Missing Value 0 5 10 15 20 25 30 Time Steps Static Kernel GroundTruth Imputation 0 5 10 15 20 25 30 AdaKernel GroundTruth Imputation (c) Imputation results [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: GRIN for imputation task on PeMS-BAY (block missing). [PITH_FULL_IMAGE:figures/full_fig_p016_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: GRIN for imputation task on AQI36. Center Node Neighbor Nodes (a) Fixed graph structure Center Node Neighbor Nodes (b) Learned graph structure 0 5 10 15 20 20 30 40 50 60 Speed Target Target 0 5 10 15 20 Time Steps Static Kernel GroundTruth Imputation 0 5 10 15 20 AdaKernel GroundTruth Imputation (c) Forecasting results [PITH_FULL_IMAGE:figures/full_fig_p017_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: DCRNN for forecasting task on METR-LA. Center Node Neighbor Nodes (a) Fixed Matern kernel ´ Center Node Neighbor Nodes (b) Learned Matern kernel ´ Center Node Neighbor Nodes (c) Fixed RQ kernel Center Node Neighbor Nodes (d) Learned RQ kernel [PITH_FULL_IMAGE:figures/full_fig_p017_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Relationship between error improvement and nodes’ average distances to their neighbors for both kriging and imputation tasks. [PITH_FULL_IMAGE:figures/full_fig_p017_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Node degree comparison of adjacency matrices with fixed [PITH_FULL_IMAGE:figures/full_fig_p017_15.png] view at source ↗
read the original abstract

Modeling spatial dependencies is central to spatiotemporal data analysis using Graph Neural Networks (GNNs). Traditional methods rely on distance-based kernels with predefined parameters, which restricts model capacity. Although generic adaptive mechanisms (e.g., Graph Attention Networks) offer flexibility, they often fail to capture the underlying geometric structure, performing worse than distance-based models in data-sparse scenarios. Addressing this, we revisit the kernel parameterization problem and theoretically prove that misspecified kernel parameters introduce unavoidable approximation errors in GNNs. To overcome this, we propose AdaKernel, a simple yet effective approach that learns adaptive kernel parameters within the neural network. Unlike methods that learn graph structures from scratch, AdaKernel adopts a structure-preserving strategy that optimizes the scale of physical interactions rather than discarding them. Extensive experiments on Kriging, Imputation, and Forecasting demonstrate that AdaKernel consistently improves various GNN architectures and outperforms model-agnostic adaptive baselines, validating that accurately learned kernel parameters are superior to both fixed priors and fully latent graph structures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that misspecified parameters in distance-based kernels for spatiotemporal GNNs introduce unavoidable approximation errors (theoretically proven), and introduces AdaKernel to learn adaptive kernel scales while preserving the original graph structure. It reports consistent empirical gains over fixed-kernel GNNs and model-agnostic adaptive baselines (e.g., GAT-style) on kriging, imputation, and forecasting tasks, arguing that structure-preserving adaptation is superior to learning graphs from scratch.

Significance. If the theoretical result on approximation error is correct and the empirical superiority holds under the structure-preserving premise, the work could offer a lightweight, interpretable alternative to fully latent graph learning in settings where distance priors are reliable. The reproducible experimental protocol and direct comparison to both fixed and adaptive baselines would be strengths if the datasets and statistical tests are fully documented.

major comments (3)
  1. [Abstract, §3] Abstract and §3 (theoretical analysis): the claim of 'unavoidable approximation errors' from misspecified kernel parameters is central to motivating AdaKernel, yet the provided abstract gives no derivation details, assumptions on the GNN message-passing operator, or conditions under which the error bound is tight; without this, the necessity of the structure-preserving approach cannot be evaluated.
  2. [Experiments] Experimental section (kriging/imputation/forecasting results): the superiority over 'model-agnostic adaptive baselines' is presented as evidence that scale-only adaptation beats learning structures from scratch, but no stress-test datasets are described where the initial Euclidean/distance kernel has incorrect connectivity (e.g., non-Euclidean manifolds or misaligned sensors); this directly bears on the skeptic concern that the premise 'original kernel encodes dominant geometry' may not hold.
  3. [Experiments] Table/figure results (performance tables): reported gains lack mention of statistical significance tests, multiple random seeds, or variance across runs; if the improvements are within one standard deviation of baselines, the claim of consistent outperformance is weakened.
minor comments (2)
  1. [Method] Notation for kernel parameters (e.g., scale σ) should be defined explicitly at first use and kept consistent between theory and implementation sections.
  2. [Experiments] Dataset details (sensor counts, temporal lengths, train/val/test splits) are referenced but not fully tabulated; this hinders reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and indicate the planned revisions to the manuscript.

read point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3 (theoretical analysis): the claim of 'unavoidable approximation errors' from misspecified kernel parameters is central to motivating AdaKernel, yet the provided abstract gives no derivation details, assumptions on the GNN message-passing operator, or conditions under which the error bound is tight; without this, the necessity of the structure-preserving approach cannot be evaluated.

    Authors: We agree that greater explicitness is needed. In the revision we will expand the abstract to include a concise statement of the main assumptions on the message-passing operator and the conditions under which the error bound holds. Section 3 will be augmented with a dedicated paragraph that states all assumptions upfront and provides a high-level derivation sketch, enabling readers to assess the tightness of the bound and the motivation for structure preservation. revision: yes

  2. Referee: [Experiments] Experimental section (kriging/imputation/forecasting results): the superiority over 'model-agnostic adaptive baselines' is presented as evidence that scale-only adaptation beats learning structures from scratch, but no stress-test datasets are described where the initial Euclidean/distance kernel has incorrect connectivity (e.g., non-Euclidean manifolds or misaligned sensors); this directly bears on the skeptic concern that the premise 'original kernel encodes dominant geometry' may not hold.

    Authors: We acknowledge the value of such stress tests. The current experiments target regimes in which distance priors are known to be informative, consistent with the theoretical premise. We did not construct deliberate counter-examples with fundamentally incorrect initial connectivity. In the revision we will add an explicit limitations paragraph that discusses the scope of applicability and the conditions under which the structure-preserving assumption may fail, thereby clarifying the boundary of the claimed superiority. revision: partial

  3. Referee: [Experiments] Table/figure results (performance tables): reported gains lack mention of statistical significance tests, multiple random seeds, or variance across runs; if the improvements are within one standard deviation of baselines, the claim of consistent outperformance is weakened.

    Authors: We agree that reporting practices should be strengthened. The revised manuscript will present all results as means and standard deviations over at least five random seeds, include paired statistical significance tests against the strongest baselines, and update the tables and figures accordingly. This will allow readers to judge whether the observed gains exceed run-to-run variability. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on independent theoretical claim and training-time optimization

full rationale

The paper's central claims rest on a stated theoretical proof that misspecified kernel parameters cause approximation errors in GNNs, followed by a structure-preserving learning procedure whose parameters are optimized via standard backpropagation on held-out data. No equation is shown to be definitionally equivalent to its own input, no fitted quantity is relabeled as a prediction, and no load-bearing step reduces to a self-citation chain. The structure-preserving strategy is presented as an explicit design choice rather than a mathematical necessity derived from prior self-work. The derivation chain therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method implicitly assumes that a single learnable scale per kernel suffices to correct misspecification.

pith-pipeline@v0.9.1-grok · 5712 in / 1103 out tokens · 18229 ms · 2026-06-28T17:21:08.991160+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

49 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Kriging convolutional networks

    [Applebyet al., 2020 ] Gabriel Appleby, Linfeng Liu, and Li-Ping Liu. Kriging convolutional networks. InProceed- ings of the AAAI Conference on Artificial Intelligence, vol- ume 34, pages 3187–3194,

  2. [2]

    Spatio-temporal data mining: A survey of problems and methods.ACM Computing Surveys, 51(4):1–41,

    [Atluriet al., 2018 ] Gowtham Atluri, Anuj Karpatne, and Vipin Kumar. Spatio-temporal data mining: A survey of problems and methods.ACM Computing Surveys, 51(4):1–41,

  3. [3]

    Adaptive graph convolutional recurrent network for traffic forecasting

    [Baiet al., 2020 ] Lei Bai, Lina Yao, Can Li, Xianzhi Wang, and Can Wang. Adaptive graph convolutional recurrent network for traffic forecasting. InAdvances in Neural In- formation Processing Systems, volume 33, pages 17804– 17815,

  4. [4]

    Relational inductive biases, deep learning, and graph networks

    [Battagliaet al., 2018 ] Peter W Battaglia, Jessica B Ham- rick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks.arXiv preprint arXiv:1806.01261,

  5. [5]

    Dynamit: a simulation-based system for traffic prediction

    [Ben-Akivaet al., 1998 ] Moshe Ben-Akiva, Michel Bier- laire, Haris Koutsopoulos, and Rabi Mishalani. Dynamit: a simulation-based system for traffic prediction. InDAC- CORD short term forecasting workshop, volume

  6. [6]

    Brits: Bidirectional recurrent im- putation for time series.Advances in neural information processing systems, 31,

    [Caoet al., 2018 ] Wei Cao, Dong Wang, Jian Li, Hao Zhou, Lei Li, and Yitan Li. Brits: Bidirectional recurrent im- putation for time series.Advances in neural information processing systems, 31,

  7. [7]

    Rossi, Kanak Mahadik, Sungchul Kim, and Hoda Eldardiry

    [Chenet al., 2021 ] Hongjie Chen, Ryan A. Rossi, Kanak Mahadik, Sungchul Kim, and Hoda Eldardiry. Graph deep factors for forecasting with applications to cloud resource allocation. InProceedings of the 27th ACM SIGKDD Con- ference on Knowledge Discovery and Data Mining, KDD ’21, pages 106–116, New York, NY , USA,

  8. [8]

    [Ciniet al., 2022 ] Andrea Cini, Ivan Marisca, and Cesare Alippi

    Associa- tion for Computing Machinery. [Ciniet al., 2022 ] Andrea Cini, Ivan Marisca, and Cesare Alippi. Filling the Gaps: Multivariate time series impu- tation by graph neural networks. InInternational Confer- ence on Learning Representations,

  9. [9]

    Spatial prediction and ordi- nary kriging.Mathematical geology, 20:405–421,

    [Cressie, 1988] Noel Cressie. Spatial prediction and ordi- nary kriging.Mathematical geology, 20:405–421,

  10. [10]

    John Wiley & Sons,

    [Cressie, 2015] Noel Cressie.Statistics for spatial data. John Wiley & Sons,

  11. [11]

    [Daley, 1993] Roger Daley.Atmospheric data analysis. Number

  12. [12]

    Graph neural network-based anomaly detection in multivariate time series

    [Deng and Hooi, 2021] Ailin Deng and Bryan Hooi. Graph neural network-based anomaly detection in multivariate time series. InProceedings of the AAAI Conference on Ar- tificial Intelligence, volume 35, pages 4027–4035,

  13. [13]

    Learning from highly sparse spatio- temporal data

    [Denget al., 2024 ] Leyan Deng, Chenwang Wu, Defu Lian, and Enhong Chen. Learning from highly sparse spatio- temporal data. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems,

  14. [14]

    Neu- ral message passing for quantum chemistry

    [Gilmeret al., 2017 ] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neu- ral message passing for quantum chemistry. InInter- national Conference on Machine Learning, pages 1263–

  15. [15]

    Decou- pling long-and short-term patterns in spatiotemporal infer- ence.IEEE Transactions on Neural Networks and Learn- ing Systems,

    [Huet al., 2023 ] Junfeng Hu, Yuxuan Liang, Zhencheng Fan, Li Liu, Yifang Yin, and Roger Zimmermann. Decou- pling long-and short-term patterns in spatiotemporal infer- ence.IEEE Transactions on Neural Networks and Learn- ing Systems,

  16. [16]

    [Jain and Wallace, 2019] Sarthak Jain and Byron C. Wallace. Attention is not Explanation. InProceedings of the 2019 Conference of the North American Chapter of the Asso- ciation for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3543–3556, June

  17. [17]

    Spatio-temporal graph neural networks for predictive learning in urban computing: A survey.IEEE Transac- tions on Knowledge and Data Engineering,

    [Jinet al., 2023 ] Guangyin Jin, Yuxuan Liang, Yuchen Fang, Zezhi Shao, Jincai Huang, Junbo Zhang, and Yu Zheng. Spatio-temporal graph neural networks for predictive learning in urban computing: A survey.IEEE Transac- tions on Knowledge and Data Engineering,

  18. [18]

    Network of tensor time series

    [Jinget al., 2021 ] Baoyu Jing, Hanghang Tong, and Yada Zhu. Network of tensor time series. InProceedings of the Web Conference 2021, pages 2425–2437,

  19. [19]

    Diffusion convolutional recurrent neural net- work: Data-driven traffic forecasting

    [Liet al., 2018 ] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural net- work: Data-driven traffic forecasting. InInternational Conference on Learning Representations,

  20. [20]

    Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution.ACM Transactions on Knowledge Discovery from Data, 17(1):1–21,

    [Liet al., 2023 ] Fuxian Li, Jie Feng, Huan Yan, Guangyin Jin, Fan Yang, Funing Sun, Depeng Jin, and Yong Li. Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution.ACM Transactions on Knowledge Discovery from Data, 17(1):1–21,

  21. [21]

    Intro- duction to gaussian processes.NATO ASI series F com- puter and systems sciences, 168:133–166,

    [MacKay and others, 1998] David JC MacKay et al. Intro- duction to gaussian processes.NATO ASI series F com- puter and systems sciences, 168:133–166,

  22. [22]

    Global-scale temperature pat- terns and climate forcing over the past six centuries.Na- ture, 392(6678):779–787,

    [Mannet al., 1998 ] Michael E Mann, Raymond S Bradley, and Malcolm K Hughes. Global-scale temperature pat- terns and climate forcing over the past six centuries.Na- ture, 392(6678):779–787,

  23. [23]

    Learning to reconstruct missing data from spa- tiotemporal graphs with sparse observations.Advances in Neural Information Processing Systems, 35:32069–32082,

    [Mariscaet al., 2022 ] Ivan Marisca, Andrea Cini, and Ce- sare Alippi. Learning to reconstruct missing data from spa- tiotemporal graphs with sparse observations.Advances in Neural Information Processing Systems, 35:32069–32082,

  24. [24]

    Principles of geo- statistics.Economic geology, 58(8):1246–1266,

    [Matheron, 1963] Georges Matheron. Principles of geo- statistics.Economic geology, 58(8):1246–1266,

  25. [25]

    Multivariate time series forecasting with latent graph inference.arXiv preprint arXiv:2203.03423,

    [Satorraset al., 2022 ] Victor Garcia Satorras, Syama Sun- dar Rangapuram, and Tim Januschowski. Multivariate time series forecasting with latent graph inference.arXiv preprint arXiv:2203.03423,

  26. [26]

    Discrete graph structure learning for forecasting multiple time se- ries

    [Shang and Chen, 2021] Chao Shang and Jie Chen. Discrete graph structure learning for forecasting multiple time se- ries. InInternational Conference on Learning Represen- tations,

  27. [27]

    Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecast- ing

    [Shaoet al., 2022 ] Zezhi Shao, Zhao Zhang, Fei Wang, Wei Wei, and Yongjun Xu. Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecast- ing. InProceedings of the 31st ACM International Con- ference on Information & Knowledge Management, pages 4454–4458,

  28. [28]

    Exploring progress in mul- tivariate time series forecasting: Comprehensive bench- marking and heterogeneity analysis.IEEE Transactions on Knowledge and Data Engineering,

    [Shaoet al., 2024 ] Zezhi Shao, Fei Wang, Yongjun Xu, Wei Wei, Chengqing Yu, Zhao Zhang, Di Yao, Tao Sun, Guangyin Jin, Xin Cao, et al. Exploring progress in mul- tivariate time series forecasting: Comprehensive bench- marking and heterogeneity analysis.IEEE Transactions on Knowledge and Data Engineering,

  29. [29]

    Asymptotically efficient pre- diction of a random field with a misspecified covariance function.The Annals of Statistics, 16(1):55–63,

    [Stein, 1988] Michael L Stein. Asymptotically efficient pre- diction of a random field with a misspecified covariance function.The Annals of Statistics, 16(1):55–63,

  30. [30]

    A computer movie simulat- ing urban growth in the detroit region.Economic geogra- phy, 46(sup1):234–240,

    [Tobler, 1970] Waldo R Tobler. A computer movie simulat- ing urban growth in the detroit region.Economic geogra- phy, 46(sup1):234–240,

  31. [31]

    Graph attention networks

    [Veliˇckovi´cet al., 2018 ] Petar Veliˇckovi´c, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li `o, and Yoshua Bengio. Graph attention networks. InInterna- tional Conference on Learning Representations,

  32. [32]

    Deepfactors for forecasting

    [Wanget al., 2019 ] Yuyang Wang, Alex Smola, Danielle Maddix, Jan Gasthaus, Dean Foster, and Tim Januschowski. Deepfactors for forecasting. InIn- ternational Conference on Machine Learning, pages 6607–6617. PMLR,

  33. [33]

    Networked time series imputation via position-aware graph enhanced variational autoencoders

    [Wanget al., 2023 ] Dingsu Wang, Yuchen Yan, Ruizhong Qiu, Yada Zhu, Kaiyu Guan, Andrew Margenot, and Hanghang Tong. Networked time series imputation via position-aware graph enhanced variational autoencoders. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2256– 2268,

  34. [34]

    [Williams and Rasmussen, 2006] Christopher KI Williams and Carl Edward Rasmussen.Gaussian processes for ma- chine learning, volume

  35. [35]

    Gaussian process kernels for pattern discovery and extrapolation

    [Wilson and Adams, 2013] Andrew Wilson and Ryan Adams. Gaussian process kernels for pattern discovery and extrapolation. InInternational conference on machine learning, pages 1067–1075. PMLR,

  36. [36]

    Graph wavenet for deep spatial-temporal graph modeling

    [Wuet al., 2019 ] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, and Chengqi Zhang. Graph wavenet for deep spatial-temporal graph modeling. InProceedings of the 28th International Joint Conference on Artificial Intelli- gence, pages 1907–1913,

  37. [37]

    Spatial aggregation and temporal convolution networks for real-time kriging

    [Wuet al., 2021b ] Yuankai Wu, Dingyi Zhuang, Mengying Lei, Aurelie Labbe, and Lijun Sun. Spatial aggregation and temporal convolution networks for real-time kriging. arXiv preprint arXiv:2109.12144,

  38. [38]

    Traversenet: Unifying space and time in message passing for traffic forecasting.IEEE Transactions on Neural Networks and Learning Systems,

    [Wuet al., 2022 ] Zonghan Wu, Da Zheng, Shirui Pan, Quan Gan, Guodong Long, and George Karypis. Traversenet: Unifying space and time in message passing for traffic forecasting.IEEE Transactions on Neural Networks and Learning Systems,

  39. [39]

    Kits: Inductive spatio- temporal kriging with increment training strategy.arXiv preprint arXiv:2311.02565,

    [Xuet al., 2023 ] Qianxiong Xu, Cheng Long, Ziyue Li, Sijie Ruan, Rui Zhao, and Zhishuai Li. Kits: Inductive spatio- temporal kriging with increment training strategy.arXiv preprint arXiv:2311.02565,

  40. [40]

    St- mvl: Filling missing values in geo-sensory time series data

    [Yiet al., 2016 ] Xike Yi, Yu Zhang, and Linghe Kong. St- mvl: Filling missing values in geo-sensory time series data. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2704–2710,

  41. [41]

    Nodetrans: A graph transfer learning approach for traffic prediction.arXiv preprint arXiv:2207.01301,

    [Yinet al., 2022 ] Xueyan Yin, Feifan Li, Yanming Shen, Heng Qi, and Baocai Yin. Nodetrans: A graph transfer learning approach for traffic prediction.arXiv preprint arXiv:2207.01301,

  42. [42]

    Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting

    [Yuet al., 2018 ] Bing Yu, Haoteng Yin, and Zhanxing Zhu. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. InProceedings of the 27th International Joint Conference on Artificial In- telligence, pages 3634–3640,

  43. [43]

    Speckriging: Gnn-based secure cooperative spectrum sensing.IEEE Transactions on Wireless Communications, 21(11):9936– 9946,

    [Zhanget al., 2022 ] Yan Zhang, Ang Li, Jiawei Li, Dianqi Han, Tao Li, Rui Zhang, and Yanchao Zhang. Speckriging: Gnn-based secure cooperative spectrum sensing.IEEE Transactions on Wireless Communications, 21(11):9936– 9946,

  44. [44]

    Inconsistent estimation and asymptotically equal interpolations in model-based geo- statistics.Journal of the American Statistical Association, 99(465):250–261,

    [Zhang, 2004] Hao Zhang. Inconsistent estimation and asymptotically equal interpolations in model-based geo- statistics.Journal of the American Statistical Association, 99(465):250–261,

  45. [45]

    Gman: A graph multi- attention network for traffic prediction

    [Zhenget al., 2020 ] Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, and Jianzhong Qi. Gman: A graph multi- attention network for traffic prediction. InProceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 1234–1241,

  46. [46]

    Increase: Inductive graph representation learning for spatio-temporal kriging

    [Zhenget al., 2023 ] Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, Jianzhong Qi, Chaochao Chen, and Long- biao Chen. Increase: Inductive graph representation learning for spatio-temporal kriging. InProceedings of the ACM Web Conference 2023, pages 673–683,

  47. [47]

    A comparison of spatial semi- variogram estimators and corresponding ordinary kriging predictors.Technometrics, 33(1):77–91,

    [Zimmerman and Zimmerman, 1991] Dale L Zimmerman and M Bridget Zimmerman. A comparison of spatial semi- variogram estimators and corresponding ordinary kriging predictors.Technometrics, 33(1):77–91,

  48. [48]

    These datasets were chosen to evaluate the performance of the proposed methods in different spatiotemporal tasks, such as imputation, kriging, and forecasting

    The datasets span multiple do- mains, including traffic speed, traffic flow, and air quality. These datasets were chosen to evaluate the performance of the proposed methods in different spatiotemporal tasks, such as imputation, kriging, and forecasting. E.2 Experimental Setup for Different Tasks For the imputation and forecasting tasks, we use a standard ...

  49. [49]

    Imputation Task Details.We conduct experiments on sev- eral widely used datasets: METR-LA, PEMS-BAY , AQI, and PEMS03/04/07/08

    These settings en- sure consistency and allow fair comparisons across tasks and datasets. Imputation Task Details.We conduct experiments on sev- eral widely used datasets: METR-LA, PEMS-BAY , AQI, and PEMS03/04/07/08. For METR-LA and PEMS-BAY , we ap- ply two missing data injection policies: •Block Missing:Randomly mask 5% of the data and simulate tempora...