arxiv: 2605.15575 · v1 · pith:NWXZPWIYnew · submitted 2026-05-15 · 💻 cs.LG · cs.DB

Gaussian Relational Graph Transformer

Pith reviewed 2026-05-20 19:59 UTC · model grok-4.3

classification 💻 cs.LG cs.DB
keywords relational graph transformer · Gaussian attention · structure-semantic sampling · temporal dependencies · relational databases · graph attention mechanism · message passing decay

The pith

GelGT uses structure-semantic sampling and Gaussian attention to capture long-range dependencies in relational graphs without information decay.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Relational graph models represent databases as graphs to predict outcomes but often suffer from information loss over long distances and difficulty combining structure, semantics, and time. The paper proposes GelGT to fix this with a sampling method that keeps connected structures while dropping unrelated semantic details, followed by a Gaussian-based attention that learns a bias to track temporal changes on those subgraphs. A sympathetic reader would care because this could lead to more accurate predictions in areas like recommendation or fraud detection where relational data is common and long dependencies matter. If successful, it shows that targeted sampling plus specialized attention can replace deeper message passing that worsens decay.

Core claim

The central discovery is that a structure-semantic collaborative sampling strategy preserves structural connectivity while filtering irrelevant semantic information, and a Gaussian graph attention mechanism with a learnable Gaussian bias on the sampled subgraphs dynamically encodes temporal dependencies, leading to state-of-the-art performance with up to 13.8% improvement in predictive tasks.

What carries the argument

Structure-semantic collaborative sampling strategy paired with Gaussian graph attention using learnable bias, which selects relevant subgraphs and weights edges with a Gaussian function adjusted for time.
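
The Figure 2 caption further down describes the sampling step as a BFS that masks future nodes by timestamp and then retains only semantically similar neighbors. A minimal sketch of that idea follows, under stated assumptions: the function names, the `keep_per_node` budget, the use of cosine similarity against the seed node, and the toy data layout are all hypothetical and are not taken from the paper.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sample_subgraph(adj, times, feats, seed, max_hops=2, keep_per_node=2):
    """Illustrative sampler (an assumption, not the paper's algorithm).

    BFS from `seed`; neighbors with a timestamp later than the seed's are
    masked out, and each expanded node keeps only its `keep_per_node`
    unseen neighbors that are most similar to the seed.
    """
    t_seed = times[seed]
    visited = {seed}
    edges = []
    frontier = deque([(seed, 0)])
    while frontier:
        node, hop = frontier.popleft()
        if hop == max_hops:
            continue
        # Temporal mask: drop neighbors that lie in the seed's future.
        candidates = [
            n for n in adj.get(node, ())
            if n not in visited and times[n] <= t_seed
        ]
        # Semantic filter: rank the remaining neighbors by similarity to the seed.
        candidates.sort(key=lambda n: cosine(feats[seed], feats[n]), reverse=True)
        for n in candidates[:keep_per_node]:
            visited.add(n)
            edges.append((node, n))  # every kept node arrives with a kept edge
            frontier.append((n, hop + 1))
    return visited, edges
```

Because a node is only added together with the edge that reached it, the sampled nodes and edges form a tree rooted at the seed, which is one simple way to keep the subgraph connected while still discarding dissimilar neighbors.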

Load-bearing premise

The structure-semantic collaborative sampling keeps essential connections intact while the Gaussian bias successfully models temporal dependencies without causing additional information loss or introducing artifacts.

What would settle it

An ablation study where the learnable Gaussian bias is removed or replaced with a standard attention mechanism on the same subgraphs, resulting in no significant performance difference or degradation, would indicate that the Gaussian component is not the key to encoding temporal dependencies.

Figures

Figures reproduced from arXiv: 2605.15575 by Jin Li, Xike Xie, Xugang Wang, Zezhong Ding.

Figure 1: Comparison of existing methods and our proposed GelGT. Structurally, GelGT alleviates structural fragmentation by preserving node connectivity during sampling. Semantically, it mitigates the interference of semantically noisy nodes during sampling. Temporally, it employs Gaussian Temporal Bias to explicitly distinguish between temporally relevant and noisy nodes.
Figure 2: Overview of the GelGT Framework. The process consists of six main steps: ① Graph Construction converts the relational database into a relational graph. ② Structural Integrity Sampling samples subgraphs via BFS while enforcing strict timestamp constraints to mask future nodes for temporal validity. ③ Semantic Refinement filters noise by retaining only neighbors with high semantic similarity,…
Figure 3: Ablation and efficiency evaluation. To address (RQ5), we conduct an ablation study by removing the GNN branch. As shown in Table 5, performance consistently drops across all datasets. This validates the necessity of the GNN branch in GelGT. The attention branch, even with hop-distance embeddings and GNN-based positional encodings, operates on a fully connected sampled subgraph and therefore only captur…
Figure 4: GelGT's performance across five tasks under different sampling sizes and sampling hops.
Original abstract

Relational graph learning models relational databases as graphs and has demonstrated superior performance on a wide range of relational predictive tasks. However, existing methods struggle to capture long-range dependencies due to information decay in their message-passing mechanisms, and recent relational graph transformers remain limited in jointly modeling structural, semantic, and temporal information. In this paper, we propose GelGT, a Gaussian relational graph transformer that explicitly addresses these challenges. GelGT introduces a structure-semantic collaborative sampling strategy to preserve structural connectivity while filtering irrelevant semantic information, and incorporates a Gaussian graph attention mechanism with a learnable Gaussian bias on the sampled subgraphs to dynamically encode temporal dependencies. Extensive experiments on various real-world datasets demonstrate that GelGT achieves state-of-the-art downstream task performance, with up to a 13.8% improvement in predictive performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes GelGT, a Gaussian relational graph transformer for modeling relational databases as graphs. It introduces a structure-semantic collaborative sampling strategy that preserves structural connectivity while filtering irrelevant semantic information, and a Gaussian graph attention mechanism incorporating a learnable Gaussian bias applied to the sampled subgraphs to encode temporal dependencies. The central claim is that these components jointly address long-range dependency issues and limitations in prior relational graph transformers, yielding state-of-the-art results with up to 13.8% improvement in predictive performance across real-world datasets.

Significance. If the mechanisms hold under rigorous validation, the work could advance relational graph learning by providing an explicit way to integrate structural, semantic, and temporal modeling without relying on standard message-passing decay. The connectivity-preserving sampling and the derivation of the Gaussian bias to modulate attention scores represent internally consistent modeling choices that align with the stated goals of avoiding information decay. Credit is due for the reproducible experimental setup implied by the extensive real-world dataset evaluations and the parameter-efficient temporal encoding via the learnable bias.

major comments (2)
  1. [§5] §5 (Experimental results): The abstract and results claim up to 13.8% predictive gains, yet the manuscript supplies no error bars, statistical significance tests, or explicit exclusion criteria for baselines and datasets. This undermines the load-bearing claim of consistent SOTA performance, as post-hoc sampling choices could affect the reported margins.
  2. [§4.1] §4.1 (Gaussian graph attention): The learnable Gaussian bias is presented as dynamically encoding temporal dependencies on sampled subgraphs, but the derivation does not explicitly demonstrate how it avoids reduction to a data-fitted quantity (as flagged in the circularity assessment); a concrete expansion of the attention score modulation formula would be required to confirm independence from the training distribution.
minor comments (2)
  1. [Abstract] The abstract would benefit from a one-sentence summary of the key equations for the collaborative sampling and Gaussian bias to improve accessibility.
  2. [§3.2] Notation for the sampled subgraph construction in §3.2 should include an explicit statement that connectivity is preserved by construction (e.g., via edge retention rules).
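
Major comment 1 asks for error bars and significance tests. One conventional way to supply them is to report the mean and sample standard deviation over repeated runs and to compare matched runs with a paired t-test. The sketch below shows the arithmetic using only the Python standard library; the function names are hypothetical, and any scores passed in are the caller's own, not results from the paper.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(model_scores, baseline_scores):
    """Paired t statistic and degrees of freedom for matched runs.

    Scores must be paired, e.g. the same random seed for both methods.
    """
    if len(model_scores) != len(baseline_scores):
        raise ValueError("scores must be paired one-to-one")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample std of the per-seed differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

With five paired runs there are 4 degrees of freedom, so a two-sided test at the 5% level would require |t| to exceed roughly 2.776.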

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We appreciate the recognition of the modeling contributions and the call for stronger experimental validation. We address each major comment below, agreeing where the manuscript requires clarification or augmentation, and outline the specific revisions.

  1. Referee: [§5] §5 (Experimental results): The abstract and results claim up to 13.8% predictive gains, yet the manuscript supplies no error bars, statistical significance tests, or explicit exclusion criteria for baselines and datasets. This undermines the load-bearing claim of consistent SOTA performance, as post-hoc sampling choices could affect the reported margins.

    Authors: We agree that the current presentation would benefit from additional statistical rigor. In the revised manuscript we will report mean and standard deviation over five independent random seeds for all methods and datasets. We will also add paired t-tests (with p-values) against the strongest baseline on each task to substantiate the claimed gains. Regarding selection criteria, the baselines comprise all recent relational graph transformers and message-passing models that report results on the same public datasets; we will insert an explicit paragraph in Section 5 listing the inclusion rules (publication date, task coverage, and public availability) to remove any ambiguity about post-hoc choices. revision: yes

  2. Referee: [§4.1] §4.1 (Gaussian graph attention): The learnable Gaussian bias is presented as dynamically encoding temporal dependencies on sampled subgraphs, but the derivation does not explicitly demonstrate how it avoids reduction to a data-fitted quantity (as flagged in the circularity assessment); a concrete expansion of the attention score modulation formula would be required to confirm independence from the training distribution.

    Authors: We thank the referee for highlighting the need for a clearer derivation. The attention logit is computed as (QK^T / sqrt(d_k)) + B, where the Gaussian bias B_{ij} = - (t_i - t_j - mu)^2 / (2 sigma^2) and mu, sigma are learnable scalars per attention head. This functional form is fixed and depends only on the relative temporal coordinates of nodes in the sampled subgraph; it is independent of the downstream label or prediction distribution. The parameters are optimized jointly with the rest of the model, yet the bias remains a parametric kernel rather than an arbitrary data-dependent term. We will insert the expanded formula together with a short paragraph discussing its non-circular character in the revised Section 4.1. revision: yes
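
The formula quoted in the second response can be made concrete with a small single-head sketch. It is written in plain Python for readability and rests on several assumptions: `mu` and `sigma` are passed as fixed floats rather than trained parameters, a standard softmax is applied to the biased logits, and all function names are hypothetical rather than taken from the paper's code.

```python
import math


def gaussian_bias(times, mu, sigma):
    """B[i][j] = -(t_i - t_j - mu)^2 / (2 * sigma^2), as stated in the rebuttal."""
    return [
        [-((ti - tj - mu) ** 2) / (2 * sigma ** 2) for tj in times]
        for ti in times
    ]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(Q, K, times, mu, sigma):
    """Attention weights softmax(Q K^T / sqrt(d_k) + B) for a single head.

    Q and K are lists of equal-length vectors, one per node in the sampled
    subgraph; `times` holds one timestamp per node.
    """
    d_k = len(K[0])
    B = gaussian_bias(times, mu, sigma)
    weights = []
    for i, q in enumerate(Q):
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) + B[i][j]
            for j, k in enumerate(K)
        ]
        weights.append(softmax(logits))
    return weights
```

When the content scores are equal, the bias alone decides the weights: a node whose time offset from the query is close to `mu` receives the most attention, and `sigma` controls how quickly attention falls off as the offset moves away from `mu`.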

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces GelGT via two explicit modeling components: a structure-semantic collaborative sampling procedure that preserves connectivity while filtering semantics, and a Gaussian graph attention layer whose learnable bias is added to modulate attention scores on the sampled subgraphs. Neither component is defined in terms of the other or of the final performance metric; the bias term is introduced as an independent architectural choice rather than being fitted to the target prediction and then re-used as a 'prediction.' No self-citations are invoked to justify uniqueness or to smuggle in an ansatz, and the reported gains are obtained from downstream experiments rather than from any algebraic identity that collapses the claimed temporal encoding back to the input data. The derivation therefore remains self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The model introduces one learnable parameter (Gaussian bias) whose value is fitted during training to capture temporal effects; no additional axioms or invented entities are stated in the abstract.

free parameters (1)
  • learnable Gaussian bias
    Parameter in the Gaussian graph attention mechanism, adjusted during training to encode temporal dependencies on sampled subgraphs.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 1 internal anchor

  1. [1] B. Aditya, Gaurav Bhalotia, Soumen Chakrabarti, Arvind Hulgeri, Charuta Nakhe, Parag, and S. Sudarshan. BANKS: browsing and keyword searching in relational databases. In VLDB, pages 1083–1086, 2002.
  2. [2] Rakesh Agrawal, Amit Somani, and Yirong Xu. Storage and querying of e-commerce data. In VLDB, pages 149–158, 2001.
  3. [3] Terry A. Halpin and Tony Morgan. Information modeling and relational databases (2. ed.). Morgan Kaufmann, 2008.
  4. [4] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In SIGKDD, pages 785–794, 2016.
  5. [5] Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need. Inf. Fusion, 81:84–90, 2022.
  6. [6] Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? In NeurIPS, 2022.
  7. [7] Kuan-Yu Chen, Ping-Han Chiang, Hsin-Rung Chou, Ting-Wei Chen, and Tien-Hao Chang. Trompt: Towards a better deep neural network for tabular data. In ICML, 2023.
  8. [8] Andreas Voskou, Charalambos Christoforou, and Sotirios Chatzis. Transformers with stochastic competition for tabular data modelling. In ICML Workshop, 2024.
  9. [9] Vijay Prakash Dwivedi, Charilaos I. Kanatsoulis, Shenyang Huang, and Jure Leskovec. Relational deep learning: Challenges, foundations and next-generation architectures. In KDD, pages 5999–6009, 2025.
  10. [10] James Max Kanter and Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. In DSAA, pages 1–10, 2015.
  11. [11] Hoang Thanh Lam, Beat Buesser, Hong Min, Tran Ngoc Minh, Martin Wistuba, Udayan Khurana, Gregory Bramble, Theodoros Salonidis, Dakuo Wang, and Horst Samulowitz. Automated data science for relational data. In ICDE, pages 2689–2692, 2021.
  12. [12] Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan Eric Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec. Relbench: A benchmark for deep learning on relational databases. In NeurIPS, 2024.
  13. [13] Tianlang Chen, Charilaos I. Kanatsoulis, and Jure Leskovec. Relgnn: Composite message passing for relational deep learning. In ICML, 2025.
  14. [14] Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. Position: Relational deep learning - graph representation learning on relational databases. In ICML, 2024.
  15. [15] Milan Cvitkovic. Supervised learning on relational databases with graph neural networks. CoRR, abs/2002.02046, 2020.
  16. [16] Vijay Prakash Dwivedi, Sri Jaladi, Yangyi Shen, Federico Lopez, Charilaos I. Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec. Relational graph transformer. In ICLR, 2026.
  17. [17] Zhanghao Wu, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph E. Gonzalez, and Ion Stoica. Representing long-range context for graph neural networks with global attention. In NeurIPS, 2021.
  18. [18] Qitian Wu, Wentao Zhao, Zenan Li, David P. Wipf, and Junchi Yan. Nodeformer: A scalable graph structure learning transformer for node classification. In NeurIPS, 2022.
  19. [19] Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos I. Kanatsoulis, Roshan Reddy Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, and Jure Leskovec. Relational transformer: Toward zero-shot foundation models for relational data. In ICLR, 2026.
  20. [20] Dipak Meher and Carlotta Domeniconi. Inside core-kg: Evaluating structured prompting and coreference resolution for knowledge graphs, 2025.
  21. [21] Filippo Menczer, Gautam Pant, and Padmini Srinivasan. Topical web crawlers: Evaluating adaptive algorithms. ACM Trans. Internet Techn., 2004.
  22. [22] Maosheng Guo, Yu Zhang, and Ting Liu. Gaussian transformer: A lightweight approach for natural language inference. In AAAI, 2019.
  23. [23] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, 1970.
  24. [24] E. F. Codd. Extending the database relational model to capture more meaning. ACM Trans. Database Syst., 4(4):397–434, 1979.
  25. [25] Jin Li, Zezhong Ding, and Xike Xie. Duetgraph: Coarse-to-fine knowledge graph reasoning with dual-pathway global-local fusion. In NeurIPS, 2025.
  26. [26] Karthir Prabhakar, Sang Min Oh, Ping Wang, Gregory D. Abowd, and James M. Rehg. Temporal causality for the analysis of visual events. In CVPR, 2010.
  27. [27] Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
  28. [28] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
  29. [29] Divyansha Lachi, Mahmoud Mohammadi, Joe Meyer, Vinam Arora, Tom Palczewski, and Eva L Dyer. Integrating temporal and structural context in graph transformers for relational deep learning. arXiv preprint arXiv:2511.04557, 2025.
  30. [30] Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, and Muhan Zhang. Griffin: Towards a graph-centric relational database foundation model. In ICML, 2025.
  31. [31] Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. Heterogeneous graph transformer. In WWW, pages 2704–2710, 2020.
  32. [32] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. In NeurIPS, 2017.
  33. [33] J. A. Hanley and B. J. McNeil. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148(3):839–843, 1983.
  34. [34] Sercan Ö Arik and Tomas Pfister. Tabnet: Attentive interpretable tabular learning. In AAAI, 2021.
  35. [35] Liran Katzir, Gal Elidan, and Ran El-Yaniv. Net-dnf: Effective deep modeling of tabular data. In ICLR, 2020.
  36. [36] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. Heterogeneous graph attention network. In WWW, 2019.
  37. [37] Michele Benzi and Christine Klymko. On the limiting behavior of parameter-dependent network centrality measures. SIAM Journal on Matrix Analysis and Applications, 2013.
  38. [38] Lingxiao Zhao and Leman Akoglu. Pairnorm: Tackling oversmoothing in gnns. In ICLR, 2020.
  39. [39] Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? In NeurIPS, 2017.
  40. [40] Harald Bohr. Zur Theorie der fast periodischen Funktionen. I. Eine Verallgemeinerung der Theorie der Fourierreihen. Acta Mathematica, 45:29, 1925.
  41. [41] Jiaxuan You, Jonathan Michael Gomes Selman, Rex Ying, and Jure Leskovec. Identity-aware graph neural networks. In AAAI, 2021.
  42. [42] Yiwen Yuan, Zecheng Zhang, Xinwei He, Akihiro Nitta, Weihua Hu, Manan Shah, Blaz Stojanovic, Shenyang Huang, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. Contextgnn: Beyond two-tower recommendation systems. In ICLR, 2025.
  43. [43] William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS, 2017.
  44. [44] Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. CoRR, 2016.
  45. [45] D Hendrycks. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415, 2016.
  46. [46] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  47. [47] Ming Chen, Zhewei Wei, Zengfeng Huang, Bolin Ding, and Yaliang Li. Simple and deep graph convolutional networks. In ICML, 2020.
  48. [48] Ben Chamberlain, James Rowbottom, Maria I. Gorinova, Michael M. Bronstein, Stefan Webb, and Emanuele Rossi. GRAND: graph neural diffusion. In ICML, 2021.
  49. [49] Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. In ICML, 2019.
  50. [50] Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In AAAI, 2018.
  51. [51] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-pe...
  52. [52] Justin Gu, Rishabh Ranjan, Charilaos Kanatsoulis, Haiming Tang, Martin Jurkovic, Valter Hudovernik, Mark Znidar, Pranshu Chaturvedi, Parth Shroff, Fengyu Li, and Jure Leskovec. Relbench v2: A large-scale benchmark and repository for relational data, 2026.
  53. [53] Fang Wu, Vijay Prakash Dwivedi, and Jure Leskovec. Large language models are good relational learners. In ACL, pages 7835–7854, 2025.
  54. [54] Zezhong Ding, Yongan Xiang, Shangyou Wang, Xike Xie, and S Kevin Zhou. Play like a vertex: A stackelberg game approach for streaming graph partitioning. In Proc. ACM Manag. Data, 2024.
  55. [55] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yong-Dong Zhang, and Meng Wang. Lightgcn: Simplifying and powering graph convolution network for recommendation. In SIGIR, 2020.
  56. [56] Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. Beyond homophily in graph neural networks: Current limitations and effective designs. In NeurIPS, 2020.
  57. [57] Ofir Press, Noah Smith, and Mike Lewis. Train short, test long: Attention with linear biases enables input length extrapolation. In ICLR, 2022.

  58. [58] Jiacheng Li, Yujie Wang, and Julian McAuley. Time interval aware self-attention for sequential recommendation. In WSDM, 2020.

Appendix Overview

In the Appendix, we provide additional details organized as follows:

  • Appendix A: Proofs of Theorems
  • Appendix B: Experimental Details
  • Appendix C: Additional Baselines and Datasets
  • Appendix D: Additional Efficiency Experiments
  • Appendix E: Detailed Description of Encoder Modules
  • Appendix F: Additional Analysis and Discussion
  • Appendix G: Reproducibility and Code Availability Statement
  • Appendix H: Limitations and Broader Impacts

A Proofs of Theorems

A.1 Upper Bound of Relative Structural Loss

Proof. To characterize the multi-hop topological structure of nodes in the graph, we employ Katz centrality [27] as our analytical tool. As a classic walk-based topological metric, Katz centrality systematically captures the structural reach of a ...
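
For readers unfamiliar with the metric named in the proof, Katz centrality in its standard form scores a node by counting the walks that end at it, discounting a walk of length k by a factor alpha^k. The sketch below illustrates only that definition. The function name, the truncation at `max_len`, and the adjacency-list input are assumptions, and the excerpt above is cut off before it shows how the paper applies the metric.

```python
def katz_centrality(adj, alpha, max_len):
    """Truncated Katz score for every node.

    score[v] = sum over k = 1..max_len of alpha**k * (number of walks of
    length k that end at v). `adj` maps each node to the list of nodes it
    points to and must contain every node as a key.
    """
    nodes = list(adj)
    walks = {v: 1 for v in nodes}  # one trivial walk of length 0 per node
    score = {v: 0.0 for v in nodes}
    for k in range(1, max_len + 1):
        nxt = {v: 0 for v in nodes}
        for u in nodes:
            for v in adj[u]:
                nxt[v] += walks[u]  # extend each walk ending at u by the edge u -> v
        walks = nxt
        for v in nodes:
            score[v] += alpha ** k * walks[v]
    return score
```

The untruncated series converges only when alpha is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, which is why the attenuation factor is usually chosen small.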