Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models

Ayushman Raghuvanshi; Mahesh Chandran; Sundeep Prabhakar Chepuri; Thummaluru Siddartha Reddy

arxiv: 2606.04672 · v2 · pith:XB7YL3XQnew · submitted 2026-06-03 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-28 06:57 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords continuous-time dynamic graphs · state space models · long-range temporal reasoning · graph Laplacian · HiPPO operator · dynamic link prediction · topology-aware memory · zero-order hold discretization

The pith

A state-space model for continuous-time dynamic graphs captures long-range temporal and spatial patterns by projecting HiPPO memory solutions through the graph Laplacian.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a parameter-efficient state-space framework called CTDG-SSM for learning representations on continuous-time dynamic graphs. It starts from the classical HiPPO memory operator and introduces CTT-HiPPO, which projects that solution through a polynomial of the Laplacian to jointly handle temporal dynamics and graph topology. This yields memory updates that admit an equivalent state-space form, which is then discretized via zero-order hold for practical use. The resulting model reaches state-of-the-art results on dynamic link prediction, node classification, and sequence tasks, with especially large gains on datasets that demand reasoning across long time horizons and distant graph regions.
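For orientation, the generic linear state-space model and its zero-order-hold (ZOH) discretization can be written as below. This is the standard textbook form, not the paper's CTDG-SSM equations, which this page does not reproduce. The symbols $A$, $B$, $x$, $u$, and $\Delta_k = t_{k+1} - t_k$ are notation assumed here for illustration, with the input held constant on each interval $[t_k, t_{k+1})$.

```latex
\begin{aligned}
\dot{x}(t) &= A\,x(t) + B\,u(t), \\
x_{k+1} &= \bar{A}_k\,x_k + \bar{B}_k\,u_k, \\
\bar{A}_k &= e^{A\Delta_k}, \qquad
\bar{B}_k = \Big(\int_0^{\Delta_k} e^{A s}\,ds\Big) B
          = A^{-1}\big(e^{A\Delta_k} - I\big) B \quad \text{(when $A$ is invertible)}.
\end{aligned}
```

Because $\Delta_k$ enters only through the matrix exponential, the step size may differ from event to event, which is what makes this discretization a natural fit for irregularly timed events.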

Core claim

The central claim is that the CTT-HiPPO operator, formed by projecting the classical HiPPO solution through a polynomial of the Laplacian matrix, produces topology-aware memory updates for continuous-time dynamic graphs; these updates admit an equivalent state-space formulation (CTDG-SSM) whose zero-order-hold discretization preserves the ability to model long-range temporal and structural dependencies, leading to superior benchmark performance on tasks requiring such reasoning.

What carries the argument

CTT-HiPPO, the continuous-time Topology-Aware higher order polynomial projection operator obtained by projecting the classical HiPPO solution through a polynomial of the Laplacian matrix to produce topology-aware memory updates.
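To make "a polynomial of the Laplacian" concrete, here is a minimal sketch of applying such a polynomial to node features on a small static graph. It is an illustration under assumptions, not the paper's operator: the combinatorial Laplacian $L = D - A$, the helper names `laplacian`, `matmul`, and `poly_filter`, and the coefficient values are all hypothetical choices made for this example.

```python
def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph (dense lists)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def poly_filter(lap, coeffs, x):
    """Return sum_k coeffs[k] * L^k @ x without forming the powers L^k."""
    out = [[0.0] * len(x[0]) for _ in x]
    term = [row[:] for row in x]          # starts as L^0 x = x
    for c in coeffs:
        out = [[o + c * t for o, t in zip(orow, trow)]
               for orow, trow in zip(out, term)]
        term = matmul(lap, term)          # advance to the next power of L
    return out


# Path graph 0 - 1 - 2 with a unit signal on node 0 (illustrative values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
lap = laplacian(adj)
x = [[1.0], [0.0], [0.0]]
first_order = poly_filter(lap, [1.0, 0.5], x)          # I + 0.5 L
second_order = poly_filter(lap, [1.0, 0.5, 0.25], x)   # I + 0.5 L + 0.25 L^2
```

In this toy case the first-order filter leaves node 2 at zero, while the second-order filter gives it a nonzero value: a degree-$K$ polynomial reaches at most $K$ hops, which is the sense in which higher-order terms carry multi-hop structure.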

If this is right

CTDG-SSM achieves state-of-the-art performance on dynamic link prediction, dynamic node classification, and sequence classification benchmarks.
It delivers large gains specifically on datasets that require long-range temporal and spatial reasoning.
The framework supplies a computationally efficient discrete implementation via zero-order hold while remaining parameter-efficient.
The state-space formulation directly inherits the long-range modeling properties of the continuous-time CTT-HiPPO operator.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same projection idea could be tested on hypergraphs or simplicial complexes by replacing the graph Laplacian with the appropriate higher-order operator.
Because the discretization step is standard zero-order hold, the method inherits any known stability or approximation guarantees of that step for irregular time sampling.
Hybrid models that combine CTDG-SSM memory with message-passing layers might further improve performance on graphs with both local and global structure.
The approach suggests a general route for injecting topological awareness into other continuous-time memory models beyond HiPPO.
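The remark about zero-order hold and irregular sampling can be illustrated with a scalar stand-in. The sketch below assumes a one-dimensional system $\dot{x} = a x + b u$ with $a \neq 0$; the function names `zoh_step` and `run` and all numeric values are hypothetical, and nothing here is taken from the paper's implementation.

```python
import math


def zoh_step(a, b, x, u, dt):
    """Exact update of dx/dt = a*x + b*u over dt with u held constant (a != 0)."""
    a_bar = math.exp(a * dt)            # discrete transition factor
    b_bar = (a_bar - 1.0) / a * b       # integral of exp(a*s) ds times b
    return a_bar * x + b_bar * u


def run(a, b, x0, events):
    """Apply ZOH steps for a list of (dt, u) pairs with arbitrary positive gaps."""
    x = x0
    for dt, u in events:
        x = zoh_step(a, b, x, u, dt)
    return x


# One step of length 1.0 versus two irregular steps (0.3 then 0.7), same input.
one_step = run(-0.5, 1.0, 1.0, [(1.0, 2.0)])
two_steps = run(-0.5, 1.0, 1.0, [(0.3, 2.0), (0.7, 2.0)])
gap = abs(one_step - two_steps)
```

With a piecewise-constant input the two trajectories agree up to floating-point error, and for $a < 0$ the factor $e^{a\,\Delta t}$ stays strictly between 0 and 1 for every positive gap. Whether the same holds for the paper's graph-coupled system depends on details not shown on this page.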

Load-bearing premise

Projecting the classical HiPPO solution through a polynomial of the Laplacian matrix produces topology-aware memory updates that admit an equivalent state-space formulation whose discretization preserves the claimed long-range modeling capability.

What would settle it

Running CTDG-SSM against strong local-neighborhood baselines on a dataset explicitly constructed to require long-range temporal and spatial reasoning and finding no performance advantage would falsify the claim that the Laplacian-polynomial projection enables superior long-range capture.

Figures

Figures reproduced from arXiv: 2606.04672 by Ayushman Raghuvanshi, Mahesh Chandran, Sundeep Prabhakar Chepuri, Thummaluru Siddartha Reddy.

**Figure 1.** Efficiency of CTDG-SSM in terms of predictive performance and number of learnable parameters. … target LRT using sequence models such as Transformer or Mamba. These methods construct temporal sequences of node features and their 1-hop temporal neighbors, patch them, and process them with either Transformer or Mamba layers (Yu et al., 2023; Ding et al., 2024). Although effective for LRT, these models inhere…

**Figure 2.** (a) Architecture of the CTDG-SSM framework. Events are batched from the input event stream, and a batch-level subgraph is constructed via subgraph sampling. Raw messages are combined with static embeddings to form node-level features, which are encoded and processed by the CTDG-SSM module to update dynamic memory. The updated memory is then aggregated with static embeddings to produce the final node repre…

**Figure 3.** We generate the data using the procedure in Gravina et al. (2024). In this experiment, we evaluate the impact of aggregating one-hop and multi-hop structural information, as well as the significance of our structural change term, by introducing three variants of our method. First, CTDG-SSM (FO) employs a learnable first-order polynomial filter of the form $I + \alpha_1 L_\tau$. Second, CTDG-SSM (SO) …

**Figure 4.** Model size vs. AUC-ROC under a transductive setting with random negative sampling. … (Section 7.5, Parameter Complexity) We present the comparison among the models based on the number of learnable parameters. Recall that the CTDG-SSM layer introduces learnable matrices only through $\bar{A}_{L_B}[k]$, $\bar{A}$, and $\bar{B}(L[k], X[k])$ …

**Figure 5.** Convergence behavior of CTDG-SSM across multiple datasets. … where $\Delta L_B[k]$ is a perturbation matrix whose entries are sampled from a normal distribution, i.e., $[\Delta L_B]_{ij} \sim \mathcal{N}(0, 1)$, and $\epsilon$ controls the noise level. We then evaluate the proposed algorithm by replacing $L_B[k]$ with $\bar{L}_B[k]$ under different values of $\epsilon$, thereby varying the severity of the structural perturbation. …

**Figure 6.** Extended Robustness and Ablation Studies. Left: Accuracy of CTDG-SSM on the Enron dataset under noise injection of the form $L_B[k] + \epsilon\,\Delta L / \|\Delta L\|_2$, demonstrating model stability even under significant perturbation. Right: AUC-ROC on the MOOC dataset across transductive and inductive settings, showing that the model is relatively insensitive to the choice of the default time interval $\Delta t$ for unseen links.
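The structural perturbation described in the captions of Figures 5 and 6 can be sketched as follows. This is a rough illustration, not the authors' code: the function name `perturb` is hypothetical, and because the captions do not say which matrix norm $\|\cdot\|_2$ denotes, the sketch swaps in the Frobenius norm as a stated assumption.

```python
import math
import random


def perturb(lap, eps, rng):
    """Return lap + eps * delta / ||delta||, with delta_ij drawn from N(0, 1).

    Assumption: the Frobenius norm is used for the normalisation.
    """
    n = len(lap)
    delta = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    norm = math.sqrt(sum(v * v for row in delta for v in row))
    return [[lap[i][j] + eps * delta[i][j] / norm for j in range(n)]
            for i in range(n)]


# Toy 2-node Laplacian and a fixed seed so the run is repeatable.
lap = [[1.0, -1.0], [-1.0, 1.0]]
noisy = perturb(lap, 0.1, random.Random(0))
shift = math.sqrt(sum((noisy[i][j] - lap[i][j]) ** 2
                      for i in range(2) for j in range(2)))
```

Normalising the noise matrix means `eps` directly sets the size of the perturbation (here `shift` equals `eps` up to rounding), so different noise levels are comparable across batches of different size.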

Original abstract

Continuous-time dynamic graphs (CTDGs) provide a richer framework to capture fine-grained temporal patterns in evolving relational data. Long-range information propagation is a key challenge while learning representations, wherein it is important to retain and update information over long temporal horizons. Existing approaches restrict models to capture one-hop or local temporal neighborhoods and fail to capture multi-hop or global structural patterns. To mitigate this, we derive a parameter-efficient state-space modeling framework for continuous-time dynamic graphs (CTDG-SSM) from first principles. We first introduce continuous-time Topology-Aware higher order polynomial projection operator (CTT-HiPPO), a novel memory-based reformulation of HiPPO to jointly encode temporal dynamics and graph structure. The solution from CTT-HiPPO is obtained by projecting the classical HiPPO solution through a polynomial of the Laplacian matrix, yielding topology-aware memory updates that admit an equivalent state-space formulation for CTDGs (CTDG-SSM). Then a computationally efficient discrete formulation is obtained using the zero-order hold approach for model implementation. Across benchmarks on dynamic link prediction, dynamic node classification, and sequence classification, CTDG-SSM achieves state-of-the-art performance. Notably, it achieves large performance gains on datasets that require long range temporal (LRT) and spatial reasoning.
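For readers unfamiliar with HiPPO, the sketch below builds the matrices of the scaled-Legendre variant (HiPPO-LegS) from the original HiPPO paper. The abstract does not state which HiPPO variant CTT-HiPPO starts from, so treating LegS as the starting point is an assumption, and the function name `hippo_legs` is hypothetical.

```python
import math


def hippo_legs(n):
    """Return (A, B) for the HiPPO-LegS system dc/dt = (1/t) * (A c + B f).

    A is lower triangular with diagonal entries -(i + 1); B_i = sqrt(2i + 1).
    """
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i > j:
                a[i][j] = -math.sqrt((2 * i + 1) * (2 * j + 1))
            elif i == j:
                a[i][j] = -(i + 1.0)
    b = [math.sqrt(2 * i + 1) for i in range(n)]
    return a, b


a, b = hippo_legs(3)
```

Since `a` is lower triangular, its eigenvalues are its diagonal entries, which are all negative. That is the stability property a topology-aware extension would need to preserve.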

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a CTT-HiPPO operator that projects HiPPO through a Laplacian polynomial to get topology-aware memory for CTDGs, then reduces it to an SSM with ZOH discretization, and reports SOTA gains on long-range tasks.

The core contribution is a first-principles derivation of CTT-HiPPO that folds graph structure into the HiPPO memory update via a polynomial of the Laplacian, then shows this admits an equivalent state-space form for continuous-time dynamic graphs. The authors discretize with zero-order hold and claim large improvements on dynamic link prediction, node classification, and sequence tasks that need extended temporal and spatial context.

What stands out is the attempt to keep the long-range memory properties of HiPPO while making the updates respect the graph topology. If the projection step commutes with the relevant operators and the discretization preserves the decay rates, this would be a clean way to extend SSMs beyond local neighborhoods.

The soft spot is exactly the one the stress-test flags: the abstract gives no explicit argument, eigenvalue check, or closed-form transition matrix showing that the polynomial projection leaves the HiPPO kernel intact under ZOH on irregular event times. Without that step visible, the attribution of gains to long-range modeling stays conditional. No ablations or error breakdowns are described either.

This is for researchers already working on dynamic graphs or SSMs who want a concrete alternative to message-passing or RNN-style updates. The construction is coherent on its own terms and engages the right literature, so it clears the bar for serious refereeing even if the central algebraic claim needs tightening.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes CTDG-SSM, a state-space modeling framework for continuous-time dynamic graphs derived from first principles. It introduces the CTT-HiPPO operator, obtained by projecting the classical HiPPO solution through a polynomial of the Laplacian matrix to produce topology-aware memory updates that admit an equivalent state-space formulation; this is then discretized via zero-order hold for implementation. The model is reported to achieve state-of-the-art performance on dynamic link prediction, dynamic node classification, and sequence classification benchmarks, with large gains on datasets requiring long-range temporal and spatial reasoning.

Significance. If the central algebraic construction is valid and the discretization preserves HiPPO-style long-range memory under irregular sampling, the work would supply a principled, parameter-efficient route to long-range spatio-temporal modeling on CTDGs. The first-principles framing and use of SSMs for computational efficiency are positive features; reproducible SOTA results on LRT-focused tasks would strengthen the case for adoption in domains such as social networks or traffic forecasting.

major comments (1)

[Abstract] Abstract (CTT-HiPPO derivation paragraph): the claim that the polynomial projection of the classical HiPPO solution through the Laplacian yields topology-aware updates that admit an equivalent state-space form whose ZOH discretization retains the HiPPO memory kernel (or its stability properties) on irregular event times is not supported by any commutation argument, eigenvalue analysis, or closed-form transition matrix. This step is load-bearing for attributing observed gains to long-range modeling rather than to other modeling choices.

minor comments (1)

[Abstract] Abstract: the SOTA claim is stated without reference to specific baselines, metrics, number of runs, or statistical significance; a one-sentence summary of these details would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed review. We appreciate the recognition of the work's potential contribution to principled long-range spatio-temporal modeling on CTDGs. We address the major comment below.

Referee: [Abstract] Abstract (CTT-HiPPO derivation paragraph): the claim that the polynomial projection of the classical HiPPO solution through the Laplacian yields topology-aware updates that admit an equivalent state-space form whose ZOH discretization retains the HiPPO memory kernel (or its stability properties) on irregular event times is not supported by any commutation argument, eigenvalue analysis, or closed-form transition matrix. This step is load-bearing for attributing observed gains to long-range modeling rather than to other modeling choices.

Authors: We agree that a rigorous justification of this central algebraic step is necessary to substantiate the attribution of gains to long-range modeling. In the revised manuscript we will augment the derivation (both in the main text and appendix) with: (i) an explicit commutation argument establishing that the Laplacian-polynomial projection commutes with the HiPPO operator under the chosen polynomial basis, (ii) an eigenvalue analysis confirming that the spectrum of the resulting operator preserves the stability and memory-decay properties of the original HiPPO matrix, and (iii) the closed-form transition matrix for the zero-order-hold discretization on irregular event times, showing that the long-range kernel is retained. These additions will directly address the load-bearing concern.

Revision: yes
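One way the promised commutation and eigenvalue arguments could go is sketched below. It holds only under simplifying assumptions that are not taken from the paper: the Laplacian $L$ is symmetric and fixed over the interval considered, and the memory is stored as a matrix $C(t) \in \mathbb{R}^{n \times N}$ (nodes by HiPPO coefficients). The symbols $C$, $A$, $p$, $\lambda_i$, and $\mu_j$ are notation introduced here.

```latex
\begin{aligned}
&\text{HiPPO acts on the coefficient axis, the projection on the node axis:}\\
&\qquad \mathcal{H}(C) = C A^{\top}, \qquad \mathcal{P}(C) = p(L)\,C, \\
&\qquad \mathcal{P}\big(\mathcal{H}(C)\big) = p(L)\,C\,A^{\top} = \mathcal{H}\big(\mathcal{P}(C)\big). \\[4pt]
&\text{In vectorized form, using } \operatorname{vec}(XYZ) = (Z^{\top} \otimes X)\operatorname{vec}(Y):\\
&\qquad \operatorname{vec}\big(p(L)\,C\,A^{\top}\big) = \big(A \otimes p(L)\big)\operatorname{vec}(C), \\
&\qquad (A \otimes I_n)(I_N \otimes p(L)) = A \otimes p(L) = (I_N \otimes p(L))(A \otimes I_n). \\[4pt]
&\text{Spectrum, with } \lambda_i \in \sigma(A) \text{ and } \mu_j \in \sigma(L):\\
&\qquad \sigma\big(A \otimes p(L)\big) = \{\, \lambda_i\, p(\mu_j) \,\}, \qquad
\operatorname{Re}\lambda_i < 0 \ \text{and}\ p(\mu_j) > 0 \;\Rightarrow\; \operatorname{Re}\big(\lambda_i\, p(\mu_j)\big) < 0 .
\end{aligned}
```

Under these assumptions the projected operator stays stable whenever the polynomial is positive on the Laplacian spectrum. The argument does not cover a Laplacian that changes between events, which is the case the referee's comment is most concerned with.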

Circularity Check

0 steps flagged

No circularity; derivation framed as independent first-principles projection of HiPPO.

The provided abstract and description present CTDG-SSM as derived from first principles via CTT-HiPPO (projection of classical HiPPO solution through polynomial of Laplacian matrix to yield topology-aware updates that admit SSM form, followed by ZOH discretization). No quoted equations, self-definitions, or descriptions show any result reducing to a fitted parameter on target data, a self-referential definition, or a load-bearing self-citation chain. Performance claims are separate empirical benchmarks on downstream tasks. The derivation chain is self-contained against external benchmarks (classical HiPPO) with no exhibited reduction by construction. This matches the most common honest non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the existence of an equivalent state-space form after Laplacian-polynomial projection and on the validity of zero-order hold discretization for the resulting continuous-time system. No explicit free parameters or invented physical entities are named in the abstract.

axioms (2)

domain assumption: The classical HiPPO solution can be projected through a polynomial of the graph Laplacian while preserving an equivalent state-space representation. (Invoked when defining CTT-HiPPO in the abstract.)
domain assumption: Zero-order hold discretization yields a computationally efficient and accurate discrete formulation for the continuous-time model. (Stated as the final implementation step in the abstract.)

invented entities (1)

CTT-HiPPO operator (no independent evidence)
Purpose: jointly encode temporal dynamics and graph structure via Laplacian-polynomial projection of HiPPO memory.
New operator introduced in the abstract; no independent evidence provided.

pith-pipeline@v0.9.1-grok · 5784 in / 1483 out tokens · 18640 ms · 2026-06-28T06:57:26.930969+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 8 canonical work pages · 4 internal anchors

[1] Ding, Z., Li, Y., He, Y., Norelli, A., Wu, J., Tresp, V., Bronstein, M., and Ma, Y. Dygmamba: Efficiently modeling long-term temporal dependency on continuous-time dynamic graphs with state space models. arXiv preprint arXiv:2408.04713.

[2] … URL http://arxiv.org/abs/2312.00752. arXiv:2312.00752 [cs]. Gu, A., Dao, T., Ermon, S., Rudra, A., and Ré, C. HiPPO: Recurrent memory with optimal polynomial projections. Advances in Neural Information Processing Systems, 33:1474–1487.

[3] … URL http://arxiv.org/abs/2111.00396. arXiv:2111.00396 [cs]. Huang, S., Poursafaei, F., Danovitch, J., Fey, M., Hu, W., Rossi, E., Leskovec, J., Bronstein, M., Rabusseau, G., and Rabbany, R. Temporal graph benchmark for machine learning on temporal graphs. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neu...

[4] Li, D., Tan, S., Zhang, Y., Jin, M., Pan, S., Okumura, M., and Jiang, R. Dyg-mamba: Continuous state space modeling on dynamic graphs. arXiv preprint arXiv:2408.06966, 2024a. Li, J., Wu, R., Jin, X., Ma, B., Chen, L., and Zheng, Z. State space models on temporal graphs: A first-principles study. Advances in Neural Information Processing Systems, 37:1270...

[5] … Simplified State Space Layers for Sequence Modeling. URL http://arxiv.org/abs/2208.04933. arXiv:2208.04933 [cs]. Souza, A., Mesquita, D., Kaski, S., and Garg, V. Provably expressive temporal graph networks. Advances in Neural Information Processing Systems, 35:32257–32269.

[6] Trivedi, R., Farajtabar, M., Biswal, P., and Zha, H. Representation learning over dynamic graphs. arXiv preprint arXiv:1803.04051.

[7] Wang, L., Chang, X., Li, S., Chu, Y., Li, H., Zhang, W., He, X., Song, L., Zhou, J., and Yang, H. TCL: Transformer-based dynamic graph modelling via contrastive learning. arXiv preprint arXiv:2105.07944, 2021a. Wang, Y., Chang, Y.-Y., Liu, Y., Leskovec, J., and Li, P. Inductive representation learning in temporal networks via causal anonymous walk...

[8] Supplementary Material: Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models. A. State-Space Models. State-space models (SSMs) are widely used for sequence modeling due to their ability to capture long-range dependencies through latent state evolution while remaining computationally efficient com...

[9] In all the datasets, LastFM, Enron and MOOC are mainly considered for evaluating the LRT task. In particular, the LastFM dataset corresponds to data from a music streaming platform that records user listening behaviors, where users and songs are nodes and links denote listening events (Celma, 2010). The Enron dataset is an email communication dataset amon...

[10] … were also utilized. Table 7. AUC-ROC for transductive dynamic link prediction under. RNS: Random Negative Sampling, HNS: Historical Negative Sampling, INS: Inductive Negative Sampling. Columns: NSS, Datasets, JODIE, DyRep, TGAT, TGN, CAWN, TCL, GraphMixer, DyGFormer, CTAN, DyGmamba, CTDG-SSM. First row: RNS, LastFM, 70.89±1.97, 71.40±2.12, 71.47±0.14, 76.64±4.66, 85.92±0.16, 71.09±1.48, 73.51±0.14...

[11] … (values close to 1 are better). The proposed model not only outperforms existing approaches but also exhibits only a minor performance drop compared to the transductive setting, highlighting its ability to effectively capture global structural and temporal patterns instead of learning local structural patterns. Hyperparameter Details: In Table 9, we repor...

[12] This sensitivity analysis demonstrates that the model’s predictive power is highly stable with respect to this hyperparameter, confirming that our … Table 14. Ablation and sensitivity analysis for the proposed CTDG-SSM model. (a) Ablation study of CTDG-SSM components across three benchmark datasets ...
