pith. machine review for the scientific record.

arxiv: 2605.12358 · v1 · submitted 2026-05-12 · 💻 cs.LG

Recognition: no theorem link

From Message-Passing to Linearized Graph Sequence Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 05:58 UTC · model grok-4.3

classification 💻 cs.LG
keywords: graph neural networks · message passing · sequence modeling · long-range dependencies · linearized models · inductive bias · architectural separation · graph learning

The pith

Linearized Graph Sequence Models recast message-passing by separating computational processing depth from information propagation depth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Linearized Graph Sequence Models as a way to view graph message-passing computations through sequence modeling. It systematically decouples the depth used for computation from the depth of information spread across graph nodes. This move turns core decisions about graph architecture into choices about how to model sequences. The authors then identify and validate which sequence properties best preserve the structural inductive bias of graphs while enabling better handling of distant node interactions. The result is a route for importing sequence modeling advances directly into graph learning without redesigning the message-passing backbone.
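
To make the separation concrete, below is a minimal sketch of the idea, assuming a row-normalized adjacency for the linear propagation and a generic residual MLP as the processing block; the function names, shapes, and propagation rule are editorial assumptions, not the paper's implementation.

```python
# Minimal sketch of the propagation/processing split described above.
# Assumptions (editorial, not from the paper): NumPy only, row-normalized
# adjacency for the linear propagation, and a generic residual MLP as the
# non-linear processing block.
import numpy as np

def node_sequences(adj, feats, seq_len):
    """Linear propagation only: position k of node v's sequence holds
    (A_hat^k X)[v], so information depth equals seq_len and involves
    no non-linearity."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    a_hat = adj / deg                       # row-normalized adjacency (assumed)
    seq = [feats]
    for _ in range(seq_len - 1):
        seq.append(a_hat @ seq[-1])         # one more hop of purely linear spread
    return np.stack(seq, axis=1)            # (num_nodes, seq_len, dim)

def processing_block(seq, w1, w2):
    """Non-linear processing applied position-wise along each node's sequence;
    stacking more of these raises processing depth without adding hops."""
    return seq + np.tanh(seq @ w1) @ w2

rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.4).astype(float)
np.fill_diagonal(A, 0)
X = rng.normal(size=(6, 4))

S = node_sequences(A, X, seq_len=8)         # information depth: 8 hops
W1, W2 = rng.normal(size=(4, 16)), rng.normal(size=(16, 4))
for _ in range(3):                          # processing depth: 3 blocks
    S = processing_block(S, W1, W2)
print(S.shape)                              # (6, 8, 4)
```

In this sketch the propagation depth is fixed by `seq_len` alone, and the processing depth by how many times `processing_block` is stacked, which is the decoupling the paragraph above describes.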

Core claim

By recasting message-passing graph computation from the perspective of sequence modeling, the framework separates computational processing depth from information propagation depth. This separation allows core graph architectural decisions to be treated as sequence modeling choices. Analysis of sequence properties, both empirical and theoretical, identifies those that effectively learn and preserve the graph inductive bias, with validation showing improved performance on long-range information tasks in graphs.

What carries the argument

The separation of computational processing depth from information propagation depth, which recasts central graph architectural questions as input modeling choices within sequence models.

If this is right

  • Improved accuracy on graph tasks that require information to travel across many hops.
  • A direct method to import modern sequence-model advances such as efficient attention or state-space layers into graph architectures (a minimal code sketch follows this list).
  • Graph design choices such as layer ordering and neighborhood aggregation become explicit decisions about sequence input formatting.
  • Preservation of the original graph inductive bias while still allowing deeper computational processing without proportional growth in propagation steps.
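
As a minimal sketch of the second bullet, the snippet below drops a stand-in sequence layer, a diagonal linear recurrence in the spirit of state-space models, along the hop dimension of per-node sequences; the function name, shapes, and decay parameterization are illustrative assumptions, not the paper's architecture.

```python
# Hedged illustration: once each node carries a sequence of propagated
# features, an off-the-shelf sequence layer can be dropped in along the hop
# dimension. The diagonal linear recurrence below is a stand-in for an
# SSM-style layer, not the paper's specific choice.
import numpy as np

def ssm_style_scan(seq, decay, inp_w, out_w):
    """Per-node linear recurrence along the sequence (hop) dimension:
    h_k = decay * h_{k-1} + seq_k @ inp_w,   y_k = h_k @ out_w."""
    num_nodes, seq_len, _ = seq.shape
    h = np.zeros((num_nodes, inp_w.shape[1]))
    outputs = []
    for k in range(seq_len):
        h = decay * h + seq[:, k, :] @ inp_w    # state update, one per node
        outputs.append(h @ out_w)               # read-out at hop k
    return np.stack(outputs, axis=1)            # (num_nodes, seq_len, out_dim)

rng = np.random.default_rng(1)
S = rng.normal(size=(6, 8, 4))                  # stand-in for per-node hop sequences
Wi, Wo = rng.normal(size=(4, 16)), rng.normal(size=(16, 4))
Y = ssm_style_scan(S, decay=0.9, inp_w=Wi, out_w=Wo)
print(Y.shape)                                  # (6, 8, 4)
```

Swapping this scan for any other sequence layer changes only the sequence model, not the propagation that built the input sequences.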

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The depth separation could be applied to other structured domains such as point clouds or meshes by treating their connectivity as a sequence.
  • Testing whether specific sequence properties like linearity or recurrence directly translate to measurable gains in graph expressivity would confirm the mapping.
  • The framework suggests that future work could derive parameter-free relations between sequence length and required propagation depth for given graph diameters; one natural candidate is sketched below.
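
For the last point above, one natural candidate relation (an editorial illustration in the framework's own terms, not a result stated in the paper) is that purely linear propagation needs at least as many hops, and hence at least as long a per-node sequence, as the largest graph distance it must cover:

\[
L \;\ge\; \max_{u,v \in V} d_G(u,v) \;=\; \operatorname{diam}(G),
\]

while the processing depth (the number of stacked blocks) can be chosen independently of \(\operatorname{diam}(G)\).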

Load-bearing premise

That sequence modeling properties can be identified and applied to effectively learn and preserve the graph inductive bias without introducing new limitations on information flow or expressivity.

What would settle it

A sequence model built on the identified properties shows no gain in long-range task accuracy or fails to retain graph structural bias compared with standard message-passing networks on the same benchmarks.

Figures

Figures reproduced from arXiv: 2605.12358 by Basil Rohner, Joël Mathys, Roger Wattenhofer, Saku Peltonen.

Figure 1
Figure 1: LGSM decouples information propagation (horizontal) from non-linear processing (vertical). Unlike MPNNs that advance both dimensions simultaneously, LGSM enables information to flow across the graph through linearized computation before applying non-linear transformations. This shifts graph architectural focus to the impact of the input sequences.
Figure 2
Figure 2: Visualization of computation in graph neural networks with respect to information depth and processing depth. Standard MPNNs couple both dimensions, propagating and updating information through non-linear transformations at the same time, advancing diagonally. Linear MPNNs propagate information without any non-linearities, but have limited transformation capabilities. Our method stacks the linearized rows…
Figure 3
Figure 3: Overview of the LGSM architecture. The Graph Sequence Encoder converts graph input into a sequence. Then, each block applies an SSM for each node along the sequence dimension with additional non-linear transformations and graph mixing. The number of stacked blocks controls the processing depth while the length of the sequence controls information depth.
Figure 4
Figure 4: Comparison of sequence extraction mechanisms on the ECHO eccentricity task with graphs of diameter up to 40. We evaluate non-backtracking (NBT) and adjacency powers, both original and normalized. Each data point represents the mean of three trained models of a specific sequence length using four LGSM blocks; therefore the number of trainable parameters is the same for all sequence lengths.
Figure 5
Figure 5: Reporting validation logMSE (lower is better) on the LRIM-16-hard dataset. We ablate the impact of scaling either information depth or processing depth with LGSM. We validate that increasing both the sequence length and the model depth has a positive effect. This underscores the ability of the LGSM architecture to incorporate both axes in a principled and effective manner.
Figure 6
Figure 6: Ablation of the impact of scaling either information depth or processing depth with LGSM, evaluated on the LRIM Graph Benchmark. We see that increasing either dimension has a positive effect, while solely increasing processing depth without incorporating additional information yields diminishing returns.
Figure 7
Figure 7: Measurements for a varying number of layers on ER graphs of size 256. LGSM memory grows roughly linearly with L; memory use is clearly driven by sequence length rather than the number of blocks. The timing of LGSM stays almost constant over a range of small to medium graphs. For large graphs, the forward pass time of GPS blows up the most due to global attention, while LGSM stays between GCN and GPS.
Figure 8
Figure 8: Measurements for a fixed number of layers (16) across multiple graph sizes. The timing of LGSM stays almost constant over a range of small to medium graphs. For large graphs, the forward pass time of GPS blows up the most due to global attention, while LGSM stays between GCN and GPS. LGSM memory use increases steadily with graph size, scaling linearly and better than GPS for large graphs.
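
The Figure 3 caption describes blocks that combine a per-node sequence model, non-linear transformations, and graph mixing. Below is a hedged sketch of one such block, assuming a simple per-node linear recurrence along the hop dimension and a single hop of row-normalized adjacency as the mixing step; it is an editorial reading of the caption, not the paper's exact design.

```python
# Hedged sketch of one block in the spirit of the Figure 3 caption: a per-node
# recurrence along the sequence (hop) dimension, a point-wise non-linearity,
# and a "graph mixing" step across nodes. The recurrence, the single-hop
# mixing rule, and all shapes are editorial assumptions.
import numpy as np

def lgsm_like_block(seq, adj_norm, decay, w):
    """seq: (nodes, seq_len, dim); adj_norm: (nodes, nodes), row-normalized."""
    nodes, seq_len, dim = seq.shape
    h = np.zeros((nodes, dim))
    states = []
    for k in range(seq_len):                    # sequence model along the hop axis
        h = decay * h + seq[:, k, :]
        states.append(h)
    out = np.stack(states, axis=1)              # (nodes, seq_len, dim)
    out = np.tanh(out @ w)                      # non-linear transformation
    return np.einsum("uv,vkd->ukd", adj_norm, out)  # graph mixing across nodes

rng = np.random.default_rng(2)
A = (rng.random((6, 6)) < 0.4).astype(float)
np.fill_diagonal(A, 0)
A_norm = A / A.sum(axis=1, keepdims=True).clip(min=1.0)
S = rng.normal(size=(6, 8, 4))
W = rng.normal(size=(4, 4))
print(lgsm_like_block(S, A_norm, decay=0.9, w=W).shape)   # (6, 8, 4)
```
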
read the original abstract

Message-passing based approaches form the default backbone of most learning architectures on graph-structured data. However, the rapid progress of modern deep learning architectures in other domains, particularly sequence modeling, raises the question of how graph learning can benefit from these advances. We introduce Linearized Graph Sequence Models, a framework that recasts message-passing graph computation from the perspective of sequence modeling to simplify architectural choices. Our approach systematically separates the computational processing depth from the information propagation depth, allowing core graph architectural decisions to be treated as sequence modeling choices. Specifically, we analyze, both empirically and theoretically, what sequence properties make methods effective for learning and preserving the graph inductive bias. In particular, we validate our findings, demonstrating improved performance on long-range information tasks in graphs. Our findings provide a principled way to integrate modern sequence modeling advances into message-passing based graph learning. Beyond this, our work demonstrates how the separation of processing and information depth can recast central architectural questions as input modeling choices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Linearized Graph Sequence Models, a framework that recasts message-passing graph computation from the perspective of sequence modeling. It systematically separates computational processing depth from information propagation depth, allowing core graph architectural decisions to be treated as sequence modeling choices. The authors analyze (empirically and theoretically) which sequence properties enable effective learning and preservation of graph inductive bias, and report improved performance on long-range information tasks in graphs.

Significance. If the separation of depths is shown to preserve graph structure without introducing ordering-dependent limitations on expressivity, the work could provide a principled bridge between message-passing GNNs and modern sequence models, enabling more flexible integration of sequence-modeling advances into graph learning and better handling of long-range dependencies.

major comments (1)
  1. [§3] §3 (linearization and depth separation): the central claim that graph architectural decisions reduce to sequence-modeling choices requires an explicit argument or theorem showing that the chosen linearization (traversal or ordering) embeds the full adjacency structure invariantly; without this, distant nodes in the original graph can become arbitrarily distant in the sequence, forcing reliance on the sequence model's long-range mechanism rather than the original topology and undermining the claimed separation.
minor comments (2)
  1. The abstract and experimental sections should include concrete metrics, baselines, and error bars for the long-range task results to allow assessment of the claimed improvements.
  2. Notation for propagation depth versus computational depth should be introduced with a clear equation or diagram early in the method section (one candidate formulation is sketched below).
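
One candidate formulation for the notation requested in minor comment 2, offered as an editorial sketch (the paper may use different symbols): with \(\hat{A}\) a normalized adjacency operator and \(X\) the node features, linear propagation builds for each node \(v\) a sequence of information depth \(L\),

\[
s_v^{(k)} \;=\; \bigl(\hat{A}^{\,k} X\bigr)_v, \qquad k = 0, \dots, L-1,
\]

while non-linear processing stacks \(D\) blocks over that sequence,

\[
H_v^{(0)} = \bigl(s_v^{(0)}, \dots, s_v^{(L-1)}\bigr), \qquad
H_v^{(d)} = f_{\theta_d}\!\bigl(H_v^{(d-1)}\bigr), \qquad d = 1, \dots, D,
\]

so information depth is set by the sequence length \(L\) and processing depth by the number of blocks \(D\), independently of one another.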

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for identifying a point that merits clarification in §3. We address the major comment below and will revise the manuscript accordingly to strengthen the presentation.

read point-by-point responses
  1. Referee: [§3] §3 (linearization and depth separation): the central claim that graph architectural decisions reduce to sequence-modeling choices requires an explicit argument or theorem showing that the chosen linearization (traversal or ordering) embeds the full adjacency structure invariantly; without this, distant nodes in the original graph can become arbitrarily distant in the sequence, forcing reliance on the sequence model's long-range mechanism rather than the original topology and undermining the claimed separation.

    Authors: We agree that an explicit formal argument would strengthen the central claim. In the Linearized Graph Sequence Model framework, the linearization step produces a sequence whose successive positions correspond to message-passing steps along the graph edges; the propagation depth is exactly the length of this sequence, while the processing depth is the number of layers in the sequence model. Consequently, the adjacency structure is encoded directly in which positions are allowed to exchange information at each propagation step, independent of the positional distances that arise from any particular traversal order. The sequence model then only needs to realize the chosen computational depth on this already-structured input. To make the invariance explicit, we will add a short theorem in the revised §3 stating that any complete traversal linearization preserves the original adjacency relation under the message-passing equivalence: for any pair of nodes connected by a path of length k, there exists a corresponding segment of length k in the sequence whose information flow is governed solely by the graph topology, not by the ordering chosen for the sequence. This shows that reliance on the sequence model's long-range capabilities is not required; the topology is already injected via the linearization. We will also include a brief discussion of how different traversal heuristics affect only the computational efficiency of the linearization, not the preserved inductive bias. revision: yes
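
One possible formalization of the theorem promised above, offered as an editorial sketch rather than the authors' statement, and assuming an undirected graph with an entry-wise non-negative normalized propagation operator \(\hat{A}\) and per-node sequences \(s_v^{(k)} = (\hat{A}^{\,k} X)_v\):

\[
d_G(u,v) = k \;\Longrightarrow\; \frac{\partial s_v^{(k)}}{\partial x_u} \;=\; \bigl(\hat{A}^{\,k}\bigr)_{vu}\, I \;\neq\; 0,
\]

i.e. after \(k\) positions of the linearized sequence, node \(v\)'s representation already depends on the input feature \(x_u\) of any node \(u\) at graph distance \(k\), regardless of the traversal order used to build the sequence.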

Circularity Check

0 steps flagged

Framework recasting introduces no circular derivations

full rationale

The paper introduces a conceptual framework by recasting message-passing as sequence modeling and separating computational processing depth from information propagation depth, then analyzes sequence properties empirically and theoretically for preserving graph inductive bias. No equations, derivations, or load-bearing steps are present in the abstract that reduce by construction to fitted inputs, self-definitions, or self-citation chains. The central claims rest on the separation as an organizing perspective plus validation, remaining self-contained without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities beyond the high-level framework name itself.

pith-pipeline@v0.9.0 · 5473 in / 1043 out tokens · 52155 ms · 2026-05-13T05:58:28.686007+00:00 · methodology

