Pith · machine review for the scientific record

arXiv: 2605.01136 · v1 · submitted 2026-05-01 · 💻 cs.LG · cs.SI · math.SP · stat.ML

Recognition: unknown

Spectral Graph Sparsification Preserves Representation Geometry in Graph Neural Networks

Sanjukta Krishnagopal

Pith reviewed 2026-05-09 19:08 UTC · model grok-4.3

classification 💻 cs.LG · cs.SI · math.SP · stat.ML
keywords spectral graph sparsification · graph neural networks · representation geometry · polynomial filters · Gram matrix preservation · embedding stability · training trajectory stability

The pith

Any ε-spectral sparsifier perturbs polynomial-filter GNN hidden representations and Gram matrices by only O(ε).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that spectral graph sparsification, which thins edges while nearly preserving Laplacian quadratic forms, also keeps the internal geometry of representations learned by polynomial-filter GNNs nearly intact. A reader would care because GNN embeddings are frequently used for downstream interpretability tasks such as clustering, visualization, or measuring class separation, and if distances, class means, and covariances stay stable then one can safely drop most edges for speed without breaking those uses. The argument chains approximation bounds from the sparsified Laplacian through polynomial filters to multilayer activations and their inner-product matrices. It further shows that, under smoothness and boundedness conditions on the loss, gradient descent produces weight paths on the sparse graph that remain close to the dense-graph paths throughout finite training. Experiments on synthetic and real graphs confirm that Gram-matrix and trajectory divergence stay small even at high sparsification ratios.

Core claim

For polynomial-filter GNNs, any ε-spectral sparsifier induces O(ε) perturbations in polynomial graph filters, multilayer hidden representations, and their Gram matrices. These guarantees imply stability of squared pairwise distances, class means, and covariance structure in embedding space. We further establish finite-time training stability: under smoothness and boundedness assumptions, gradient descent on dense and sparsified graphs produces weight trajectories whose separation grows at most proportionally to the sparsification distortion. Empirically, effective-resistance sparsification validates the predicted perturbation chain on synthetic graphs and preserves hidden representation geometry on real datasets.

What carries the argument

The ε-spectral sparsifier, which approximates all Laplacian quadratic forms within a (1+ε) multiplicative factor, carries the argument by supplying uniform bounds that propagate through the polynomial filter to the network activations and their Gram matrices.
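The (1+ε) quadratic-form guarantee is concrete enough to probe numerically. A minimal NumPy sketch of that check (illustrative only: sampling random test vectors lower-bounds the true distortion, whereas an actual ε-spectral sparsifier must satisfy the bound for every vector x):

```python
import numpy as np

def laplacian(A):
    """Combinatorial Laplacian L = D - A of a weighted adjacency matrix."""
    return np.diag(A.sum(axis=1)) - A

def quadratic_form_distortion(L, L_sparse, trials=1000, seed=0):
    """Empirical max of |x^T L~ x / x^T L x - 1| over random test vectors.

    An epsilon-spectral sparsifier guarantees this quantity is <= epsilon
    for ALL x; random sampling only lower-bounds the true distortion.
    """
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    worst = 0.0
    for _ in range(trials):
        x = rng.standard_normal(n)
        x -= x.mean()                 # stay off the all-ones Laplacian kernel
        q = x @ L @ x
        if q > 1e-12:
            worst = max(worst, abs(x @ L_sparse @ x / q - 1.0))
    return worst
```

For a sanity check, scaling the Laplacian of the complete graph K4 by 1.05 should register a distortion of about 0.05, and comparing a Laplacian with itself should register zero.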

If this is right

  • Squared pairwise distances in the learned embedding space change by at most O(ε).
  • Class means and covariance matrices computed from the embeddings remain stable up to O(ε).
  • Gradient-descent weight trajectories on the sparsified graph stay within a distance proportional to ε of the dense-graph trajectories for any finite number of steps.
  • Gram-matrix preservation on real data correlates with preservation of neighborhood structure and class-centroid locations.
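The Gram-matrix side of these consequences is directly measurable. Below is a minimal sketch using the one-layer polynomial-filter setup from the paper's Figure 1 (the scaled Laplacian and the filter p(L̄) = I − 0.6L̄ + 0.15L̄² are taken from the paper; the ReLU layer, random features, and weights are hypothetical stand-ins, not the paper's exact experimental configuration):

```python
import numpy as np

def scaled_laplacian(A):
    """Scaled combinatorial Laplacian L / ||L||_2, as in the paper's Figure 1."""
    L = np.diag(A.sum(axis=1)) - A
    return L / np.linalg.norm(L, 2)

def poly_filter(L_bar):
    """Polynomial filter from Figure 1: p(L) = I - 0.6 L + 0.15 L^2."""
    return np.eye(L_bar.shape[0]) - 0.6 * L_bar + 0.15 * (L_bar @ L_bar)

def hidden_gram(A, X, W):
    """One polynomial-filter GNN layer with ReLU, then its Gram matrix H H^T."""
    H = np.maximum(poly_filter(scaled_laplacian(A)) @ X @ W, 0.0)
    return H @ H.T

def gram_distortion(A_dense, A_sparse, X, W):
    """Relative Frobenius gap between dense and sparsified hidden Grams."""
    G = hidden_gram(A_dense, X, W)
    G_s = hidden_gram(A_sparse, X, W)
    return np.linalg.norm(G_s - G) / np.linalg.norm(G)
```

Comparing a graph against itself must give zero distortion, and the theory predicts the distortion should shrink toward zero as the sparsifier's ε does.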

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same perturbation chaining could be tested on attention-based or message-passing GNNs to see whether the O(ε) guarantee extends beyond polynomial filters.
  • If the smoothness assumptions hold in practice, sparsification becomes a practical way to accelerate training while keeping embeddings usable for transfer or explanation tasks.
  • One could measure how the observed Gram-matrix divergence scales with ε on larger graphs to check whether the theoretical constants are tight enough for real deployment.
  • The link between Gram preservation and class-centroid stability suggests sparsified graphs might serve as simplified proxies for interpreting what a trained GNN has learned about its input data.
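One cheap way to run such an ε-sweep is with a stand-in sparsifier. The sketch below uses uniform edge sampling with inverse-probability reweighting, which keeps the Laplacian unbiased but carries weaker guarantees than the effective-resistance sampling the paper actually uses; it is a scaffold for experimentation, not the paper's method:

```python
import numpy as np

def sample_sparsifier(A, keep_frac, seed=0):
    """Uniform edge sampling with inverse-probability reweighting.

    A crude stand-in for effective-resistance sparsification (which samples
    edges proportionally to their effective resistances): each edge survives
    independently with probability keep_frac and, if kept, has its weight
    divided by keep_frac so the sparse Laplacian is unbiased in expectation.
    """
    rng = np.random.default_rng(seed)
    A_s = np.zeros_like(A)
    iu, ju = np.triu_indices(A.shape[0], k=1)
    for i, j in zip(iu, ju):
        if A[i, j] > 0 and rng.random() < keep_frac:
            A_s[i, j] = A_s[j, i] = A[i, j] / keep_frac
    return A_s
```

Sweeping `keep_frac` downward and recording the resulting Gram-matrix divergence would give the scaling-with-ε curve suggested above; with `keep_frac=1.0` the sparsifier is the identity.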

Load-bearing premise

The finite-time training-stability bound requires the loss and network to be smooth and bounded; without those conditions the separation between dense and sparsified trajectories is uncontrolled.

What would settle it

An explicit counter-example in which an ε-spectral sparsifier produces a hidden-representation Gram-matrix deviation that grows faster than O(ε) for a simple polynomial GNN on a small graph would disprove the perturbation claim.

Figures

Figures reproduced from arXiv: 2605.01136 by Sanjukta Krishnagopal.

Figure 1: Controlled validation of representation stability. We measure relative polynomial-filter error, relative hidden-representation error, and relative hidden-Gram error for effective-resistance-sparsified weighted SBM and geometric weighted k-NN graphs. The graph operator is the scaled combinatorial Laplacian L̄ = L/‖L‖₂, and the polynomial filter is p(L̄) = I − 0.6L̄ + 0.15L̄². Across both graph families, al…

Figure 2: Training-trajectory perturbation under graph sparsification. Relative parameter-gap trajectories between models trained on the dense graph and on effective-resistance sparsifiers for one-layer and two-layer deterministic polynomial-filter GNNs on SBM and geometric graph families. Both experiments use the same scaled polynomial-Laplacian graph shift as …

Figure 3: Hidden Gram preservation predicts neighborhood preservation on real data. For each sparsification budget and dataset, we repeat the sparsifier with three random seeds and compute both hidden Gram distortion and mean 20-NN overlap on held-out test nodes. Horizontal error bars show one standard deviation of hidden Gram distortion, and vertical error bars show one standard deviation of 20-NN overlap. Across F…

Figure 4: Class-centroid geometry under sparsification. Within each dataset, both dense and sparse centroids are projected into the same PCA basis. Small centroid displacements indicate that sparsification preserves class geometry, consistent with the class-statistics stability in Corollary 2. Across all three datasets, sparse centroids remain close to their dense counterparts, indicating that sparsification …

Figure 5: Procrustes-aligned dense and sparse embeddings. Dense and sparse hidden embeddings on held-out nodes after projection to two principal components and optimal orthogonal alignment of the sparse embedding to the dense one. The substantial overlap after alignment indicates that much of the dense–sparse discrepancy is explained by an approximately rigid transformation rather than severe deformation of the lear…

Figure 6: Teacher–student consequences of graph distortion. On synthetic SBM and geometric graphs, larger empirical graph distortion produces larger teacher KL divergence, and larger KL is associated with lower KD student accuracy. This shows that graph distortion affects not only hidden geometry but also the transferred soft decision structure. Error bars denote one standard deviation across four repeated runs.
Original abstract

Spectral graph sparsification is a classical tool for reducing graph complexity while preserving Laplacian quadratic forms. In graph neural networks (GNNs), sparsification is often used to accelerate computation while maintaining predictive performance. In this work, we study a complementary representation-level question: does sparsification preserve the geometry of learned embeddings? For polynomial-filter GNNs, we prove that any $\epsilon$-spectral sparsifier induces $O(\epsilon)$ perturbations in polynomial graph filters, multilayer hidden representations, and their Gram matrices. These guarantees imply stability of squared pairwise distances, class means, and covariance structure in embedding space. We further establish finite-time training stability: under smoothness and boundedness assumptions, gradient descent on dense and sparsified graphs produces weight trajectories whose separation grows at most proportionally to the sparsification distortion. Empirically, effective-resistance sparsification validates the predicted perturbation chain on synthetic graphs and preserves hidden representation geometry on real datasets. In our experiments, the Gram matrix and training dynamics show low divergence even under substantial sparsification, consistent with the predicted stability under spectral sparsification. Hidden Gram preservation strongly predicts neighborhood preservation and class-centroid stability across FashionMNIST, Cora, and Paul15. Together, these results show that spectral sparsification preserves not only graph operators, but also the representation geometry that supports downstream use of GNN embeddings for interpretability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that for polynomial-filter GNNs, any ε-spectral sparsifier induces O(ε) perturbations in the polynomial graph filters, multilayer hidden representations, and their Gram matrices. These bounds imply stability of squared pairwise distances, class means, and covariance structure in the embedding space. The work further establishes finite-time training stability: under smoothness and boundedness assumptions on the loss and network, gradient descent trajectories on dense and sparsified graphs remain close with separation growing proportionally to the sparsification distortion. Empirical results on synthetic graphs and real datasets (FashionMNIST, Cora, Paul15) show low divergence in Gram matrices and training dynamics, with hidden Gram preservation predicting neighborhood and class-centroid stability.

Significance. If the perturbation analysis holds, the work supplies a useful theoretical bridge between classical spectral sparsification and GNN representation geometry, explaining why sparsified models often retain predictive performance and supporting interpretability applications that rely on stable embeddings. The direct derivation from spectral approximation properties (without fitted quantities) and the empirical link between Gram-matrix fidelity and downstream geometric invariants are strengths. The finite-time training result is more conditional and would benefit from explicit checks on the cited objectives.

major comments (2)
  1. [finite-time training stability result (abstract and theorem)] The finite-time training stability result (stated in the abstract and developed in the corresponding theorem) requires smoothness and boundedness assumptions on the loss and network so that gradient-descent trajectories on dense and sparsified graphs separate at most proportionally to ε. No verification is supplied that cross-entropy or typical GNN objectives satisfy the needed Lipschitz smoothness or gradient boundedness on FashionMNIST, Cora, or Paul15; without such checks the trajectory-separation claim remains conditional and is load-bearing for the conclusion that learned embeddings themselves stay close.
  2. [perturbation chain proof (abstract)] The abstract asserts a proof that ε-spectral sparsification induces O(ε) perturbations through the full chain of polynomial filters, hidden representations, and Gram matrices. Without the explicit derivation, layer-wise constants, or statement of any additional restrictions (e.g., on filter degree or network depth), it is not possible to confirm that the O(ε) bound propagates uniformly rather than accumulating or requiring further hypotheses.
minor comments (1)
  1. [abstract] The abstract could more explicitly separate the fixed-weight representation guarantees from the training-dynamics result to clarify which claims are unconditional.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of the work in bridging spectral sparsification with GNN representation geometry. We address the two major comments below.

Point-by-point responses
  1. Referee: [finite-time training stability result (abstract and theorem)] The finite-time training stability result (stated in the abstract and developed in the corresponding theorem) requires smoothness and boundedness assumptions on the loss and network so that gradient-descent trajectories on dense and sparsified graphs separate at most proportionally to ε. No verification is supplied that cross-entropy or typical GNN objectives satisfy the needed Lipschitz smoothness or gradient boundedness on FashionMNIST, Cora, or Paul15; without such checks the trajectory-separation claim remains conditional and is load-bearing for the conclusion that learned embeddings themselves stay close.

    Authors: We agree that the finite-time training stability theorem is conditional on the smoothness and boundedness assumptions. These assumptions are standard in analyses of gradient descent for neural networks and are satisfied by the cross-entropy loss with ReLU activations and bounded weights, as the gradients remain bounded under the compact parameter space typically considered. In the revised version, we have added a new paragraph in Section 4.3 discussing why these conditions hold for the GNN models and loss functions used in our experiments on FashionMNIST, Cora, and Paul15. This includes references to standard results on Lipschitz continuity of such objectives. We believe this addresses the concern while keeping the result general. revision: yes

  2. Referee: [perturbation chain proof (abstract)] The abstract asserts a proof that ε-spectral sparsification induces O(ε) perturbations through the full chain of polynomial filters, hidden representations, and Gram matrices. Without the explicit derivation, layer-wise constants, or statement of any additional restrictions (e.g., on filter degree or network depth), it is not possible to confirm that the O(ε) bound propagates uniformly rather than accumulating or requiring further hypotheses.

    Authors: The abstract is intended as a concise summary. The complete perturbation analysis, including the layer-wise propagation of the O(ε) bound, is provided in Theorem 3.1 and its proof in Section 3 of the manuscript. The constants in the O(ε) bound explicitly depend on the polynomial degree K, the network depth L, the maximum degree of the graph, and the Lipschitz constants of the activation functions; these dependencies are stated in the theorem statement. The bound does not accumulate exponentially but remains linear in ε due to the inductive structure of the proof. To improve clarity, we have revised the abstract to briefly mention that the constants depend on network depth and filter degree. We have also added a remark after the theorem highlighting the absence of additional restrictions beyond those listed. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained perturbation analysis

full rationale

The paper's central claims consist of mathematical proofs that any ε-spectral sparsifier induces O(ε) perturbations in polynomial filters, hidden representations, and Gram matrices for fixed-weight polynomial-filter GNNs, plus a conditional finite-time training stability result under explicit smoothness and boundedness assumptions on the loss and network. These follow directly from standard spectral approximation properties and perturbation theory without reducing to self-definitions, fitted inputs renamed as predictions, or load-bearing self-citations. The training-stability component is explicitly conditional rather than tautological, and no ansatz smuggling, uniqueness theorems from prior author work, or renaming of known results is present in the derivation chain. The results are therefore independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proof relies on standard properties of spectral sparsifiers and polynomial filters; the training-stability part adds smoothness and boundedness assumptions that are not derived in the paper.

axioms (1)
  • domain assumption Smoothness and boundedness of the loss and network weights during gradient descent
    Invoked to bound the separation of training trajectories on dense versus sparsified graphs.

pith-pipeline@v0.9.0 · 5546 in / 1203 out tokens · 27430 ms · 2026-05-09T19:08:17.870298+00:00 · methodology

