pith. machine review for the scientific record.

arxiv: 2605.11142 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: 2 theorem links

Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models

Eftychia Makri, Katerina Mamali, Konstantinos Tsirkas, Leandros Tassiulas, Nicholas A. Christakis, Nikolaos Nakis, Panagiotis Promponas

Pith reviewed 2026-05-13 06:19 UTC · model grok-4.3

classification 💻 cs.LG
keywords graph representation learning · latent embeddings · spectral occupancy · effective rank · model capacity · link prediction · positive semidefinite kernel

The pith

The spectrum of a learned positive semidefinite kernel measures the effective capacity of latent graph models instead of their nominal rank.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In graph representation learning the latent dimension is fixed before training as a hyperparameter, yet the learned embeddings are identifiable only up to rotation and rescaling, so the nominal rank rarely matches the quantity that actually governs behavior. The paper replaces that rank with the trace-normalized spectrum of the positive semidefinite kernel induced by the embeddings; the Shannon effective rank of those normalized eigenvalues then serves as a comparable summary of realized capacity across fits. This effective rank becomes a controllable training coordinate: a scalar parameter shapes it during optimization, and bisection targets any desired value up to the rank cap, with supporting proofs of local regularity and monotonicity. The approach yields visible performance-capacity frontiers on collaboration, social, biological, and infrastructure networks while remaining competitive with strong link-prediction baselines and supplying aligned lower-capacity spectral prefixes of the same fitted model. A reader would care because capacity shifts from a brittle preset to an observable, adjustable property of the trained model.
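
A minimal sketch of that capacity summary, assuming the PSD kernel is the Gram matrix K = Z Zᵀ of an embedding matrix Z (the paper's K(L) is induced by a rank-capped factor); the function name and test data are illustrative, not the authors' code.

```python
import numpy as np

def shannon_effective_rank(Z: np.ndarray) -> float:
    """Shannon effective rank of the PSD kernel K = Z @ Z.T."""
    # Eigenvalues of K = Z Z^T are the squared singular values of Z.
    lam = np.linalg.svd(Z, compute_uv=False) ** 2
    p = lam / lam.sum()          # trace normalization: spectrum on the simplex
    p = p[p > 0]                 # convention: 0 * log 0 = 0
    return float(np.exp(-(p * np.log(p)).sum()))  # exp(spectral entropy)

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 64))            # nominal rank cap r = 64
print(shannon_effective_rank(Z))           # close to 64 for isotropic factors
print(shannon_effective_rank(3.7 * Z))     # identical: the scale drops out
```

Trace normalization is what makes the summary comparable across fits: uniform rescaling of the embeddings leaves the simplex spectrum, and hence the effective rank, unchanged.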

Core claim

By extracting the spectrum of the learned positive semidefinite kernel, normalizing its eigenvalues by the trace, and summarizing them with the Shannon effective rank, the method treats realized capacity as a property of the fitted model rather than a hyperparameter of training. A single scalar during optimization, together with bisection search, targets any desired effective dimension within the rank cap, and the resulting spectral prefixes supply aligned lower-capacity views of the same embeddings.
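
The targeting loop that claim describes can be sketched as an outer bisection over the capacity scalar, here called η after the paper's sweeps; `train_and_measure` is a hypothetical callable standing in for a full training run at a given η followed by a spectrum readout, and the default interval mirrors the η ∈ [−0.25, 0.25] range swept in Figure 2.

```python
def target_effective_rank(train_and_measure, d_target,
                          lo=-0.25, hi=0.25, tol=0.5, max_iter=20):
    """Bisection on the capacity scalar eta, assuming the achieved effective
    rank d_spec(eta) is monotone on [lo, hi], as the paper's regularity and
    monotonicity results are meant to guarantee along a tracked branch."""
    d_lo, d_hi = train_and_measure(lo), train_and_measure(hi)
    if not min(d_lo, d_hi) <= d_target <= max(d_lo, d_hi):
        raise ValueError("target outside the achievable range on [lo, hi]")
    increasing = d_hi >= d_lo
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        d_mid = train_and_measure(mid)
        if abs(d_mid - d_target) <= tol:
            break
        if (d_mid < d_target) == increasing:
            lo = mid        # target lies in the upper half of [lo, hi]
        else:
            hi = mid        # target lies in the lower half
    return mid, d_mid       # eta and the capacity it achieved
```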

What carries the argument

The trace-normalized eigenvalue spectrum of the positive semidefinite kernel induced by the latent embeddings, with its Shannon effective rank serving as both capacity summary and training-time coordinate.

If this is right

  • Performance-capacity frontiers become traceable for collaboration, social, biological, and infrastructure networks.
  • Spectral prefixes supply aligned lower-capacity representations of any fitted model (see the sketch after this list).
  • A scalar parameter plus bisection search controls realized dimension at training time.
  • Competitive link-prediction accuracy is maintained in the overparameterized regime.
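
The prefix construction behind the second point above can be sketched directly: truncate the fitted kernel's eigenbasis. A minimal sketch, assuming the kernel is the Gram matrix of a fitted factor Z; the function name is illustrative, not the paper's code.

```python
import numpy as np

def spectral_prefix(Z: np.ndarray, k: int) -> np.ndarray:
    """Keep the top-k eigenpairs of K = Z @ Z.T as a lower-capacity factor.

    By Eckart-Young-Mirsky, the Gram matrix of the returned n x k factor is
    the best rank-k PSD approximation of K in Frobenius norm.
    """
    # A thin SVD of Z puts every prefix in one shared eigenbasis of K.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]   # columns scaled by singular values
```

Because every prefix is read off the same eigenbasis, the rank-k views are nested as k grows, which is what makes them aligned views of one fitted model.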

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same spectral-occupancy view could be applied to non-graph embedding models to expose capacity without changing the architecture.
  • Hyperparameter search over latent dimension could be replaced by post-training adjustment of the target effective rank.
  • Monotonicity of the realized-dimension profile may allow more efficient search procedures in related representation-learning settings.

Load-bearing premise

The spectrum of the learned positive semidefinite kernel after trace normalization accurately captures the quantity that governs model behavior and remains comparable across different fits.

What would settle it

An experiment that trains two models to the same trace-normalized spectrum (hence same effective rank) but with different nominal ranks and checks whether their link-prediction performance is statistically identical on held-out edges.
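
Were that run, the closing comparison is a standard two-sample check on per-seed AUCs; a sketch below, where the arrays hold purely hypothetical values and a TOST equivalence test would be the stricter way to claim "statistically identical".

```python
# Hypothetical per-seed held-out AUCs for two models matched in effective
# rank but fitted under different nominal rank caps (e.g., r = 64 vs 256).
import numpy as np
from scipy import stats

auc_low_cap  = np.array([0.912, 0.908, 0.915, 0.910, 0.913])  # placeholder values
auc_high_cap = np.array([0.911, 0.909, 0.914, 0.912, 0.910])  # placeholder values

t, p = stats.ttest_ind(auc_low_cap, auc_high_cap, equal_var=False)  # Welch's t-test
print(f"t = {t:.3f}, p = {p:.3f}")  # a large p fails to detect a performance gap
```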

Figures

Figures reproduced from arXiv: 2605.11142 by Eftychia Makri, Katerina Mamali, Konstantinos Tsirkas, Leandros Tassiulas, Nicholas A. Christakis, Nikolaos Nakis, Panagiotis Promponas.

Figure 1. Overview of SPECTRA. A rank-capped factor L induces a trace-normalized PSD kernel K(L) used in the edge model. The normalized spectrum of K(L) defines spectral occupancy.

Figure 2. Capacity frontiers across rank caps. Each dataset is swept over η ∈ [−0.25, 0.25] at r ∈ {64, 128, 256}; curves show mean ± 1σ across three seeds. Upper panels show achieved d_spec versus η; lower panels show test AUC versus achieved d_spec. Top: saturated datasets. Bottom: rank-cap-binding datasets. Legends report per-rank peak coordinates.

Figure 3. Adjacency matrices reordered by spectral assignment.

Figure 4. Illustration of Theorem 6. (a) Solution branches in Θ_e. Near a reference point θ*(η_0) satisfying Assumptions 4 and 5, the optimum traces a smooth curve as η varies, with each θ*(η) a strict local minimum of F_η on its slice. F_η may have additional local minima, each generating its own local branch under the same conditions. (b) Capacity profiles d_spec(η) along the branches. The tracked branch is smo…

Figure 5. Structural events along the η path at two rank caps, on four representative datasets spanning the four structural families. Top row: r = 128. Bottom row: r = 256. Columns: citation (ca-grqc), infrastructure (inf-power), social (socfb-American75), biological (bio-grid-worm). Each panel shows d_spec(η) vs η on a log scale, mean ± 1σ across three seeds. Vertical line: η = 0. Event markers indicate probes where a struct…
read the original abstract

Graph representation learning has become a standard approach for analyzing networked data, with latent embeddings widely used for link prediction, community detection, and related tasks. Yet a basic design choice, the latent dimension, is still treated as a brittle hyperparameter, fixed before training and tuned by held-out performance. Learned factors are also identifiable only up to rotation and rescaling, so the nominal rank rarely coincides with the quantity that governs model behavior. We propose Spectral Prefix Extraction and Capacity-Targeted Representation Analysis (Spectra), which replaces rank as the unit of analysis with the spectrum of a learned positive semidefinite kernel, trace-normalized so that spectra are comparable across fits. The normalized eigenvalues form a distribution on the simplex, and their Shannon effective rank acts both as a summary of learned capacity and as a controllable training-time coordinate: a single scalar shapes this realized dimension during training, and bisection targets any desired value within the rank cap. To theoretically support that, we show local regularity and monotonicity of the realized-dimension profile. Across collaboration, social, biological, and infrastructure networks, Spectra traces performance–capacity frontiers that make the trade-off between predictive accuracy and realized dimension visible. It performs competitively with strong link-prediction baselines, yields aligned lower-capacity views of the same fitted model through spectral prefixes, and provides a principled handle on capacity in the overparameterized regime. Capacity thus becomes a property of the fitted model rather than a hyperparameter of the training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Spectra (Spectral Prefix Extraction and Capacity-Targeted Representation Analysis) for latent graph models. It replaces nominal rank with the spectrum of a learned positive semidefinite kernel after trace normalization, uses the Shannon effective rank of the resulting simplex distribution as a summary of realized capacity, and shows that a single scalar plus bisection can target any desired value within the rank cap. Local regularity and monotonicity of the realized-dimension profile are claimed as theoretical support. Experiments on collaboration, social, biological, and infrastructure networks demonstrate competitive link-prediction performance, visible performance-capacity frontiers, and aligned lower-capacity views via spectral prefixes.

Significance. If the normalization argument and the local regularity/monotonicity results hold, the work would make capacity a controllable property of the fitted model rather than a pre-training hyperparameter. This could clarify trade-offs in the overparameterized regime and supply a principled coordinate for analyzing graph representation learning across tasks.

major comments (2)
  1. [Abstract and §4] Theoretical support: the central claim that trace normalization makes spectra comparable and that the resulting effective rank governs downstream behavior requires an argument that the discarded kernel scale does not couple to the link-prediction objective or to generalization; no such argument or counter-example analysis is supplied, leaving the separation of capacity from hyperparameter unverified.
  2. [§5] Experiments: performance-capacity frontiers are reported without error bars, repeated runs, or explicit data-split details, so it is impossible to assess whether the claimed competitiveness and monotonic trade-offs are statistically reliable.
minor comments (2)
  1. [§3] Notation for the scalar that controls realized dimension and for the bisection procedure should be introduced with an explicit equation in §3.
  2. [§5] The manuscript would benefit from a short table summarizing the networks, their sizes, and the range of targeted effective ranks used in the experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract and §4] Theoretical support: the central claim that trace normalization makes spectra comparable and that the resulting effective rank governs downstream behavior requires an argument that the discarded kernel scale does not couple to the link-prediction objective or to generalization; no such argument or counter-example analysis is supplied, leaving the separation of capacity from hyperparameter unverified.

    Authors: We agree that the manuscript would be strengthened by an explicit argument showing that the kernel scale factor is decoupled from the link-prediction objective. In Spectra, trace normalization is performed after fitting so that the spectrum lies on the probability simplex; the link-prediction loss depends on the normalized inner products, which remain unchanged under uniform rescaling of the kernel. Nevertheless, to meet the referee's request we will insert a short subsection in the revised §4 that formalizes this invariance, supplies a brief counter-example sketch, and clarifies why the effective rank (rather than nominal rank or scale) is the relevant capacity coordinate. revision: yes
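
The invariance appealed to here is easy to verify numerically: trace normalization removes the kernel scale, so K and cK share the same simplex spectrum. A quick self-contained check (illustrative only; it does not replace the promised formal subsection):

```python
import numpy as np

def simplex_spectrum(K: np.ndarray) -> np.ndarray:
    """Trace-normalized eigenvalue spectrum of a PSD kernel."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clamp round-off negatives
    return lam / lam.sum()

rng = np.random.default_rng(1)
L = rng.normal(size=(200, 32))
K = L @ L.T                                          # rank-capped PSD kernel

base = simplex_spectrum(K)
for c in (0.01, 1.0, 250.0):
    assert np.allclose(simplex_spectrum(c * K), base)  # the scale c drops out
print("trace-normalized spectrum is invariant under uniform kernel rescaling")
```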

  2. Referee: [§5] Experiments: performance-capacity frontiers are reported without error bars, repeated runs, or explicit data-split details, so it is impossible to assess whether the claimed competitiveness and monotonic trade-offs are statistically reliable.

    Authors: We accept that the current experimental presentation lacks the statistical detail needed for rigorous assessment. In the revised version we will (i) report mean performance with standard-error bars computed over at least five independent runs using distinct random seeds, (ii) state the precise train/validation/test splits (including any temporal or random partitioning) for every dataset, and (iii) confirm that the observed monotonicity of the performance-capacity curves holds across these repetitions. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained.

full rationale

The paper defines Spectra via trace normalization of the learned kernel spectrum to obtain a simplex distribution whose Shannon effective rank serves as the capacity coordinate. This effective rank is computed post-fit from the eigenvalues, while the scalar control and bisection targeting operate as an independent external mechanism to achieve desired values. The claimed local regularity and monotonicity of the realized-dimension profile are presented as theoretical properties derived from the optimization setup rather than tautological re-statements of the fitted values themselves. No load-bearing step reduces a prediction, uniqueness claim, or ansatz to a self-citation chain or to the input data by construction; the separation of capacity from hyperparameter rests on the external scalar and empirical performance-capacity frontiers, keeping the chain independent of its own outputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on standard assumptions about positive semidefinite kernels and identifiability up to rotation, plus one controllable scalar whose value is chosen to target a desired effective rank.

free parameters (1)
  • scalar controlling realized dimension
    Single scalar used during training to shape the effective dimension, targeted by bisection.
axioms (2)
  • domain assumption: Learned factors are identifiable only up to rotation and rescaling.
    Invoked to explain why nominal rank fails to match governing capacity.
  • standard math: The kernel matrix is positive semidefinite.
    Required for the eigenvalue spectrum to be well-defined and non-negative.

pith-pipeline@v0.9.0 · 5592 in / 1254 out tokens · 37592 ms · 2026-05-13T06:19:07.769795+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 2 internal anchors

  1. [1] Nesreen K Ahmed, Ryan Rossi, John Boaz Lee, Theodore L Willke, Rong Zhou, Xiangnan Kong, and Hoda Eldardiry. Learning role-based graph embeddings. arXiv preprint arXiv:1802.02896, 2018.
  2. [2] Edoardo Airoldi and Nicholas Christakis. Induction of social contagion for diverse outcomes in structured experiments in isolated villages. Science, 384:eadi5147, 2024. doi: 10.1126/science.adi5147.
  3. [3] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels, 2007. URL https://arxiv.org/abs/0705.4485.
  4. [4] Marcus Alexander, Laura Forastiere, Swati Gupta, and Nicholas A. Christakis. Algorithms for seeding social networks can enhance the adoption of a public health intervention in urban India. Proceedings of the National Academy of Sciences, 119(30):e2120742119, 2022. doi: 10.1073/pnas.2120742119.
  5. [5] Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems, 32, 2019.
  6. [6] Avanti Athreya, Donniell E Fishkind, Minh Tang, Carey E Priebe, Youngser Park, Joshua T Vogelstein, Keith Levin, Vince Lyzinski, Yichen Qin, and Daniel L Sussman. Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226):1–92, 2018.
  7. [7] Peter L. Bartlett, Philip M. Long, Gábor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48):30063–30070, 2020. doi: 10.1073/pnas.1907378117.
  8. [8] Lori Beaman, Ariel Benyishay, and Jeremy Magruder. Can network theory-based targeting increase technology adoption? American Economic Review, 111:1918–1943, 2021. doi: 10.1257/aer.20200295.
  9. [9] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32):15849–15854, 2019. doi: 10.1073/pnas.1903070116.
  10. [10] Emmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111–119, 2012.
  11. [11] Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 891–900, 2015.
  12. [12] Kehui Chen and Jing Lei. Network cross-validation for determining the number of communities in network data. Journal of the American Statistical Association, 113(521):241–251, 2018.
  13. [13] Kalyan Cherukuri and Aarav Lala. Low-rank matrix approximation for neural network compression. In 2025 10th International Conference on Machine Learning Technologies (ICMLT), pages 166–170. IEEE, 2025.
  14. [14] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1:211–218, 1936.
  15. [15] Marcelo Fiori, Bernardo Marenco, Federico Larroca, Paola Bermolen, and Gonzalo Mateos. Gradient-based spectral embeddings of random dot product graphs. IEEE Transactions on Signal and Information Processing over Networks, 10:1–16, 2023.
  16. [16] Dan Friedman and Adji Bousso Dieng. The Vendi score: A diversity evaluation metric for machine learning. arXiv preprint arXiv:2210.02410, 2022.
  17. [17] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016.
  18. [18] Weiwei Gu, Aditya Tandon, Yong-Yeol Ahn, and Filippo Radicchi. Principled approach to the selection of the embedding dimension of networks. Nature Communications, 12(1):3772, 2021.
  19. [19] Suriya Gunasekar, Blake E Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro. Implicit regularization in matrix factorization. Advances in Neural Information Processing Systems, 30, 2017.
  20. [20] Kathleen Harris, Carolyn Halpern, Eric Whitsel, Jon Hussey, Ley Killeya-Jones, Joyce Tabor, and Sarah Dean. Cohort profile: The National Longitudinal Study of Adolescent to Adult Health (Add Health). International Journal of Epidemiology, 48, 2019. doi: 10.1093/ije/dyz115.
  21. [21] Peter Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. Advances in Neural Information Processing Systems, 20, 2007.
  22. [22] Peter D Hoff, Adrian E Raftery, and Mark S Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.
  23. [23] David Hong and Joshua Cape. Network signflip parallel analysis for selecting the embedding dimension. arXiv preprint arXiv:2509.05722, 2025.
  24. [24] Ilse CF Ipsen and Arvind K Saibaba. Stable rank and intrinsic dimension of real and complex matrices. SIAM Journal on Matrix Analysis and Applications, 46(3):1988–2007, 2025.
  25. [25] Kwon Chan Jeon and Patricia Goodson. US adolescents' friendship networks and health risk behaviors: a systematic review of studies using social network analysis and Add Health data. PeerJ, 3:e1052, 2015. doi: 10.7717/peerj.1052.
  26. [26] H. Jeong, S. P. Mason, A.-L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, 2001. doi: 10.1038/35075138.
  27. [27] Nandan Kumar Jha and Brandon Reagen. Spectral scaling laws in language models: How effectively do feed-forward networks use their latent space? In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35047–35058, 2025.
  28. [28] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  29. [29] Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3):123–286, 2012. doi: 10.1561/2200000044.
  30. [30] Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, and Ali Farhadi. Matryoshka representation learning, 2024. URL https://arxiv.org/abs/2205.13147.
  31. [31] John M. Lee. Introduction to Smooth Manifolds. Springer, 2nd edition, 2013.
  32. [32] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection, June 2014.
  33. [33] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2–es, 2007.
  34. [34] Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes, 2018. URL https://arxiv.org/abs/1804.08838.
  35. [35] Tianxi Li, Elizaveta Levina, and Ji Zhu. Network cross-validation by edge sampling. Biometrika, 107(2):257–276, 2020.
  36. [36] David Liben-Nowell and Jon Kleinberg. The link prediction problem for social networks. In CIKM, pages 556–559, 2003.
  37. [37] Gongxu Luo, Jianxin Li, Jianlin Su, Hao Peng, Carl Yang, Lichao Sun, Philip S Yu, and Lifang He. Graph entropy guided node embedding dimension selection for graph neural networks. arXiv preprint arXiv:2105.03178, 2021.
  38. [38] Nikolaos Nakis, Abdulkadir Çelikkanat, and Morten Mørup. HM-LDM: A hybrid-membership latent distance model, 2022. URL https://arxiv.org/abs/2206.03463.
  39. [39] Nikolaos Nakis, Abdulkadir Celikkanat, Louis Boucherie, Christian Djurhuus, Felix Burmester, Daniel Mathias Holmelund, Monika Frolcová, and Morten Mørup. Characterizing polarization in social networks using the signed relational latent distance model. In Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, editors, Proceedings of The 26th Internationa...
  40. [40] Nikolaos Nakis, Abdulkadir Çelikkanat, Sune Lehmann Jørgensen, and Morten Mørup. A hierarchical block distance model for ultra low-dimensional graph representations, 2023. URL https://arxiv.org/abs/2204.05885.
  41. [41] Nikolaos Nakis, Niels Raunkjær Holm, Andreas Lyhne Fiehn, and Morten Mørup. How low can you go? Searching for the intrinsic dimensionality of complex networks using metric node embeddings. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=V71ITh2w40.
  42. [42] Nikolaos Nakis, Chrysoula Kosma, Panagiotis Promponas, Michail Chatzianastasis, and Giannis Nikolentzos. Aitchison embeddings for learning compositional graph representations, 2026. URL https://arxiv.org/abs/2605.00716.
  43. [43] A. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Neural Information Processing Systems, 2001.
  44. [44] Giannis Nikolentzos, Michail Chatzianastasis, and Michalis Vazirgiannis. What do GNNs actually learn? Towards understanding their representations, 2024. URL https://arxiv.org/abs/2304.10851.
  45. [45] Jorge Nocedal and Stephen J Wright. Numerical Optimization. Springer, 2006.
  46. [46] Tore Opsahl. Why Anchorage is not (that) important: Binary ties and sample selection. 2011. https://toreopsahl.com/datasets/.
  47. [47] Giuliano Andrea Pagani and Marco Aiello. The power grid as a complex network: A survey. Physica A: Statistical Mechanics and its Applications, 392(11):2688–2700, 2013. doi: 10.1016/j.physa.2013.01.023.
  49. [49] Elizabeth Paluck, Hana Shepherd, and Peter Aronow. Changing climates of conflict: A social network experiment in 56 schools. Proceedings of the National Academy of Sciences, 113:201514483, 2016. doi: 10.1073/pnas.1514483113.
  50. [50] Francesco Sanna Passino and Nicholas A Heard. Bayesian estimation of the latent dimension and communities in stochastic blockmodels. Statistics and Computing, 30(5):1291–1307, 2020.
  51. [51] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014.
  52. [52] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining, pages 459–467, 2018.
  53. [53] Benjamin Recht, Maryam Fazel, and Pablo A Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
  54. [54] Ryan A. Rossi and Nesreen Ahmed. The network data repository with interactive graph analytics and visualization. In AAAI Conference on Artificial Intelligence, 2015.
  55. [55] Olivier Roy and Martin Vetterli. The effective rank: A measure of effective dimensionality. In 2007 15th European Signal Processing Conference, pages 606–610. IEEE, 2007.
  56. [56] Patrick Rubin-Delanchy, Joshua Cape, Minh Tang, and Carey E Priebe. A statistical interpretation of spectral embedding: the generalised random dot product graph. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(4):1446–1473, 2022.
  57. [57] Amartya Sanyal, Philip H. S. Torr, and Puneet K. Dokania. Stable rank normalization for improved generalization in neural networks and GANs, 2020. URL https://arxiv.org/abs/1906.04659.
  58. [58] Nathan Srebro, Jason Rennie, and Tommi Jaakkola. Maximum-margin matrix factorization. Advances in Neural Information Processing Systems, 17, 2004.
  59. [59] Chris Stark, Bobby-Joe Breitkreutz, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, and Mike Tyers. BioGRID: a general repository for interaction datasets. Nucleic Acids Research, 34, 2006. doi: 10.1093/nar/gkj109.
  60. [60] Daniel L Sussman, Minh Tang, and Carey E Priebe. Consistent latent position estimation and vertex classification for random dot product graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):48–57, 2013.
  61. [61] Roddy Taing and Keith Levin. On the effect of misspecifying the embedding dimension in low-rank network models. arXiv preprint arXiv:2601.06014, 2026.
  62. [62] Amanda L. Traud, Peter J. Mucha, and Mason A. Porter. Social structure of Facebook networks. Physica A: Statistical Mechanics and its Applications, 391(16):4165–4180, 2012. doi: 10.1016/j.physa.2011.12.021.
  63. [63] Loring W. Tu. An Introduction to Manifolds. Springer, 2nd edition, 2011.
  64. [64] Ulrike von Luxburg. A tutorial on spectral clustering, 2007. URL https://arxiv.org/abs/0711.0189.
  65. [65] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. Community preserving network embedding. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017.
  66. [66] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440–442, 1998.
  67. [67] Lai Wei, Zhiquan Tan, Chenghai Li, Jindong Wang, and Weiran Huang. Diff-eRank: A novel rank-based metric for evaluating large language models. Advances in Neural Information Processing Systems, 37:39501–39521, 2024.
  68. [68] Jiang Yang, Yuxiang Zhao, and Quanhui Zhu. Effective rank and the staircase phenomenon: New insights into neural network training dynamics. arXiv e-prints, 2024.
  69. [69] Mu Zhu and Ali Ghodsi. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2):918–930, 2006.