pith. machine review for the scientific record.

arxiv: 2605.11142 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: 2 theorem links

Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models

Eftychia Makri, Katerina Mamali, Konstantinos Tsirkas, Leandros Tassiulas, Nicholas A. Christakis, Nikolaos Nakis, Panagiotis Promponas

Pith reviewed 2026-05-13 06:19 UTC · model grok-4.3

classification 💻 cs.LG
keywords graph representation learning · latent embeddings · spectral occupancy · effective rank · model capacity · link prediction · positive semidefinite kernel

The pith

The spectrum of a learned positive semidefinite kernel measures the effective capacity of latent graph models instead of their nominal rank.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In graph representation learning the latent dimension is fixed before training as a hyperparameter, yet the learned embeddings are identifiable only up to rotation and rescaling, so the nominal rank rarely matches the quantity that actually governs behavior. The paper replaces that rank with the trace-normalized spectrum of the positive semidefinite kernel induced by the embeddings; the Shannon effective rank of those normalized eigenvalues then serves as a comparable summary of realized capacity across fits. This effective rank becomes a controllable training coordinate: a scalar parameter shapes it during optimization, and bisection targets any desired value up to the rank cap, with supporting proofs of local regularity and monotonicity. The approach yields visible performance-capacity frontiers on collaboration, social, biological, and infrastructure networks while remaining competitive with strong link-prediction baselines and supplying aligned lower-capacity spectral prefixes of the same fitted model. A reader would care because capacity shifts from a brittle preset to an observable, adjustable property of the trained model.
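
A minimal sketch of that capacity summary, assuming the PSD kernel is the Gram matrix K = Z Zᵀ of an embedding matrix Z (the paper's K(L) is induced by a rank-capped factor); the function name and test data are illustrative, not the authors' code.

```python
import numpy as np

def shannon_effective_rank(Z: np.ndarray) -> float:
    """Shannon effective rank of the PSD kernel K = Z @ Z.T."""
    # Eigenvalues of K = Z Z^T are the squared singular values of Z.
    lam = np.linalg.svd(Z, compute_uv=False) ** 2
    p = lam / lam.sum()          # trace normalization: spectrum on the simplex
    p = p[p > 0]                 # convention: 0 * log 0 = 0
    return float(np.exp(-(p * np.log(p)).sum()))  # exp(spectral entropy)

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 64))            # nominal rank cap r = 64
print(shannon_effective_rank(Z))           # close to 64 for isotropic factors
print(shannon_effective_rank(3.7 * Z))     # identical: the scale drops out
```

Trace normalization is what makes the summary comparable across fits: uniform rescaling of the embeddings leaves the simplex spectrum, and hence the effective rank, unchanged.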

Core claim

By extracting the spectrum of the learned positive semidefinite kernel, normalizing its eigenvalues by the trace, and summarizing them with the Shannon effective rank, the method treats realized capacity as a property of the fitted model rather than a hyperparameter of training. A single scalar during optimization, together with bisection search, targets any desired effective dimension within the rank cap, and the resulting spectral prefixes supply aligned lower-capacity views of the same embeddings.
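
The targeting loop that claim describes can be sketched as an outer bisection over the capacity scalar, here called η after the paper's sweeps; `train_and_measure` is a hypothetical callable standing in for a full training run at a given η followed by a spectrum readout, and the default interval mirrors the η ∈ [−0.25, 0.25] range swept in Figure 2.

```python
def target_effective_rank(train_and_measure, d_target,
                          lo=-0.25, hi=0.25, tol=0.5, max_iter=20):
    """Bisection on the capacity scalar eta, assuming the achieved effective
    rank d_spec(eta) is monotone on [lo, hi], as the paper's regularity and
    monotonicity results are meant to guarantee along a tracked branch."""
    d_lo, d_hi = train_and_measure(lo), train_and_measure(hi)
    if not min(d_lo, d_hi) <= d_target <= max(d_lo, d_hi):
        raise ValueError("target outside the achievable range on [lo, hi]")
    increasing = d_hi >= d_lo
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        d_mid = train_and_measure(mid)
        if abs(d_mid - d_target) <= tol:
            break
        if (d_mid < d_target) == increasing:
            lo = mid        # target lies in the upper half of [lo, hi]
        else:
            hi = mid        # target lies in the lower half
    return mid, d_mid       # eta and the capacity it achieved
```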

What carries the argument

The trace-normalized eigenvalue spectrum of the positive semidefinite kernel induced by the latent embeddings, with its Shannon effective rank serving as both capacity summary and training-time coordinate.

If this is right

  • Performance-capacity frontiers become traceable for collaboration, social, biological, and infrastructure networks.
  • Spectral prefixes supply aligned lower-capacity representations of any fitted model (see the sketch after this list).
  • A scalar parameter plus bisection search controls realized dimension at training time.
  • Competitive link-prediction accuracy is maintained in the overparameterized regime.
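
The prefix construction behind the second point above can be sketched directly: truncate the fitted kernel's eigenbasis. A minimal sketch, assuming the kernel is the Gram matrix of a fitted factor Z; the function name is illustrative, not the paper's code.

```python
import numpy as np

def spectral_prefix(Z: np.ndarray, k: int) -> np.ndarray:
    """Keep the top-k eigenpairs of K = Z @ Z.T as a lower-capacity factor.

    By Eckart-Young-Mirsky, the Gram matrix of the returned n x k factor is
    the best rank-k PSD approximation of K in Frobenius norm.
    """
    # A thin SVD of Z puts every prefix in one shared eigenbasis of K.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]   # columns scaled by singular values
```

Because every prefix is read off the same eigenbasis, the rank-k views are nested as k grows, which is what makes them aligned views of one fitted model.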

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same spectral-occupancy view could be applied to non-graph embedding models to expose capacity without changing the architecture.
  • Hyperparameter search over latent dimension could be replaced by post-training adjustment of the target effective rank.
  • Monotonicity of the realized-dimension profile may allow more efficient search procedures in related representation-learning settings.

Load-bearing premise

The spectrum of the learned positive semidefinite kernel after trace normalization accurately captures the quantity that governs model behavior and remains comparable across different fits.

What would settle it

An experiment that trains two models to the same trace-normalized spectrum (hence same effective rank) but with different nominal ranks and checks whether their link-prediction performance is statistically identical on held-out edges.
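
Were that run, the closing comparison is a standard two-sample check on per-seed AUCs; a sketch below, where the arrays hold purely hypothetical values and a TOST equivalence test would be the stricter way to claim "statistically identical".

```python
# Hypothetical per-seed held-out AUCs for two models matched in effective
# rank but fitted under different nominal rank caps (e.g., r = 64 vs 256).
import numpy as np
from scipy import stats

auc_low_cap  = np.array([0.912, 0.908, 0.915, 0.910, 0.913])  # placeholder values
auc_high_cap = np.array([0.911, 0.909, 0.914, 0.912, 0.910])  # placeholder values

t, p = stats.ttest_ind(auc_low_cap, auc_high_cap, equal_var=False)  # Welch's t-test
print(f"t = {t:.3f}, p = {p:.3f}")  # a large p fails to detect a performance gap
```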

Figures

Figures reproduced from arXiv: 2605.11142 by Eftychia Makri, Katerina Mamali, Konstantinos Tsirkas, Leandros Tassiulas, Nicholas A. Christakis, Nikolaos Nakis, Panagiotis Promponas.

Figure 1. Overview of SPECTRA. A rank-capped factor L induces a trace-normalized PSD kernel K(L) used in the edge model. The normalized spectrum of K(L) defines spectral occupancy.

Figure 2. Capacity frontiers across rank caps. Each dataset is swept over η ∈ [−0.25, 0.25] at r ∈ {64, 128, 256}; curves show mean ± 1σ across three seeds. Upper panels show achieved d_spec versus η; lower panels show test AUC versus achieved d_spec. Top: saturated datasets. Bottom: rank-cap-binding datasets. Legends report per-rank peak coordinates.

Figure 3. Adjacency matrices reordered by spectral assignment.

Figure 4. Illustration of Theorem 6. (a) Solution branches in Θ_e. Near a reference point θ*(η_0) satisfying Assumptions 4 and 5, the optimum traces a smooth curve as η varies, with each θ*(η) a strict local minimum of F_η on its slice. F_η may have additional local minima, each generating its own local branch under the same conditions. (b) Capacity profiles d_spec(η) along the branches. The tracked branch is smo…

Figure 5. Structural events along the η path at two rank caps, on four representative datasets spanning the four structural families. Top row: r = 128. Bottom row: r = 256. Columns: citation (ca-grqc), infrastructure (inf-power), social (socfb-American75), biological (bio-grid-worm). Each panel shows d_spec(η) vs η on a log scale, mean ± 1σ across three seeds. Vertical line: η = 0. Event markers indicate probes where a struct…
read the original abstract

Graph representation learning has become a standard approach for analyzing networked data, with latent embeddings widely used for link prediction, community detection, and related tasks. Yet a basic design choice, the latent dimension, is still treated as a brittle hyperparameter, fixed before training and tuned by held-out performance. Learned factors are also identifiable only up to rotation and rescaling, so the nominal rank rarely coincides with the quantity that governs model behavior. We propose Spectral Prefix Extraction and Capacity-Targeted Representation Analysis (Spectra), which replaces rank as the unit of analysis with the spectrum of a learned positive semidefinite kernel, trace-normalized so that spectra are comparable across fits. The normalized eigenvalues form a distribution on the simplex, and their Shannon effective rank acts both as a summary of learned capacity and as a controllable training-time coordinate: a single scalar shapes this realized dimension during training, and bisection targets any desired value within the rank cap. To theoretically support that, we show local regularity and monotonicity of the realized-dimension profile. Across collaboration, social, biological, and infrastructure networks, Spectra traces performance–capacity frontiers that make the trade-off between predictive accuracy and realized dimension visible. It performs competitively with strong link-prediction baselines, yields aligned lower-capacity views of the same fitted model through spectral prefixes, and provides a principled handle on capacity in the overparameterized regime. Capacity thus becomes a property of the fitted model rather than a hyperparameter of the training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Spectra (Spectral Prefix Extraction and Capacity-Targeted Representation Analysis) for latent graph models. It replaces nominal rank with the spectrum of a learned positive semidefinite kernel after trace normalization, uses the Shannon effective rank of the resulting simplex distribution as a summary of realized capacity, and shows that a single scalar plus bisection can target any desired value within the rank cap. Local regularity and monotonicity of the realized-dimension profile are claimed as theoretical support. Experiments on collaboration, social, biological, and infrastructure networks demonstrate competitive link-prediction performance, visible performance-capacity frontiers, and aligned lower-capacity views via spectral prefixes.

Significance. If the normalization argument and the local regularity/monotonicity results hold, the work would make capacity a controllable property of the fitted model rather than a pre-training hyperparameter. This could clarify trade-offs in the overparameterized regime and supply a principled coordinate for analyzing graph representation learning across tasks.

major comments (2)
  1. [Abstract and §4] Theoretical support: the central claim that trace normalization makes spectra comparable and that the resulting effective rank governs downstream behavior requires an argument that the discarded kernel scale does not couple to the link-prediction objective or to generalization; no such argument or counter-example analysis is supplied, leaving the separation of capacity from hyperparameter unverified.
  2. [§5] Experiments: performance-capacity frontiers are reported without error bars, repeated runs, or explicit data-split details, so it is impossible to assess whether the claimed competitiveness and monotonic trade-offs are statistically reliable.
minor comments (2)
  1. [§3] Notation for the scalar that controls realized dimension and for the bisection procedure should be introduced with an explicit equation in §3.
  2. [§5] The manuscript would benefit from a short table summarizing the networks, their sizes, and the range of targeted effective ranks used in the experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract and §4] Theoretical support: the central claim that trace normalization makes spectra comparable and that the resulting effective rank governs downstream behavior requires an argument that the discarded kernel scale does not couple to the link-prediction objective or to generalization; no such argument or counter-example analysis is supplied, leaving the separation of capacity from hyperparameter unverified.

    Authors: We agree that the manuscript would be strengthened by an explicit argument showing that the kernel scale factor is decoupled from the link-prediction objective. In Spectra, trace normalization is performed after fitting so that the spectrum lies on the probability simplex; the link-prediction loss depends on the normalized inner products, which remain unchanged under uniform rescaling of the kernel. Nevertheless, to meet the referee's request we will insert a short subsection in the revised §4 that formalizes this invariance, supplies a brief counter-example sketch, and clarifies why the effective rank (rather than nominal rank or scale) is the relevant capacity coordinate. revision: yes
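
The invariance appealed to here is easy to verify numerically: trace normalization removes the kernel scale, so K and cK share the same simplex spectrum. A quick self-contained check (illustrative only; it does not replace the promised formal subsection):

```python
import numpy as np

def simplex_spectrum(K: np.ndarray) -> np.ndarray:
    """Trace-normalized eigenvalue spectrum of a PSD kernel."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clamp round-off negatives
    return lam / lam.sum()

rng = np.random.default_rng(1)
L = rng.normal(size=(200, 32))
K = L @ L.T                                          # rank-capped PSD kernel

base = simplex_spectrum(K)
for c in (0.01, 1.0, 250.0):
    assert np.allclose(simplex_spectrum(c * K), base)  # the scale c drops out
print("trace-normalized spectrum is invariant under uniform kernel rescaling")
```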

  2. Referee: [§5] Experiments: performance-capacity frontiers are reported without error bars, repeated runs, or explicit data-split details, so it is impossible to assess whether the claimed competitiveness and monotonic trade-offs are statistically reliable.

    Authors: We accept that the current experimental presentation lacks the statistical detail needed for rigorous assessment. In the revised version we will (i) report mean performance with standard-error bars computed over at least five independent runs using distinct random seeds, (ii) state the precise train/validation/test splits (including any temporal or random partitioning) for every dataset, and (iii) confirm that the observed monotonicity of the performance-capacity curves holds across these repetitions. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained.

full rationale

The paper defines Spectra via trace normalization of the learned kernel spectrum to obtain a simplex distribution whose Shannon effective rank serves as the capacity coordinate. This effective rank is computed post-fit from the eigenvalues, while the scalar control and bisection targeting operate as an independent external mechanism to achieve desired values. The claimed local regularity and monotonicity of the realized-dimension profile are presented as theoretical properties derived from the optimization setup rather than tautological re-statements of the fitted values themselves. No load-bearing step reduces a prediction, uniqueness claim, or ansatz to a self-citation chain or to the input data by construction; the separation of capacity from hyperparameter rests on the external scalar and empirical performance-capacity frontiers, keeping the chain independent of its own outputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on standard assumptions about positive semidefinite kernels and identifiability up to rotation, plus one controllable scalar whose value is chosen to target a desired effective rank.

free parameters (1)
  • scalar controlling realized dimension
    Single scalar used during training to shape the effective dimension, targeted by bisection.
axioms (2)
  • domain assumption: Learned factors are identifiable only up to rotation and rescaling.
    Invoked to explain why nominal rank fails to match governing capacity.
  • standard math: The kernel matrix is positive semidefinite.
    Required for the eigenvalue spectrum to be well-defined and non-negative.

pith-pipeline@v0.9.0 · 5592 in / 1254 out tokens · 37592 ms · 2026-05-13T06:19:07.769795+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 2 internal anchors

  1. [1] Nesreen K Ahmed, Ryan Rossi, John Boaz Lee, Theodore L Willke, Rong Zhou, Xiangnan Kong, and Hoda Eldardiry. Learning role-based graph embeddings. arXiv preprint arXiv:1802.02896, 2018.
  2. [2] Edoardo Airoldi and Nicholas Christakis. Induction of social contagion for diverse outcomes in structured experiments in isolated villages. Science, 384:eadi5147, 2024. doi: 10.1126/science.adi5147.
  3. [3] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels, 2007. URL https://arxiv.org/abs/0705.4485.
  4. [4] Marcus Alexander, Laura Forastiere, Swati Gupta, and Nicholas A. Christakis. Algorithms for seeding social networks can enhance the adoption of a public health intervention in urban India. Proceedings of the National Academy of Sciences, 119(30):e2120742119, 2022. doi: 10.1073/pnas.2120742119.
  5. [5] Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems, 32, 2019.
  6. [6] Avanti Athreya, Donniell E Fishkind, Minh Tang, Carey E Priebe, Youngser Park, Joshua T Vogelstein, Keith Levin, Vince Lyzinski, Yichen Qin, and Daniel L Sussman. Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226):1–92, 2018.
  7. [7] Peter L. Bartlett, Philip M. Long, Gábor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48):30063–30070, 2020. doi: 10.1073/pnas.1907378117.
  8. [8] Lori Beaman, Ariel Benyishay, and Jeremy Magruder. Can network theory-based targeting increase technology adoption? American Economic Review, 111:1918–1943, 2021. doi: 10.1257/aer.20200295.
  9. [9] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32):15849–15854, 2019. doi: 10.1073/pnas.1903070116.
  10. [10] Emmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111–119, 2012.
  11. [11] Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 891–900, 2015.
  12. [12] Kehui Chen and Jing Lei. Network cross-validation for determining the number of communities in network data. Journal of the American Statistical Association, 113(521):241–251, 2018.
  13. [13] Kalyan Cherukuri and Aarav Lala. Low-rank matrix approximation for neural network compression. In 2025 10th International Conference on Machine Learning Technologies (ICMLT), pages 166–170. IEEE, 2025.
  14. [14] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1:211–218, 1936.
  15. [15] Marcelo Fiori, Bernardo Marenco, Federico Larroca, Paola Bermolen, and Gonzalo Mateos. Gradient-based spectral embeddings of random dot product graphs. IEEE Transactions on Signal and Information Processing over Networks, 10:1–16, 2023.
  16. [16] Dan Friedman and Adji Bousso Dieng. The Vendi score: A diversity evaluation metric for machine learning. arXiv preprint arXiv:2210.02410, 2022.
  17. [17] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016.
  18. [18] Weiwei Gu, Aditya Tandon, Yong-Yeol Ahn, and Filippo Radicchi. Principled approach to the selection of the embedding dimension of networks. Nature Communications, 12(1):3772, 2021.
  19. [19] Suriya Gunasekar, Blake E Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro. Implicit regularization in matrix factorization. Advances in Neural Information Processing Systems, 30, 2017.
  20. [20] Kathleen Harris, Carolyn Halpern, Eric Whitsel, Jon Hussey, Ley Killeya-Jones, Joyce Tabor, and Sarah Dean. Cohort profile: The National Longitudinal Study of Adolescent to Adult Health (Add Health). International Journal of Epidemiology, 48, 2019. doi: 10.1093/ije/dyz115.
  21. [21] Peter Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. Advances in Neural Information Processing Systems, 20, 2007.
  22. [22] Peter D Hoff, Adrian E Raftery, and Mark S Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.
  23. [23] David Hong and Joshua Cape. Network signflip parallel analysis for selecting the embedding dimension. arXiv preprint arXiv:2509.05722, 2025.
  24. [24] Ilse CF Ipsen and Arvind K Saibaba. Stable rank and intrinsic dimension of real and complex matrices. SIAM Journal on Matrix Analysis and Applications, 46(3):1988–2007, 2025.
  25. [25] Kwon Chan Jeon and Patricia Goodson. US adolescents' friendship networks and health risk behaviors: a systematic review of studies using social network analysis and Add Health data. PeerJ, 3:e1052, 2015. doi: 10.7717/peerj.1052.
  26. [26] H. Jeong, S. P. Mason, A.-L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, 2001. doi: 10.1038/35075138.
  27. [27] Nandan Kumar Jha and Brandon Reagen. Spectral scaling laws in language models: How effectively do feed-forward networks use their latent space? In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35047–35058, 2025.
  28. [28] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  29. [29] Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3):123–286, 2012. doi: 10.1561/2200000044.
  30. [30] Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, and Ali Farhadi. Matryoshka representation learning, 2024. URL https://arxiv.org/abs/2205.13147.
  31. [31] John M. Lee. Introduction to Smooth Manifolds. Springer, 2nd edition, 2013.
  32. [32] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection, June 2014.
  33. [33] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2–es, 2007.
  34. [34] Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes, 2018. URL https://arxiv.org/abs/1804.08838.
  35. [35] Tianxi Li, Elizaveta Levina, and Ji Zhu. Network cross-validation by edge sampling. Biometrika, 107(2):257–276, 2020.
  36. [36] David Liben-Nowell and Jon Kleinberg. The link prediction problem for social networks. In CIKM, pages 556–559, 2003.
  37. [37] Gongxu Luo, Jianxin Li, Jianlin Su, Hao Peng, Carl Yang, Lichao Sun, Philip S Yu, and Lifang He. Graph entropy guided node embedding dimension selection for graph neural networks. arXiv preprint arXiv:2105.03178, 2021.
  38. [38] Nikolaos Nakis, Abdulkadir Çelikkanat, and Morten Mørup. HM-LDM: A hybrid-membership latent distance model, 2022. URL https://arxiv.org/abs/2206.03463.
  39. [39] Nikolaos Nakis, Abdulkadir Celikkanat, Louis Boucherie, Christian Djurhuus, Felix Burmester, Daniel Mathias Holmelund, Monika Frolcová, and Morten Mørup. Characterizing polarization in social networks using the signed relational latent distance model. In Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, editors, Proceedings of The 26th Internationa...
  40. [40] Nikolaos Nakis, Abdulkadir Çelikkanat, Sune Lehmann Jørgensen, and Morten Mørup. A hierarchical block distance model for ultra low-dimensional graph representations, 2023. URL https://arxiv.org/abs/2204.05885.
  41. [41] Nikolaos Nakis, Niels Raunkjær Holm, Andreas Lyhne Fiehn, and Morten Mørup. How low can you go? Searching for the intrinsic dimensionality of complex networks using metric node embeddings. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=V71ITh2w40.
  42. [42] Nikolaos Nakis, Chrysoula Kosma, Panagiotis Promponas, Michail Chatzianastasis, and Giannis Nikolentzos. Aitchison embeddings for learning compositional graph representations, 2026. URL https://arxiv.org/abs/2605.00716.
  43. [43] A. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Neural Information Processing Systems, 2001.
  44. [44] Giannis Nikolentzos, Michail Chatzianastasis, and Michalis Vazirgiannis. What do GNNs actually learn? Towards understanding their representations, 2024. URL https://arxiv.org/abs/2304.10851.
  45. [45] Jorge Nocedal and Stephen J Wright. Numerical Optimization. Springer, 2006.
  46. [46] Tore Opsahl. Why Anchorage is not (that) important: Binary ties and sample selection. 2011. https://toreopsahl.com/datasets/.
  47. [47] Giuliano Andrea Pagani and Marco Aiello. The power grid as a complex network: A survey. Physica A: Statistical Mechanics and its Applications, 392(11):2688–2700, 2013. doi: 10.1016/j.physa.2013.01.023.
  49. [49] Elizabeth Paluck, Hana Shepherd, and Peter Aronow. Changing climates of conflict: A social network experiment in 56 schools. Proceedings of the National Academy of Sciences, 113:201514483, 2016. doi: 10.1073/pnas.1514483113.
  50. [50] Francesco Sanna Passino and Nicholas A Heard. Bayesian estimation of the latent dimension and communities in stochastic blockmodels. Statistics and Computing, 30(5):1291–1307, 2020.
  51. [51] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014.
  52. [52] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining, pages 459–467, 2018.
  53. [53] Benjamin Recht, Maryam Fazel, and Pablo A Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
  54. [54] Ryan A. Rossi and Nesreen Ahmed. The network data repository with interactive graph analytics and visualization. In AAAI Conference on Artificial Intelligence, 2015.
  55. [55] Olivier Roy and Martin Vetterli. The effective rank: A measure of effective dimensionality. In 2007 15th European Signal Processing Conference, pages 606–610. IEEE, 2007.
  56. [56] Patrick Rubin-Delanchy, Joshua Cape, Minh Tang, and Carey E Priebe. A statistical interpretation of spectral embedding: the generalised random dot product graph. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(4):1446–1473, 2022.
  57. [57] Amartya Sanyal, Philip H. S. Torr, and Puneet K. Dokania. Stable rank normalization for improved generalization in neural networks and GANs, 2020. URL https://arxiv.org/abs/1906.04659.
  58. [58] Nathan Srebro, Jason Rennie, and Tommi Jaakkola. Maximum-margin matrix factorization. Advances in Neural Information Processing Systems, 17, 2004.
  59. [59] Chris Stark, Bobby-Joe Breitkreutz, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, and Mike Tyers. BioGRID: a general repository for interaction datasets. Nucleic Acids Research, 34, 2006. doi: 10.1093/nar/gkj109.
  60. [60] Daniel L Sussman, Minh Tang, and Carey E Priebe. Consistent latent position estimation and vertex classification for random dot product graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):48–57, 2013.
  61. [61] Roddy Taing and Keith Levin. On the effect of misspecifying the embedding dimension in low-rank network models. arXiv preprint arXiv:2601.06014, 2026.
  62. [62] Amanda L. Traud, Peter J. Mucha, and Mason A. Porter. Social structure of Facebook networks. Physica A: Statistical Mechanics and its Applications, 391(16):4165–4180, 2012. doi: 10.1016/j.physa.2011.12.021.
  63. [63] Loring W. Tu. An Introduction to Manifolds. Springer, 2nd edition, 2011.
  64. [64] Ulrike von Luxburg. A tutorial on spectral clustering, 2007. URL https://arxiv.org/abs/0711.0189.
  65. [65] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. Community preserving network embedding. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017.
  66. [66] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440–442, 1998.
  67. [67] Lai Wei, Zhiquan Tan, Chenghai Li, Jindong Wang, and Weiran Huang. Diff-eRank: A novel rank-based metric for evaluating large language models. Advances in Neural Information Processing Systems, 37:39501–39521, 2024.
  68. [68] Jiang Yang, Yuxiang Zhao, and Quanhui Zhu. Effective rank and the staircase phenomenon: New insights into neural network training dynamics. arXiv e-prints, 2024.
  69. [69] Mu Zhu and Ali Ghodsi. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2):918–930, 2006.