Recognition: unknown
The Network Structure of Mathlib
Pith reviewed 2026-05-08 05:00 UTC · model grok-4.3
The pith
Mathlib's dependency network shows formalization compresses semantic hierarchies, with centrality reflecting language infrastructure over mathematical relevance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By extracting the dependency structure of Mathlib into a graph of 308,129 declarations, 8.4 million edges, and 7,563 modules and then decomposing the edges to isolate explicit ones, the analysis finds a 50.9 percent coupling between human-designed namespaces and logical structures, a median usage of only 1.6 percent of the imported scope, and that network centrality primarily identifies language infrastructure rather than mathematical relevance.
What carries the argument
A multilayer dependency graph of declarations and modules, decomposed to isolate explicit edges from compiler-synthesized and proof-driven ones.
If this is right
- Human-designed taxonomies in formal libraries align only partially with the actual logical dependency structure.
- Developers import substantially more modules than they use in practice.
- Standard centrality measures applied to the library highlight foundational language components over advanced mathematical content.
- Formalization produces a compressed representation in which many semantic distinctions collapse into shared infrastructure.
Where Pith is reading between the lines
- Library design tools could use the measured import-utilization gap to suggest minimal import sets that still cover actual needs.
- Navigation or search features in formal libraries might benefit from separating infrastructural layers from domain-specific ones.
- Repeating the same decomposition and centrality analysis on other large formal libraries would test whether hierarchy compression is a general effect of formalization.
- Proof assistants could prioritize caching or indexing the high-centrality infrastructural nodes to improve performance.
Load-bearing premise
The custom graph decompositions successfully isolate explicit edges from those synthesized by the compiler or driven by proofs without introducing systematic bias in the measured properties.
What would settle it
Recomputing the 50.9 percent coupling figure and the centrality rankings on the same declarations after changing the decomposition rules to treat a different set of compiler-generated edges as explicit, then checking whether the reported divergence and infrastructure dominance persist.
Figures
read the original abstract
The ongoing development of Lean 4's Mathlib has produced a macroscopic structural complexity that interweaves logical, mathematical, and infrastructural dependencies. We present a network analysis of this library, extracting its dependency structure into a multilayer graph of 308,129 declarations, 8.4 million edges, and 7,563 modules. By introducing graph decompositions that isolate explicit edges from those synthesized by the compiler or driven by proofs, we quantify the structural properties of formalized mathematics. Our analysis reveals three findings. First, taxonomies designed by humans diverge from logical structures, exhibiting a 50.9% coupling across namespaces. Second, developers utilize a median of 1.6% of the imported scope. Third, formalization compresses semantic hierarchies, with network centrality capturing language infrastructure rather than mathematical relevance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript performs a network analysis of Lean 4's Mathlib by extracting a multilayer dependency graph with 308,129 declarations, 8.4 million edges, and 7,563 modules. It introduces custom graph decompositions to separate explicit edges from compiler-synthesized and proof-driven ones, then reports three quantitative results: 50.9% coupling between human-designed taxonomies and logical structures across namespaces, a median developer usage of 1.6% of imported scope, and the observation that network centrality primarily reflects language infrastructure rather than mathematical relevance.
Significance. If the decomposition and measurements are reliable, the work supplies a macroscopic empirical view of dependency structure in a large-scale formal library, which could inform library maintenance, import hygiene, and the design of future formalization efforts. The multilayer decomposition approach itself is a methodological contribution at the intersection of network science and interactive theorem proving, provided it receives external validation.
major comments (3)
- [Methods / Graph Construction] The three quantitative claims rest on the graph extraction and three-way edge decomposition (explicit / compiler-synthesized / proof-driven) described in the methods. No validation against ground-truth manual classification on even a small set of modules is supplied, nor are the precise heuristics for distinguishing compiler-generated edges (e.g., typeclass search, macro expansion, proof-term generation) stated. This directly affects the reliability of the 50.9% coupling, 1.6% median usage, and centrality conclusions.
- [Results / Centrality Analysis] The claim that 'network centrality captures language infrastructure rather than mathematical relevance' is load-bearing for the third finding. It is supported only by post-decomposition centrality rankings; without an independent check that the decomposition has not systematically reassigned proof-scaffolding edges to the infrastructure layer, the interpretation remains at risk of circularity with the chosen edge categories.
- [Results / Taxonomy vs. Logical Structure] The 50.9% coupling figure between taxonomies and logical structures is presented without an explicit definition of the coupling metric, the namespace-to-logical-structure mapping procedure, or any sensitivity analysis to alternative namespace partitions. This makes it impossible to assess whether the reported divergence is robust or an artifact of the chosen aggregation.
minor comments (3)
- [Abstract] The abstract states the three findings but supplies no information on graph-construction methodology or validation; a one-sentence methods clause would improve readability.
- [Figures] Several figures (e.g., those showing degree distributions and centrality rankings) lack axis labels, legends, or captions that explicitly tie the plotted quantities to the decomposed edge types.
- [Related Work] The manuscript cites prior network analyses of mathematical corpora but does not compare its decomposition technique or quantitative results against those earlier studies.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive major comments. We address each point below and commit to revisions that strengthen the methodological transparency and robustness of the reported results.
read point-by-point responses
-
Referee: [Methods / Graph Construction] The three quantitative claims rest on the graph extraction and three-way edge decomposition (explicit / compiler-synthesized / proof-driven) described in the methods. No validation against ground-truth manual classification on even a small set of modules is supplied, nor are the precise heuristics for distinguishing compiler-generated edges (e.g., typeclass search, macro expansion, proof-term generation) stated. This directly affects the reliability of the 50.9% coupling, 1.6% median usage, and centrality conclusions.
Authors: We agree that greater transparency is needed. The decomposition rules are deterministic and based on metadata present in Lean 4's .olean export format (specifically, the 'isCompilerGenerated' flag, the origin of typeclass instances via the elaborator, and proof-term reconstruction markers). We will revise the Methods section to state these rules explicitly, include pseudocode, and provide one concrete example per category drawn from Mathlib modules. In addition, we will perform and report a small manual audit of 200 randomly sampled edges across five modules to supply limited empirical support for classification accuracy. While a full ground-truth corpus remains outside the scope of the present study, these additions directly address the reliability concern for the three quantitative claims. revision: yes
-
Referee: [Results / Centrality Analysis] The claim that 'network centrality captures language infrastructure rather than mathematical relevance' is load-bearing for the third finding. It is supported only by post-decomposition centrality rankings; without an independent check that the decomposition has not systematically reassigned proof-scaffolding edges to the infrastructure layer, the interpretation remains at risk of circularity with the chosen edge categories.
Authors: We acknowledge the risk of circularity. Edge labels are assigned from static compiler metadata before any centrality computation occurs, using criteria orthogonal to degree, betweenness, or eigenvector scores. To demonstrate independence, the revised manuscript will (i) restate this ordering explicitly and (ii) recompute the top-20 central nodes on the undecomposed graph, showing that the infrastructure dominance persists. This additional check removes the circularity concern while preserving the original interpretation. revision: yes
-
Referee: [Results / Taxonomy vs. Logical Structure] The 50.9% coupling figure between taxonomies and logical structures is presented without an explicit definition of the coupling metric, the namespace-to-logical-structure mapping procedure, or any sensitivity analysis to alternative namespace partitions. This makes it impossible to assess whether the reported divergence is robust or an artifact of the chosen aggregation.
Authors: We apologize for the missing details. The coupling metric is the average Jaccard index between the set of logical dependencies internal to each human namespace and the set of dependencies crossing namespace boundaries, after mapping each declaration to its declaring namespace. The mapping is a simple string prefix match on the fully qualified name. We will insert a new subsection that (a) gives the exact formula, (b) describes the aggregation steps, and (c) reports a sensitivity analysis under two alternative partitions (top-level namespaces only, and namespaces truncated at depth 2). These additions will allow readers to evaluate the robustness of the 50.9% figure. revision: yes
Circularity Check
No circularity: direct extraction and empirical computation from library source
full rationale
The paper extracts the dependency graph directly from Mathlib source (308k declarations, 8.4M edges), applies methodological decompositions to isolate explicit vs. synthesized edges, and computes descriptive statistics such as namespace coupling (50.9%) and median import usage (1.6%). The three findings are observational results on this constructed graph; centrality rankings are compared to human taxonomies without any fitted parameters, predictions of held-out data, or self-referential definitions. No step reduces by construction to its inputs, and no self-citation chain supports a load-bearing claim. The analysis is self-contained against the external library artifact.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The multilayer graph extraction correctly separates explicit edges from those synthesized by the compiler or driven by proofs.
Reference graph
Works this paper leans on
- [1]
-
[2]
[Baa22] A. Baanen. Use and abuse of instance parameters in the Lean mathematical library. InInteractive Theorem Proving (ITP 2022), volume 237 ofLIPIcs, pages 4:1–4:20,
2022
- [3]
- [4]
-
[5]
[BGLL08] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008,
2008
-
[6]
[BHMN15] J. C. Blanchette, M. Haslbeck, D. Matichuk, and T. Nipkow. Mining the archive of formal proofs. InIntelligent Computer Mathematics (CICM 2015), volume 9150 ofLNCS, pages 3–17. Springer,
2015
-
[7]
Ongoing; funded by EPSRC grant EP/Y022904/1. [BTK+25] M. Bemberek, M. Todošić, B. Koloski, B. Roberček, and A. Lavrič. Combining textual and structural information for premise selection in Lean.arXiv:2510.23637,
- [8]
-
[9]
[CT23] J. Commelin and A. Topaz. Abstraction boundaries and spec driven development in pure mathematics. arXiv:2309.02345,
-
[10]
Danon, A
[DDGDA05] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas. Comparing community structure identification.Journal of Statistical Mechanics: Theory and Experiment, 2005(09):P09008,
2005
-
[11]
[HSS08] A. A. Hagberg, D. A. Schult, and P. J. Swart. Exploring network structure, dynamics, and function using NetworkX. InProceedings of the 7th Python in Science Conference (SciPy 2008), pages 11–15,
2008
-
[12]
Huch and M
[HW24] F. Huch and M. Wenzel. Distributed parallel build for the Isabelle archive of formal proofs. InInteractive Theorem Proving (ITP 2024), volume 309 ofLIPIcs, pages 22:1–22:19,
2024
-
[13]
Klein, T
[KBDK12] G. Klein, T. Bourke, J. Daum, and L. Kolanski. Challenges and experiences in managing large-scale proofs. In Intelligent Computer Mathematics (CICM 2012), volume 7362 ofLNCS, pages 32–48. Springer,
2012
-
[14]
[LW04] N. LaBelle and E. Wallingford. Inter-package dependency networks in open-source software. arXiv:cs/0411096,
work page internal anchor Pith review arXiv
-
[15]
McKinney
[McK10] W. McKinney. Data structures for statistical computing in Python. InProceedings of the 9th Python in Science Conference (SciPy 2010), pages 56–61,
2010
-
[16]
The Lean mathematical library
[The20] The Mathlib Community. The Lean mathematical library. InProceedings of the 9th ACM SIGPLAN Interna- tional Conference on Certified Programs and Proofs (CPP 2020), pages 367–381. ACM,
2020
-
[17]
[UM25] E. Uskuplu and L. S. Moss. KnowTeX: Visualizing mathematical dependencies.arXiv:2601.15294,
-
[18]
van Doorn, G
[vDEL20] F. van Doorn, G. Ebner, and R. Y. Lewis. Maintaining a library of formal mathematics. InIntelligent Computer Mathematics (CICM 2020), volume 12236 ofLNCS, pages 251–267. Springer,
2020
-
[19]
LeanArchitect: Automating blueprint generation for humans and AI.arXiv preprint arXiv:2601.22554,
[ZMAW25] Thomas Zhu, Pietro Monticone, Jeremy Avigad, and Sean Welleck. LeanArchitect: Automating blueprint generation for humans and AI.arXiv preprint arXiv:2601.22554,
-
[20]
16 Definition A.2.6(Normalized Mutual Information (NMI) [DDGDA05, §3, Eq
It quantifies how much knowing one partition reduces uncertainty about the other. 16 Definition A.2.6(Normalized Mutual Information (NMI) [DDGDA05, §3, Eq. (2)]).Given two partitions AandBof a setV, thenormalized mutual informationis NMI(A,B) = 2I(A;B) H(A) +H(B) , whereIandHare defined above. NMI ranges from0(independent) to1(identical partitions). Defin...
2025
-
[21]
I depend on others
indicates a stronger coupling. All names have theMathlib.prefix removed. •Structural stability: community persistenceNMI(L (t),L(t+1)), hub turnover (rank correlation of in- degree rankings between consecutive snapshots), and redundancy trendr(G(t) module)over time. •Structural events: discontinuities in the time series coinciding with known events (e.g.,...
2023
-
[22]
In both cases, a lognormal distribution provides a significantly better fit than a pure power law (likelihood ratioR=−10.6andR=−289.7, respectively;p<0.01)
for out- degree. In both cases, a lognormal distribution provides a significantly better fit than a pure power law (likelihood ratioR=−10.6andR=−289.7, respectively;p<0.01). This mirrors the pattern observed for Gmodule (§C.1.2): heavy-tailed distributions that are not cleanly power-law. 54 Table 28: Degree statistics forG(2) ns (unweighted). In-degree Ou...
2000
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.