pith. machine review for the scientific record. sign in

arxiv: 2604.24797 · v2 · submitted 2026-04-26 · 💻 cs.LO · cs.PL· cs.SI· math.HO

Recognition: unknown

The Network Structure of Mathlib

Nanyun Peng, Patrick Shafto, Simone Severini, Xinze Li

Authors on Pith no claims yet

Pith reviewed 2026-05-08 05:00 UTC · model grok-4.3

classification 💻 cs.LO cs.PLcs.SImath.HO
keywords mathlibleandependency graphformal mathematicsnetwork analysislibrary structuresemantic hierarchy
0
0 comments X

The pith

Mathlib's dependency network shows formalization compresses semantic hierarchies, with centrality reflecting language infrastructure over mathematical relevance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a multilayer graph from Mathlib containing over 300,000 declarations and millions of edges, then applies decompositions to separate explicit user-written dependencies from those added by the compiler or required only for proofs. It measures three structural features: the alignment between human-created namespaces and actual logical connections, the fraction of imported modules that code actually uses, and which nodes rank highest by standard network centrality scores. These measurements indicate that human taxonomies match the logical structure only about half the time, that typical developments draw on a very small slice of what they import, and that the most central parts of the library are foundational language elements rather than core mathematical ideas. The results matter for anyone building or navigating large formal libraries because they show how formalization changes the way mathematical content is connected and prioritized.

Core claim

By extracting the dependency structure of Mathlib into a graph of 308,129 declarations, 8.4 million edges, and 7,563 modules and then decomposing the edges to isolate explicit ones, the analysis finds a 50.9 percent coupling between human-designed namespaces and logical structures, a median usage of only 1.6 percent of the imported scope, and that network centrality primarily identifies language infrastructure rather than mathematical relevance.

What carries the argument

A multilayer dependency graph of declarations and modules, decomposed to isolate explicit edges from compiler-synthesized and proof-driven ones.

If this is right

  • Human-designed taxonomies in formal libraries align only partially with the actual logical dependency structure.
  • Developers import substantially more modules than they use in practice.
  • Standard centrality measures applied to the library highlight foundational language components over advanced mathematical content.
  • Formalization produces a compressed representation in which many semantic distinctions collapse into shared infrastructure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Library design tools could use the measured import-utilization gap to suggest minimal import sets that still cover actual needs.
  • Navigation or search features in formal libraries might benefit from separating infrastructural layers from domain-specific ones.
  • Repeating the same decomposition and centrality analysis on other large formal libraries would test whether hierarchy compression is a general effect of formalization.
  • Proof assistants could prioritize caching or indexing the high-centrality infrastructural nodes to improve performance.

Load-bearing premise

The custom graph decompositions successfully isolate explicit edges from those synthesized by the compiler or driven by proofs without introducing systematic bias in the measured properties.

What would settle it

Recomputing the 50.9 percent coupling figure and the centrality rankings on the same declarations after changing the decomposition rules to treat a different set of compiler-generated edges as explicit, then checking whether the reported divergence and infrastructure dominance persist.

Figures

Figures reproduced from arXiv: 2604.24797 by Nanyun Peng, Patrick Shafto, Simone Severini, Xinze Li.

Figure 1
Figure 1. Figure 1: Three views of the same declarations. (a) The file-system hierarchy view at source ↗
Figure 2
Figure 2. Figure 2: Blueprint (left) versus compiler-extracted declaration graph (right) for basic list lemmas. The view at source ↗
Figure 3
Figure 3. Figure 3: A fragment of the Mathlib source tree (commit 534cf0b). Left: the directory layout. Right: the corresponding rooted tree T. Each directory path maps to a module with dots replacing separators, e.g., Mathlib/Algebra/Group/Defs.lean ↔ Mathlib.Algebra.Group.Defs. B.1.2 The Module Graph Gmodule Each Lean source file begins with import statements that declare its dependencies. Lean 4 supports both import and pu… view at source ↗
Figure 4
Figure 4. Figure 4: The simplest import edge: Data.Nat.Notation imports Init. The arrow points from the import￾ing module to the imported module. All names have the Mathlib. prefix removed. Since November 2025, Lean 4’s module system [Lea25] distinguishes public import from import (pri￾vate), allowing modules to selectively limit transitive visibility. This distinction refines the module graph: Definition B.1.6 (Visibility gr… view at source ↗
Figure 5
Figure 5. Figure 5: Visibility graph (Definition B.1.6). (a) public import re-exports the imported module’s decla￾rations; plain import keeps them private. (b) Left: when B publicly imports C, A’s transitive visibility includes {B, C}. Right: when B privately imports C, A can see only {B}, because the private edge breaks the transitive chain, removing C from G pub module. 20 view at source ↗
Figure 6
Figure 6. Figure 6: A fragment of Gmodule: Algebra.Group.Defs imports two modules that both depend on Init, forming a diamond. All names have the Mathlib. prefix removed. B.1.3 Transitive Reduction and Redundant Imports Since Gmodule is a DAG, it has a unique transitive reduction [AGU72]: the smallest subgraph G − module ⊆ Gmodule with the same reachability relation. An edge (m1, m2) ∈ Emodule is transitively redundant if m2 … view at source ↗
Figure 7
Figure 7. Figure 7: Illustrating transitive redundancy on the diamond from Figure view at source ↗
Figure 8
Figure 8. Figure 8: Build graph (Definition B.1.9). (a) Import declarations with per-module compilation times w(m). (b) The weighted DAG: the critical path γ ∗ (orange boxes and edges) yields W(γ ∗ ) = 1+6+1 = 8 s; Data. Nat.Defs and Data.Int.Defs compile in parallel but cannot reduce the total below W(γ ∗ ). All names have the Mathlib. prefix removed. In the current snapshot, the critical path of Gbuild traverses 161 modules… view at source ↗
Figure 9
Figure 9. Figure 9: Premise edges for the theorem Nat.lt_irrefl: the mathematical premise not_succ_le_self (a theorem) and two infrastructure premises LT.lt (an abbreviation providing < notation) and instLTNat (a typeclass instance). The edges are directed: none of the three premises cites lt_irrefl in return. The eight kinds arrange into a clear hierarchy of edge production and consumption ( view at source ↗
Figure 10
Figure 10. Figure 10: A fragment of Gthm: the theorems add_left_comm and add_right_comm both cite the premises add_comm, add_assoc, and Eq.refl. The shared node Eq.refl (in-degree 69,580) exemplifies the hub phenomenon. All names have the Nat. prefix removed except Eq.refl. The Nat.lt_irrefl example above showed a single theorem’s premises; we now turn to a pair of theorems to illustrate how shared premises create the hub stru… view at source ↗
Figure 11
Figure 11. Figure 11: Statement vs. proof decomposition (Definition view at source ↗
Figure 12
Figure 12. Figure 12: Gext vs. Gtc (Definitions B.2.3 and B.2.6), from Mathlib.Algebra.Group.Defs (lines 173–759). (a) Source: each extends creates a field inheritance edge. (b) Gext: edges from extends only. (c) Gtc: the same nodes, but Lean auto-generates transitive forgetful instances (blue), e.g. CommMonoid directly provides Semigroup without going through Monoid. Thus Gext ⊂ Gtc. At our snapshot, σ = 74.2%: 6,257,832 edge… view at source ↗
Figure 13
Figure 13. Figure 13: Coercion graph (Definition B.2.5). (a) Writing (n : R) where n : N triggers the elaborator to chain three coercions N → Z → Q → R, each a registered instance in K. The individual coercions are defined in Init.Data.Int.Basic, Mathlib.Data.Rat.Cast.Defs, and Mathlib.Topology.Algebra.Order.Field. (b) The subgraph of Gcoe for the number tower. Long chains (length ≥ 3) degrade elaboration performance and can p… view at source ↗
Figure 14
Figure 14. Figure 14: Synthesized vs. explicit edge partition (Definition view at source ↗
Figure 15
Figure 15. Figure 15: Definitional height (Definition B.2.8). (a) From Init.Data.Nat.Basic: Nat.pow calls Nat.mul, which calls Nat.add. (b) The unfolding chain: to type-check an expression involving Nat.pow, the kernel may need to unfold through Nat.mul and Nat.add, yielding δ = 3. Changes to any definition in the chain can break type-checking even if the explicit premises of Nat.pow are unchanged. Of the 101,021 definitions w… view at source ↗
Figure 16
Figure 16. Figure 16: Tactic usage profile (Definition B.2.9). (a) From Mathlib.Algebra.Group.Defs (line 1056): mul_div_assoc is proved by a single rw invocation with three lemma arguments. The tactic profile τ (d) captures which tactics are used. (b) The declaration node annotated with its tactic tag; the bar chart shows library-wide tactic frequencies over 79,910 tactic-mode proofs (extracted via jixia [fre24]). Definition B… view at source ↗
Figure 17
Figure 17. Figure 17: Auto-derived instance edges (Definition B.2.10). (a) From Mathlib.Tactic.Linter. FlexibleLinter (line 190): the deriving clause on Stained instructs Lean to auto-generate four type￾class instances. (b) In Gthm, solid edges are human-written (constructors → type); dashed edges belong to Eauto: the auto-generated instances reference both the type and its constructors, but no human ever wrote these dependenc… view at source ↗
Figure 18
Figure 18. Figure 18: Declaration metadata as node attributes (Definition view at source ↗
Figure 19
Figure 19. Figure 19: Import utilization (Definition B.2.11). (a) Mathlib.Algebra.Order.ZeroLEOne (line 43) imports Mathlib.Algebra.Notation.Pi.Defs, which defines ∼50 declarations (pointwise instances for One, Mul, Inv, Div, etc. and their to_additive mirrors). Typeclass synthesis resolves only Pi.instOne and Pi. instZero (solid); the remaining ∼48 declarations (gray) are unused. (b) The utilization is 2/50 = 0.04, below the … view at source ↗
Figure 20
Figure 20. Figure 20: Aggregation from Gthm to G (k) ns . (a) Declarations in Gthm; green = namespace boundaries, blue = module boundaries. (b) Depth-2 namespaces. (c) Depth 1. All names have the Nat. prefix removed. Cycles in the namespace graph. Unlike Gmodule and Gthm, which are directed acyclic graphs (the compiler and the type-checker respectively forbid circular dependencies), the namespace graph G (k) ns can contain dir… view at source ↗
Figure 21
Figure 21. Figure 21: How aggregation creates cycles. (a) At the declaration level, two edges cross the view at source ↗
Figure 22
Figure 22. Figure 22: Co-modification graph (Definition B.4.1). (a) Three merged PRs and the files each touches. (b) The resulting weighted undirected graph Gco: solid edges exist in both Gco and Gmodule (structural + logical coupling); the dashed edge between Algebra.Group.Defs and Data.Nat.Order appears only in Gco, revealing a hidden dependency invisible to the import graph. The thick edge (w = 2) indicates a stronger coupl… view at source ↗
Figure 23
Figure 23. Figure 23: Temporal graph sequence (Definition B.4.2). (a) Monthly snapshots of Mathlib commits, with two structural events: the Lean 3→4 port (2023) and the module system introduction (November 2025). (b) The declaration graph G (t) thm at three time points, growing from ∼50k to 318k declarations. Tracking structural indicators across the sequence reveals whether properties like the 17.5% redundancy rate are stable… view at source ↗
Figure 24
Figure 24. Figure 24: In-degree (blue) and out-degree (orange) distributions on log–log axes for view at source ↗
Figure 25
Figure 25. Figure 25: DAG width by topological layer for Gmodule (top) and G − module (bottom). The distributions share a characteristic funnel shape: a sharp peak at level 0 (over 1,400 leaf modules), a rapid decay through the first 20 layers, a long tail extending to level 153, and a secondary peak near level 126. The near-identical profiles confirm that the funnel topology arises from logical necessity, not from redundant i… view at source ↗
Figure 26
Figure 26. Figure 26: Pairwise centrality scatter plots for Gmodule. Each point is one module. The wide scatter across all three panels confirms that in-degree, PageRank, and betweenness capture distinct structural roles. The three-way divergence visible in view at source ↗
Figure 27
Figure 27. Figure 27: Robustness curves for Gmodule: fraction of nodes in the largest WCC as a function of the fraction removed, under random removal (blue) and targeted removal by PageRank (coral) view at source ↗
Figure 28
Figure 28. Figure 28: Degree distributions of Gthm on log–log axes. Left: in-degree, with power law fit (gold line, α = 1.78, xmin = 20). Right: out-degree. The in-degree tail extends over four orders of magnitude; the out-degree distribution is sharply bounded. Unlike Gmodule, where transitive reduction removes 17.5% of edges (§B.1.3), Gthm does not admit a meaningful transitive reduction: every edge is extracted mechanically… view at source ↗
Figure 29
Figure 29. Figure 29: DAG width by topological layer for Gthm (84 layers). The source layer (119,633 zero-citation declarations) dominates; the remaining layers decay rapidly, reaching single digits by layer 60 view at source ↗
Figure 30
Figure 30. Figure 30: visualizes the pairwise relationships between the three centrality measures. 10 0 10 1 10 2 10 3 10 4 10 5 In-degree 10 6 10 5 10 4 10 3 10 2 PageRank (a) In-degree vs. PageRank. 10 0 10 1 10 2 10 3 10 4 10 5 In-degree 10 10 10 9 10 8 10 7 10 6 10 5 10 4 Betweenness (b) In-degree vs. Betweenness. 10 10 10 9 10 8 10 7 10 6 10 5 10 4 Betweenness 10 6 10 5 10 4 10 3 10 2 PageRank (c) Betweenness vs. PageRank view at source ↗
Figure 31
Figure 31. Figure 31: Robustness curves for Gthm: fraction of nodes in the largest WCC as a function of the fraction removed, under random removal (indigo) and targeted removal by PageRank (gold). Under random removal, the giant component shrinks almost linearly: removing 20% of nodes leaves 80% connected. Under targeted attack (removing the highest-PageRank nodes first), 20% removal reduces connectivity to 46%, a 34-percentag… view at source ↗
Figure 32
Figure 32. Figure 32: Containment decay curve. Green circles connected by the solid line show containment at names view at source ↗
Figure 33
Figure 33. Figure 33: Degree distributions of G (2) ns on log–log axes. Left: in-degree (blue), with power-law reference (gold dashed, α = 1.60, xmin = 4). Right: out-degree (coral), with power-law reference (α = 1.90, xmin = 13). Both tails are heavy but better fit by a lognormal than a pure power law. The extreme outlier in the in￾degree panel is _root_ (deg− = 9,846; see Remark C.3.2). The namespace _root_ dominates both in… view at source ↗
Figure 34
Figure 34. Figure 34: Condensed DAG of G (2) ns by topological layer (4,080 super-nodes, 8 layers). Each bar counts the number of super-nodes (SCCs) at that layer. Layer 0 holds 3,941 leaf SCCs; layer 5 holds a single giant SCC comprising 5,899 namespaces. The shallow, bimodal structure reflects pervasive mutual dependency: most namespaces either sit at the periphery or belong to one densely interconnected core. The 8-layer co… view at source ↗
Figure 35
Figure 35. Figure 35: Pairwise centrality scatter plots for G (2) ns . Each point is one namespace. The extreme outlier in (a) is _root_ (Remark C.3.2). The scatter confirms that PageRank and betweenness capture distinct structural roles at the namespace level. C.3.5 Community Structure Louvain community detection (§A.2.2) on the undirected, weighted projection of G (2) ns yields 12 communities with modularity Q = 0.271 view at source ↗
Figure 36
Figure 36. Figure 36: Robustness curves for G (2) ns : fraction of nodes in the largest WCC as a function of the fraction removed, under random removal (blue) and targeted removal by PageRank (coral). The pattern matches the classic hub-dominated vulnerability signature observed at both the module level (§C.1.6) and the declaration level (§C.2.7): high resilience under random failure, extreme fragility under targeted attack ( view at source ↗
Figure 37
Figure 37. Figure 37: Three views of the same declarations (cf. Figure view at source ↗
Figure 38
Figure 38. Figure 38: The same five declarations and six dependency edges, grouped by module (a) vs. namespace (b). view at source ↗
Figure 39
Figure 39. Figure 39: The granularity spectrum. Blue boxes are modules; red dots are declarations; solid red arrows are intra-module dependencies; dashed red arrows are cross-module dependencies. (a) One declaration per module: Gmodule ∼= Gthm (308K modules, unnavigable). (b) Current Mathlib: multiple cross-module declaration edges collapse into a single blue import arrow; internal edges are invisible at the module level. (c) … view at source ↗
Figure 40
Figure 40. Figure 40: The namespace granularity spectrum. Green boxes are namespaces; red dots are declarations; solid red arrows are intra-namespace dependencies; dashed red arrows cross namespace boundaries. (a) One declaration per namespace: every edge crosses a boundary (containment = 0%). (b) Current Mathlib: nested namespaces with partial containment. (c) All declarations in one namespace: every edge is internal (contain… view at source ↗
read the original abstract

The ongoing development of Lean 4's Mathlib has produced a macroscopic structural complexity that interweaves logical, mathematical, and infrastructural dependencies. We present a network analysis of this library, extracting its dependency structure into a multilayer graph of 308,129 declarations, 8.4 million edges, and 7,563 modules. By introducing graph decompositions that isolate explicit edges from those synthesized by the compiler or driven by proofs, we quantify the structural properties of formalized mathematics. Our analysis reveals three findings. First, taxonomies designed by humans diverge from logical structures, exhibiting a 50.9% coupling across namespaces. Second, developers utilize a median of 1.6% of the imported scope. Third, formalization compresses semantic hierarchies, with network centrality capturing language infrastructure rather than mathematical relevance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript performs a network analysis of Lean 4's Mathlib by extracting a multilayer dependency graph with 308,129 declarations, 8.4 million edges, and 7,563 modules. It introduces custom graph decompositions to separate explicit edges from compiler-synthesized and proof-driven ones, then reports three quantitative results: 50.9% coupling between human-designed taxonomies and logical structures across namespaces, a median developer usage of 1.6% of imported scope, and the observation that network centrality primarily reflects language infrastructure rather than mathematical relevance.

Significance. If the decomposition and measurements are reliable, the work supplies a macroscopic empirical view of dependency structure in a large-scale formal library, which could inform library maintenance, import hygiene, and the design of future formalization efforts. The multilayer decomposition approach itself is a methodological contribution at the intersection of network science and interactive theorem proving, provided it receives external validation.

major comments (3)
  1. [Methods / Graph Construction] The three quantitative claims rest on the graph extraction and three-way edge decomposition (explicit / compiler-synthesized / proof-driven) described in the methods. No validation against ground-truth manual classification on even a small set of modules is supplied, nor are the precise heuristics for distinguishing compiler-generated edges (e.g., typeclass search, macro expansion, proof-term generation) stated. This directly affects the reliability of the 50.9% coupling, 1.6% median usage, and centrality conclusions.
  2. [Results / Centrality Analysis] The claim that 'network centrality captures language infrastructure rather than mathematical relevance' is load-bearing for the third finding. It is supported only by post-decomposition centrality rankings; without an independent check that the decomposition has not systematically reassigned proof-scaffolding edges to the infrastructure layer, the interpretation remains at risk of circularity with the chosen edge categories.
  3. [Results / Taxonomy vs. Logical Structure] The 50.9% coupling figure between taxonomies and logical structures is presented without an explicit definition of the coupling metric, the namespace-to-logical-structure mapping procedure, or any sensitivity analysis to alternative namespace partitions. This makes it impossible to assess whether the reported divergence is robust or an artifact of the chosen aggregation.
minor comments (3)
  1. [Abstract] The abstract states the three findings but supplies no information on graph-construction methodology or validation; a one-sentence methods clause would improve readability.
  2. [Figures] Several figures (e.g., those showing degree distributions and centrality rankings) lack axis labels, legends, or captions that explicitly tie the plotted quantities to the decomposed edge types.
  3. [Related Work] The manuscript cites prior network analyses of mathematical corpora but does not compare its decomposition technique or quantitative results against those earlier studies.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful reading and constructive major comments. We address each point below and commit to revisions that strengthen the methodological transparency and robustness of the reported results.

read point-by-point responses
  1. Referee: [Methods / Graph Construction] The three quantitative claims rest on the graph extraction and three-way edge decomposition (explicit / compiler-synthesized / proof-driven) described in the methods. No validation against ground-truth manual classification on even a small set of modules is supplied, nor are the precise heuristics for distinguishing compiler-generated edges (e.g., typeclass search, macro expansion, proof-term generation) stated. This directly affects the reliability of the 50.9% coupling, 1.6% median usage, and centrality conclusions.

    Authors: We agree that greater transparency is needed. The decomposition rules are deterministic and based on metadata present in Lean 4's .olean export format (specifically, the 'isCompilerGenerated' flag, the origin of typeclass instances via the elaborator, and proof-term reconstruction markers). We will revise the Methods section to state these rules explicitly, include pseudocode, and provide one concrete example per category drawn from Mathlib modules. In addition, we will perform and report a small manual audit of 200 randomly sampled edges across five modules to supply limited empirical support for classification accuracy. While a full ground-truth corpus remains outside the scope of the present study, these additions directly address the reliability concern for the three quantitative claims. revision: yes

  2. Referee: [Results / Centrality Analysis] The claim that 'network centrality captures language infrastructure rather than mathematical relevance' is load-bearing for the third finding. It is supported only by post-decomposition centrality rankings; without an independent check that the decomposition has not systematically reassigned proof-scaffolding edges to the infrastructure layer, the interpretation remains at risk of circularity with the chosen edge categories.

    Authors: We acknowledge the risk of circularity. Edge labels are assigned from static compiler metadata before any centrality computation occurs, using criteria orthogonal to degree, betweenness, or eigenvector scores. To demonstrate independence, the revised manuscript will (i) restate this ordering explicitly and (ii) recompute the top-20 central nodes on the undecomposed graph, showing that the infrastructure dominance persists. This additional check removes the circularity concern while preserving the original interpretation. revision: yes

  3. Referee: [Results / Taxonomy vs. Logical Structure] The 50.9% coupling figure between taxonomies and logical structures is presented without an explicit definition of the coupling metric, the namespace-to-logical-structure mapping procedure, or any sensitivity analysis to alternative namespace partitions. This makes it impossible to assess whether the reported divergence is robust or an artifact of the chosen aggregation.

    Authors: We apologize for the missing details. The coupling metric is the average Jaccard index between the set of logical dependencies internal to each human namespace and the set of dependencies crossing namespace boundaries, after mapping each declaration to its declaring namespace. The mapping is a simple string prefix match on the fully qualified name. We will insert a new subsection that (a) gives the exact formula, (b) describes the aggregation steps, and (c) reports a sensitivity analysis under two alternative partitions (top-level namespaces only, and namespaces truncated at depth 2). These additions will allow readers to evaluate the robustness of the 50.9% figure. revision: yes

Circularity Check

0 steps flagged

No circularity: direct extraction and empirical computation from library source

full rationale

The paper extracts the dependency graph directly from Mathlib source (308k declarations, 8.4M edges), applies methodological decompositions to isolate explicit vs. synthesized edges, and computes descriptive statistics such as namespace coupling (50.9%) and median import usage (1.6%). The three findings are observational results on this constructed graph; centrality rankings are compared to human taxonomies without any fitted parameters, predictions of held-out data, or self-referential definitions. No step reduces by construction to its inputs, and no self-citation chain supports a load-bearing claim. The analysis is self-contained against the external library artifact.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the assumption that the extracted multilayer graph faithfully represents logical dependencies after decomposition into explicit, compiler-synthesized, and proof-driven edges; no free parameters or new entities are introduced beyond standard network metrics.

axioms (1)
  • domain assumption The multilayer graph extraction correctly separates explicit edges from those synthesized by the compiler or driven by proofs.
    This decomposition is required to isolate the three edge types mentioned in the abstract and underpins all reported structural properties.

pith-pipeline@v0.9.0 · 5442 in / 1235 out tokens · 40925 ms · 2026-05-08T05:00:53.533810+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 9 canonical work pages · 1 internal anchor

  1. [1]

    Achim, A

    [ABB+25] T. Achim, A. Best, A. Bietti, K. Der, M. Fédérico, S. Gukov, D. Halpern-Leistner, K. Henningsgard, Y. Kudryashov, A. Meiburg, et al. Aristotle: IMO-level automated theorem proving.arXiv:2510.01346,

  2. [2]

    [Baa22] A. Baanen. Use and abuse of instance parameters in the Lean mathematical library. InInteractive Theorem Proving (ITP 2022), volume 237 ofLIPIcs, pages 4:1–4:20,

  3. [3]

    Baanen, M

    [BBC+25] A. Baanen, M. R. Ballard, J. Commelin, B. Gin ge Chen, M. Rothgang, and D. Testa. Growing Mathlib: Maintenance of a large scale mathematical library. arXiv:2508.21593,

  4. [4]

    [BDS13] V. C. Barbosa, R. Donangelo, and S. R. Souza. The network structure of mathematical knowledge according to the Wikipedia, MathWorld, and DLMF online libraries.arXiv:1212.3536,

  5. [5]

    [BGLL08] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008,

  6. [6]

    [BHMN15] J. C. Blanchette, M. Haslbeck, D. Matichuk, and T. Nipkow. Mining the archive of formal proofs. InIntelligent Computer Mathematics (CICM 2015), volume 9150 ofLNCS, pages 3–17. Springer,

  7. [7]

    Combining textual and struc- tural information for premise selection in lean.arXiv preprint arXiv:2510.23637,

    Ongoing; funded by EPSRC grant EP/Y022904/1. [BTK+25] M. Bemberek, M. Todošić, B. Koloski, B. Roberček, and A. Lavrič. Combining textual and structural information for premise selection in Lean.arXiv:2510.23637,

  8. [8]

    [CGH+25] L. Chen, J. Gu, L. Huang, et al. Seed-prover: Deep and broad reasoning for automated theorem proving.arXiv preprint arXiv:2507.23726,

  9. [9]

    Commelin and A

    [CT23] J. Commelin and A. Topaz. Abstraction boundaries and spec driven development in pure mathematics. arXiv:2309.02345,

  10. [10]

    Danon, A

    [DDGDA05] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas. Comparing community structure identification.Journal of Statistical Mechanics: Theory and Experiment, 2005(09):P09008,

  11. [11]

    [HSS08] A. A. Hagberg, D. A. Schult, and P. J. Swart. Exploring network structure, dynamics, and function using NetworkX. InProceedings of the 7th Python in Science Conference (SciPy 2008), pages 11–15,

  12. [12]

    Huch and M

    [HW24] F. Huch and M. Wenzel. Distributed parallel build for the Isabelle archive of formal proofs. InInteractive Theorem Proving (ITP 2024), volume 309 ofLIPIcs, pages 22:1–22:19,

  13. [13]

    Klein, T

    [KBDK12] G. Klein, T. Bourke, J. Daum, and L. Kolanski. Challenges and experiences in managing large-scale proofs. In Intelligent Computer Mathematics (CICM 2012), volume 7362 ofLNCS, pages 32–48. Springer,

  14. [14]

    LaBelle and E

    [LW04] N. LaBelle and E. Wallingford. Inter-package dependency networks in open-source software. arXiv:cs/0411096,

  15. [15]

    McKinney

    [McK10] W. McKinney. Data structures for statistical computing in Python. InProceedings of the 9th Python in Science Conference (SciPy 2010), pages 56–61,

  16. [16]

    The Lean mathematical library

    [The20] The Mathlib Community. The Lean mathematical library. InProceedings of the 9th ACM SIGPLAN Interna- tional Conference on Certified Programs and Proofs (CPP 2020), pages 367–381. ACM,

  17. [17]

    Uskuplu and L

    [UM25] E. Uskuplu and L. S. Moss. KnowTeX: Visualizing mathematical dependencies.arXiv:2601.15294,

  18. [18]

    van Doorn, G

    [vDEL20] F. van Doorn, G. Ebner, and R. Y. Lewis. Maintaining a library of formal mathematics. InIntelligent Computer Mathematics (CICM 2020), volume 12236 ofLNCS, pages 251–267. Springer,

  19. [19]

    LeanArchitect: Automating blueprint generation for humans and AI.arXiv preprint arXiv:2601.22554,

    [ZMAW25] Thomas Zhu, Pietro Monticone, Jeremy Avigad, and Sean Welleck. LeanArchitect: Automating blueprint generation for humans and AI.arXiv preprint arXiv:2601.22554,

  20. [20]

    16 Definition A.2.6(Normalized Mutual Information (NMI) [DDGDA05, §3, Eq

    It quantifies how much knowing one partition reduces uncertainty about the other. 16 Definition A.2.6(Normalized Mutual Information (NMI) [DDGDA05, §3, Eq. (2)]).Given two partitions AandBof a setV, thenormalized mutual informationis NMI(A,B) = 2I(A;B) H(A) +H(B) , whereIandHare defined above. NMI ranges from0(independent) to1(identical partitions). Defin...

  21. [21]

    I depend on others

    indicates a stronger coupling. All names have theMathlib.prefix removed. •Structural stability: community persistenceNMI(L (t),L(t+1)), hub turnover (rank correlation of in- degree rankings between consecutive snapshots), and redundancy trendr(G(t) module)over time. •Structural events: discontinuities in the time series coinciding with known events (e.g.,...

  22. [22]

    In both cases, a lognormal distribution provides a significantly better fit than a pure power law (likelihood ratioR=−10.6andR=−289.7, respectively;p<0.01)

    for out- degree. In both cases, a lognormal distribution provides a significantly better fit than a pure power law (likelihood ratioR=−10.6andR=−289.7, respectively;p<0.01). This mirrors the pattern observed for Gmodule (§C.1.2): heavy-tailed distributions that are not cleanly power-law. 54 Table 28: Degree statistics forG(2) ns (unweighted). In-degree Ou...