pith. sign in

arxiv: 2606.07563 · v1 · pith:SKJCMYNCnew · submitted 2026-05-25 · 💻 cs.LG · cs.AI

Emergence via Phase Transitions: Mechanism Landscapes and Universal Convergence Across Complex Systems

Pith reviewed 2026-06-29 22:51 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords emergencephase transitionmechanism landscapegrokkingconvergencefixed pointHierarchical Emergence Framework
0
0 comments X

The pith

A phase transition at a critical energy threshold drives convergence to unique fixed points independent of initial conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the Hierarchical Emergence Framework to explain why independent systems across machine learning, biology, and physics develop similar high-level structures. Emergence is modeled as a phase transition in a mechanism landscape, with a critical energy threshold Ec marking the shift from competing mechanisms to dominance by one minimum-cost mechanism. Under structural assumptions the framework proves physical feasibility, strict metric contraction, and convergence to a single fixed-point representation regardless of starting state. Experiments on grokking in modular arithmetic transformers identify reproducible signatures of the transition, including weight norm peaks before the accuracy jump and normalized curves collapsing to a tanh form, with all models reaching nearly identical final accuracy.

Core claim

The framework models emergence as a phase transition in a mechanism landscape constrained by thermodynamic and information-theoretic laws. A critical energy threshold Ec separates an exploration regime with competing mechanisms from a convergence regime governed by a unique minimum-cost mechanism. Under structural assumptions, this yields physical feasibility, strict metric contraction, and convergence toward a unique fixed-point representation independent of initial conditions. The structure connects to causal emergence through Effective Information and mechanism competition entropy. In 111 grokking experiments the weight norm peaks before the transition in 92 percent of runs, accuracy curv

What carries the argument

The critical energy threshold Ec in the Hierarchical Emergence Framework, which separates a regime of competing mechanisms from convergence to a unique minimum-cost mechanism.

If this is right

  • Grokking in transformers exhibits a reproducible weight-norm peak before the accuracy transition in 92 percent of runs.
  • Normalized accuracy curves collapse onto a tanh kink consistent with a Landau-Ginzburg universality class.
  • Converged models reach identical performance levels regardless of initialization, weight decay, or training fraction.
  • The convergence structure links directly to causal emergence measured by Effective Information and mechanism competition entropy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same phase-transition structure could be tested for predictive power in biological evolution or physical renormalization flows.
  • If the metric contraction holds, it would imply that convergence speed depends mainly on distance to the critical energy threshold rather than microscopic details.
  • Extending the framework to non-transformer architectures might reveal whether the weight-norm signature generalizes beyond modular arithmetic tasks.

Load-bearing premise

The structural assumptions that permit proving physical feasibility and strict metric contraction toward a unique fixed point in the mechanism landscape.

What would settle it

Finding that multiple independent training runs or evolutionary simulations fail to converge to similar high-level structures or accuracy values after crossing the critical energy threshold, or that weight norms do not peak before the generalization jump in most cases.

Figures

Figures reproduced from arXiv: 2606.07563 by Truong Xuan Khanh.

Figure 1
Figure 1. Figure 1: Empirical evidence for HEF’s three-phase energy trajectory and universality class. (a) Weight-norm Ec fingerprint (Result E2). The normalised weight norm ∥w∥ 2/∥w0∥ 2 traces the three-phase HEF trajectory: rising during exploration (E > Ec), peaking near the phase boundary (dotted, median lead 1,050 steps before grokking), then falling during convergence (E < Ec). The peak precedes grokking in 92.1% of run… view at source ↗
Figure 2
Figure 2. Figure 2: Universal Feature Convergence confirms Corollary S8.2 (Result E1). All 89 grokked models converge to final test accuracy 0.9745 ± 0.014 (CV= 1.47%), independent of initial conditions. (a) Distribution across all 89 runs. (b) One-way ANOVA by prime p: F2,86 = 2.06, p = 0.134 — no significant effect. (c) By weight decay λ: F1,87 = 0.48, p = 0.490 — no significant effect. Convergence to the same R∞ regardless… view at source ↗
Figure 3
Figure 3. Figure 3: G2 scaling validation and λc regime transition (Result E4; Open Protocols 1a–c). (a) Grokking delay ∆t vs prime p (log–log), λ = 2.0, frac= 0.40. The original G2 prediction (∆t ∝ n/λ, slope +2) is falsified; observed slope β = −1.39±0.20 (R2 = 0.91) is consistent with the revised G2 ∆t ∝ 1/(frac · p ·λ) at the 10% level (p = 0.075). Error bars: 95% CI. (b) λ-dependence at p = 97. λ ∈ {1, 2} grok reliably; … view at source ↗
read the original abstract

Across machine learning, biology, and physics, independently evolving systems often converge toward strikingly similar high-level structures despite radically different microscopic details. Grokking circuits converge across random seeds, evolutionary lineages rediscover similar metabolic solutions, and renormalization flows approach common fixed points. We propose the Hierarchical Emergence Framework (HEF) as a candidate universality framework for such convergence phenomena. HEF models emergence as a phase transition in a mechanism landscape constrained by thermodynamic and information-theoretic laws. The framework introduces a critical energy threshold Ec separating an exploration regime with competing mechanisms from a convergence regime governed by a unique minimum-cost mechanism. Under structural assumptions, we prove physical feasibility, derive strict metric contraction, and establish convergence toward a unique fixed-point representation independent of initial conditions. We further connect this convergence structure to causal emergence through Effective Information and mechanism competition entropy. To test the framework, we study delayed generalization ("grokking") in modular arithmetic transformers across 111 experiments. We identify a reproducible empirical fingerprint of the Ec transition: the weight norm peaks systematically before grokking in 92% of runs. Normalized accuracy curves collapse onto a tanh kink (R^2=0.93) consistent with a Landau-Ginzburg universality class, and all grokked models converge to 0.9745+/-0.014 regardless of initialization, weight decay, or training fraction (ANOVA p>0.13). HEF is not presented as a universal theory of emergence, but as a falsifiable mathematical scaffold for studying convergence phenomena across complex systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The paper proposes the Hierarchical Emergence Framework (HEF) to model convergence phenomena across ML, biology, and physics as phase transitions in a mechanism landscape at a critical threshold Ec. It claims to prove physical feasibility, strict metric contraction, and convergence to a unique fixed-point representation independent of initial conditions under unspecified structural assumptions, links this to causal emergence via Effective Information, and reports empirical support from 111 grokking experiments in modular arithmetic transformers, including a 92% rate of weight-norm peaks before grokking, a tanh fit with R²=0.93, and convergence to accuracy 0.9745±0.014 independent of initialization (ANOVA p>0.13).

Significance. If the structural assumptions prove non-vacuous and the claimed contraction and fixed-point results can be rigorously derived, HEF could supply a falsifiable scaffold connecting phase-transition ideas to convergence across domains, with the reported empirical fingerprint offering testable predictions; the interdisciplinary link to Effective Information would add value if substantiated.

major comments (4)
  1. [Abstract] Abstract: The structural assumptions invoked to prove physical feasibility, strict metric contraction, and convergence to a unique fixed-point representation are never enumerated, defined, or justified, rendering it impossible to determine whether these results are non-trivial or follow from the framework.
  2. [Abstract] Abstract: No derivations, lemmas, equations, or proof sketches are supplied for the claimed results on feasibility, contraction, or unique fixed-point convergence, despite the explicit assertion that such proofs exist under the structural assumptions.
  3. [Abstract] Abstract / Empirical validation: The specific convergence accuracy 0.9745±0.014 and the tanh kink fit (R²=0.93) are obtained from the identical set of 111 experiments used to identify the Ec transition and the 92% weight-norm peak statistic, creating circularity that undermines the claim of independent validation.
  4. [Abstract] Abstract: The 'unique minimum-cost mechanism' is defined relative to a cost function whose explicit functional form is not stated independently of the observed convergence behavior, leaving the uniqueness claim dependent on the same data used to report the 0.9745 accuracy.
minor comments (2)
  1. [Abstract] The manuscript would benefit from early, explicit definitions of core invented terms such as 'mechanism landscape' and 'Hierarchical Emergence Framework (HEF)' before invoking them in the central claims.
  2. [Abstract] The empirical section reports precise numerical thresholds (e.g., 0.9745, R²=0.93) without accompanying code, data, or statistical details that would allow independent reproduction of the ANOVA p>0.13 result or the 92% peak statistic.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the careful reading and constructive critique. We address each major comment below. Where the manuscript requires clarification or expansion, we will revise accordingly; where the comments reflect a misunderstanding of the presented claims, we explain the intended scope.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The structural assumptions invoked to prove physical feasibility, strict metric contraction, and convergence to a unique fixed-point representation are never enumerated, defined, or justified, rendering it impossible to determine whether these results are non-trivial or follow from the framework.

    Authors: We agree that the abstract does not enumerate the assumptions. The full manuscript defines them in Section 3 (compactness of the mechanism space, Lipschitz continuity of the cost functional, and ergodicity of the stochastic dynamics). In revision we will add an explicit enumerated list of these assumptions immediately after the abstract and reference the relevant theorems. revision: yes

  2. Referee: [Abstract] Abstract: No derivations, lemmas, equations, or proof sketches are supplied for the claimed results on feasibility, contraction, or unique fixed-point convergence, despite the explicit assertion that such proofs exist under the structural assumptions.

    Authors: The proofs appear in Appendix B of the submitted manuscript (Theorems 1–3). To improve accessibility we will insert a one-paragraph proof sketch of the contraction mapping argument into the main text (new Section 3.2) while retaining the full derivations in the appendix. revision: yes

  3. Referee: [Abstract] Abstract / Empirical validation: The specific convergence accuracy 0.9745±0.014 and the tanh kink fit (R²=0.93) are obtained from the identical set of 111 experiments used to identify the Ec transition and the 92% weight-norm peak statistic, creating circularity that undermines the claim of independent validation.

    Authors: The manuscript does not claim independent validation from a held-out dataset. All reported statistics (weight-norm peaks, tanh collapse, and accuracy convergence) are descriptive of the same experimental corpus and constitute the empirical fingerprint predicted by HEF. We will revise the abstract and Section 5 to state explicitly that these quantities are jointly observed rather than independently validated. revision: partial

  4. Referee: [Abstract] Abstract: The 'unique minimum-cost mechanism' is defined relative to a cost function whose explicit functional form is not stated independently of the observed convergence behavior, leaving the uniqueness claim dependent on the same data used to report the 0.9745 accuracy.

    Authors: The cost function is defined in Equation (4) as C(m) = E_thermo(m) + λ · H_mechanism(m), where E_thermo is the thermodynamic energy and H_mechanism is the mechanism-competition entropy; uniqueness follows from the strict contraction proved in Theorem 2. We will restate this functional form in the abstract and add a sentence clarifying that the functional form is specified a priori, not fitted to the accuracy value. revision: yes

Circularity Check

2 steps flagged

Empirical convergence value and phase-transition fingerprint obtained from the same experiments used to identify Ec; uniqueness built into mechanism definition

specific steps
  1. fitted input called prediction [Abstract (testing paragraph)]
    "We identify a reproducible empirical fingerprint of the Ec transition: the weight norm peaks systematically before grokking in 92% of runs. Normalized accuracy curves collapse onto a tanh kink (R^2=0.93) consistent with a Landau-Ginzburg universality class, and all grokked models converge to 0.9745+/-0.014 regardless of initialization, weight decay, or training fraction (ANOVA p>0.13)."

    The reported convergence value and tanh fit parameters are numerically extracted from the same 111 experiments that were used to locate the Ec transition and the 92% weight-norm peak signature; the claimed independence from initialization is therefore a post-selection statistic of the fitted data rather than an independent prediction.

  2. self definitional [Abstract (framework introduction)]
    "The framework introduces a critical energy threshold Ec separating an exploration regime with competing mechanisms from a convergence regime governed by a unique minimum-cost mechanism. Under structural assumptions, we prove physical feasibility, derive strict metric contraction, and establish convergence toward a unique fixed-point representation independent of initial conditions."

    The convergence regime is defined as already governed by a unique minimum-cost mechanism; the subsequent claim to prove convergence to a unique fixed-point representation therefore restates a property built into the regime definition rather than deriving it from independent structural assumptions whose content is never supplied.

full rationale

The abstract presents the specific numerical convergence 0.9745+/-0.014 and tanh collapse as outcomes of the framework, yet these are measured from the identical 111 runs that define the Ec threshold and weight-norm fingerprint. The theoretical claim of convergence to a unique fixed point under structural assumptions is stated without enumerating those assumptions or exhibiting an independent derivation, while the convergence regime is introduced as already governed by a unique minimum-cost mechanism.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The framework rests on unspecified structural assumptions for its proofs and introduces new modeling constructs (mechanism landscape, Ec threshold) without independent falsifiable evidence supplied in the abstract; the empirical convergence accuracy is reported as a numerical outcome rather than a parameter-free prediction.

free parameters (1)
  • Ec
    Critical energy threshold separating exploration and convergence regimes; introduced without a derivation from first principles or external calibration in the abstract.
axioms (1)
  • ad hoc to paper Structural assumptions enabling proofs of physical feasibility, strict metric contraction, and convergence to a unique fixed point
    Invoked in the abstract to support the central theoretical claims but not enumerated or justified there.
invented entities (2)
  • Hierarchical Emergence Framework (HEF) no independent evidence
    purpose: Candidate universality framework for convergence phenomena across complex systems
    Newly proposed scaffold; no independent evidence outside this work is cited in the abstract.
  • mechanism landscape no independent evidence
    purpose: Constrained space of mechanisms governed by thermodynamic and information-theoretic laws
    Core modeling construct of HEF with no prior existence asserted in the abstract.

pith-pipeline@v0.9.1-grok · 5800 in / 1759 out tokens · 46108 ms · 2026-06-29T22:51:00.269647+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

51 extracted references · 10 canonical work pages · 5 internal anchors

  1. [1]

    P. W. Anderson. More is different.Science, 177(4047):393–396, 1972

  2. [2]

    S. Banach. Sur les op´ erations dans les ensembles abstraits.Fund. Math., 3:133–181, 1922

  3. [3]

    S. G. Bobkov and F. G¨ otze. Exponential integrability and transportation cost related to logarithmic Sobolev inequalities.J. Funct. Anal., 163(1):1–28, 1999

  4. [4]

    M. A. Bedau. Weak emergence.Philosophical Perspectives, 11:375–399, 1997

  5. [5]

    Belkin, D

    M. Belkin, D. Hsu, S. Ma, and S. Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off.PNAS, 116(32):15849–15854, 2019

  6. [6]

    C. H. Bennett. The thermodynamics of computation.Int. J. Theor. Phys., 21(12):905–940, 1982

  7. [7]

    Boix-Adsera, N

    E. Boix-Adsera, N. Mallinar, J. B. Simon, and M. Belkin. The features at convergence theorem for neural networks.International Conference on Learning Representations (ICLR), 2026. arXiv:2507.05644

  8. [8]

    Butterfield

    J. Butterfield. Emergence, reduction and supervenience.Found. Physics, 41(6):920–959, 2011

  9. [9]

    H. B. Callen.Thermodynamics and an Introduction to Thermostatistics, 2nd ed. Wiley, 1985

  10. [10]

    D. J. Chalmers. Strong and weak emergence. InThe Re-emergence of Emergence, OUP, 2006

  11. [11]

    Conway Morris.Life’s Solution

    S. Conway Morris.Life’s Solution. Cambridge University Press, 2003

  12. [12]

    Conway Morris.The Runes of Evolution

    S. Conway Morris.The Runes of Evolution. Templeton Press, 2015

  13. [13]

    T. M. Cover and J. A. Thomas.Elements of Information Theory, 2nd ed. Wiley, 2006

  14. [14]

    org/abs/2310.130618

    D. Doshi, A. Das, T. He, and A. Gromov. To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets.International Conference on Learning Representations (ICLR), 2024. arXiv:2310.13061

  15. [15]

    Deutsch and C

    D. Deutsch and C. Marletto. Constructor theory of information.Proc. R. Soc. A, 471:20140540, 2015

  16. [16]

    Elhage et al

    N. Elhage et al. Toy models of superposition.Transformer Circuits Thread, 2022

  17. [17]

    D. H. Erwin et al. The Cambrian conundrum.Science, 334(6059):1091–1097, 2011

  18. [18]

    J. W. Gibbs.Elementary Principles in Statistical Mechanics. Yale, 1902

  19. [19]

    P. R. Halmos.Measure Theory. Springer, 1950

  20. [20]

    Hausdorff.Grundz¨ uge der Mengenlehre

    F. Hausdorff.Grundz¨ uge der Mengenlehre. Veit, 1914

  21. [21]

    E. P. Hoel, L. Albantakis, and G. Tononi. Quantifying causal emergence.PNAS, 110(49):19790–19795, 2013

  22. [22]

    Hordijk and M

    W. Hordijk and M. Steel. Detecting autocatalytic sets.J. Theor. Biol., 227(4):451–461, 2004

  23. [23]

    M. Huh, B. Cheung, T. Wang, and P. Isola. The Platonic Representation Hypothesis.ICML, 2024. arXiv:2405.07987

  24. [24]

    Jarzynski

    C. Jarzynski. Nonequilibrium equality for free energy differences.Phys. Rev. Lett., 78(14):2690–2693, 1997

  25. [25]

    E. T. Jaynes. Information theory and statistical mechanics.Phys. Rev., 106:620–630, 1957

  26. [26]

    L. P. Kadanoff. Scaling laws for Ising models nearT c.Physics, 2(6):263–272, 1966. 41

  27. [27]

    S. A. Kauffman.The Origins of Order. OUP, 1993

  28. [28]

    Kreyszig.Introductory Functional Analysis with Applications

    E. Kreyszig.Introductory Functional Analysis with Applications. Wiley, 1978

  29. [29]

    L. D. Landau. On the theory of phase transitions.Zh. Eksp. Teor. Fiz., 7:19–32, 1937

  30. [30]

    Landauer

    R. Landauer. Irreversibility and heat generation.IBM J. Res. Dev., 5(3):183–191, 1961

  31. [31]

    Loshchilov and F

    I. Loshchilov and F. Hutter. Decoupled weight decay regularisation.ICLR, 2019

  32. [32]

    C. R. Marshall. Explaining the Cambrian explosion.Annu. Rev. Earth Planet. Sci., 34:355–384, 2006

  33. [33]

    Information-theoretic progress measures reveal grokking is an emergent phase transition.arXiv preprint arXiv:2408.08944, 2024

    K. Clauw, S. Stramaglia, and D. Marinazzo. Information-theoretic progress measures reveal grokking is an emergent phase transition. arXiv:2408.08944, 2024

  34. [34]

    Monod, J

    J. Monod, J. Wyman, and J.-P. Changeux. On the nature of allosteric transitions.J. Mol. Biol., 12(1):88–118, 1965

  35. [35]

    J. R. Munkres.Topology, 2nd ed. Prentice Hall, 2000

  36. [36]

    Nakkiran et al

    P. Nakkiran et al. Deep double descent.ICLR, 2020

  37. [37]

    Olah et al

    C. Olah et al. Zoom in: an introduction to circuits.Distill, 2020

  38. [38]

    K. T. David, J. G. Schraiber, J. G. Crandall, A. L. Labella, D. A. Opulente, M.-C. Harrison, J. F. Wolters, X. Zhou, X.-X. Shen, M. Groenewald, C. T. Hittinger, M. Pennell, and A. Rokas. Convergent expansions of keystone gene families drive metabolic innovation inSaccharomycotinayeasts.Proc. Natl. Acad. Sci. U.S.A., 122(23):e2500165122, 2025. doi:10.1073/...

  39. [39]

    Otto and C

    F. Otto and C. Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.J. Funct. Anal., 173(2):361–400, 2000

  40. [40]

    Peer et al

    D. Peer et al. Nanocarriers as an emerging platform.Nature Nanotechnology, 2:751–760, 2007

  41. [41]

    Nanda, L

    N. Nanda, L. Chan, T. Lieberum, J. Smith, J. Steinhardt. Progress measures for grokking via mecha- nistic interpretability.International Conference on Learning Representations (ICLR), 2023

  42. [42]

    Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

    A. Power et al. Grokking: generalisation beyond overfitting. arXiv:2201.02177, 2022

  43. [43]

    Raginsky

    M. Raginsky. Strong data processing inequalities.IEEE Trans. Inf. Theory, 62(6):3355–3389, 2016

  44. [44]

    H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning.Phys. Rev. A, 45(8):6056– 6091, 1992

  45. [45]

    Szil´ ard.¨Uber die Entropieverminderung.Z

    L. Szil´ ard.¨Uber die Entropieverminderung.Z. Phys., 53:840–856, 1929

  46. [46]

    The information bottleneck method

    N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. arXiv:physics/0004057, 2000

  47. [47]

    Q. H. Truong and X. K. Truong. Prebiotic selection as a physical process.bioRxiv, 2026. doi:10.64898/2026.04.21.719958

  48. [48]

    X. K. Truong. First-passage prediction of grokking delay: a calibrated law under AdamW with causal validation. arXiv:2605.18845, 2026

  49. [49]

    K. G. Wilson. Renormalisation group and critical phenomena I.Phys. Rev. B, 4(9):3174–3183, 1971

  50. [50]

    K. G. Wilson. The renormalisation group and theεexpansion.Phys. Rep., 12(2):75–199, 1974

  51. [51]

    Y. Xu. The geometry of multi-task grokking: transverse instability, superposition, and weight decay phase structure. arXiv:2602.18523, 2026. 42