pith. machine review for the scientific record.

arxiv: 2604.15669 · v1 · submitted 2026-04-17 · 🌊 nlin.AO

Recognition: unknown

Self-Organization to the Edge of Ergodicity Breaking in a Complex Adaptive System

Choy Heng Lai, Kan Chen, Ling Feng, Nixie Sapphira Lesmana

Pith reviewed 2026-05-10 07:56 UTC · model grok-4.3

classification 🌊 nlin.AO
keywords self-organized criticality · ergodicity breaking · complex adaptive systems · evolutionary avalanches · reinforcement learning · Sherrington-Kirkpatrick landscape · scale-free dynamics

The pith

Coupled memory-dependent learning and extremal replacement drives an adaptive system to the ergodicity-breaking boundary.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EvoSK, a minimal model in which agents perform memory-dependent reinforcement learning on a rugged Sherrington-Kirkpatrick landscape while the population evolves by replacing the least-fit agents. This coupled process causes the system to self-organize to a critical state exactly at the transition between ergodic and non-ergodic phases. At that boundary the system produces scale-free evolutionary avalanches with exponent near -1.5 and achieves higher collective rewards than any fixed, manually tuned non-evolutionary version. A reader would care because the result suggests that optimal performance in complex adaptive systems can emerge automatically at a well-defined physical transition without parameter tuning.

Core claim

The coupled dynamics of memory-dependent reinforcement learning and extremal replacement in the EvoSK model drives the system to a critical state residing on the transition boundary between ergodic and non-ergodic phases. At this boundary the system exhibits scale-free evolutionary avalanches with a mean-field exponent τ ≈ -1.5 while simultaneously achieving collective rewards that surpass those of any manually finetuned, non-evolutionary regime. The results establish a mechanistic link between the statistical physics of ergodicity breaking and the functional optimality of complex adaptive systems on rugged, high-dimensional landscapes.

What carries the argument

The EvoSK model that couples memory-dependent reinforcement learning on the Sherrington-Kirkpatrick landscape with extremal replacement of the least-fit agents.
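
To make the coupling concrete, here is a minimal sketch of an EvoSK-style loop. It is not the authors' implementation: the reward definition, the softmax policy, the use of a per-agent temperature as the evolving trait, the newcomer initialization, and the function name `run_evosk_sketch` are all assumptions chosen for illustration.

```python
import numpy as np


def run_evosk_sketch(n_agents=32, steps=2000, alpha=0.75, seed=0):
    """Toy EvoSK-style loop; every modelling detail here is an assumption."""
    rng = np.random.default_rng(seed)

    # Symmetric SK couplings with zero diagonal and variance 1/N.
    j = rng.normal(0.0, 1.0 / np.sqrt(n_agents), size=(n_agents, n_agents))
    j = np.triu(j, k=1)
    j = j + j.T

    actions = np.array([-1.0, 1.0])
    idx = np.arange(n_agents)
    q = np.zeros((n_agents, 2))               # value estimate per agent and action
    temps = rng.uniform(0.1, 2.0, n_agents)   # assumed evolving trait: softmax temperature
    fitness = np.zeros(n_agents)              # running average reward used for selection
    replaced, mean_reward = [], []

    for _ in range(steps):
        # Softmax (Boltzmann) action choice at each agent's own temperature.
        logits = q / temps[:, None]
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        choice = (rng.random(n_agents) < p[:, 1]).astype(int)
        spins = actions[choice]

        # Assumed reward of agent i: s_i * sum_j J_ij s_j.
        reward = spins * (j @ spins)

        # Memory-dependent update: alpha close to 1 means long memory.
        q[idx, choice] = alpha * q[idx, choice] + (1.0 - alpha) * reward
        fitness = alpha * fitness + (1.0 - alpha) * reward

        # Extremal replacement: the least-fit agent is swapped for a fresh one.
        worst = int(np.argmin(fitness))
        temps[worst] = rng.uniform(0.1, 2.0)
        q[worst] = 0.0
        fitness[worst] = fitness.mean()       # assumed: newcomer starts at the population mean
        replaced.append(worst)
        mean_reward.append(float(reward.mean()))

    return temps, np.array(replaced), np.array(mean_reward)
```

Calling `run_evosk_sketch(alpha=0.75)` returns the final temperatures, the sequence of replaced agents, and the mean reward per step. A sketch this small is not expected to reproduce the paper's statistics; it only shows where learning and selection interact.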

If this is right

  • Scale-free evolutionary avalanches appear with mean-field exponent τ ≈ -1.5.
  • Collective rewards exceed those obtained in any manually finetuned non-evolutionary regime.
  • The ergodicity-breaking boundary functions as a robust attractor for adaptation on rugged high-dimensional landscapes.
  • A direct mechanistic connection exists between ergodicity-breaking physics and optimality in complex adaptive systems.
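
One way to check the avalanche exponent in the first bullet is a maximum-likelihood fit rather than a log-log regression. The sketch below uses the continuous power-law estimator with the convention P(s) ∝ s^(−τ); the paper's own fitting procedure is not described here, so the function names, the cutoff `s_min`, and the continuous approximation to integer avalanche sizes are assumptions.

```python
import numpy as np


def sample_power_law(n, tau, s_min=1.0, seed=0):
    """Draw n samples from p(s) proportional to s^(-tau), s >= s_min, by inverse transform."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    return s_min * (1.0 - u) ** (-1.0 / (tau - 1.0))


def fit_tau(sizes, s_min=1.0):
    """Continuous maximum-likelihood estimate of tau and its standard error."""
    s = np.asarray(sizes, dtype=float)
    s = s[s >= s_min]                      # only the tail above the cutoff is fitted
    tau_hat = 1.0 + s.size / np.log(s / s_min).sum()
    return tau_hat, (tau_hat - 1.0) / np.sqrt(s.size)


# Synthetic check: samples generated with tau = 1.5 should be recovered by the fit.
sizes = sample_power_law(200_000, tau=1.5, seed=3)
tau_hat, stderr = fit_tau(sizes)
```

With real avalanche data, `sizes` would be the measured avalanche sizes and `s_min` would have to be chosen from the data, which this sketch does not attempt.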

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same self-organization mechanism may operate in biological evolution or market systems where agents adapt and are replaced according to performance.
  • Engineers of artificial agents could add evolutionary replacement to reach high-performance critical states without exhaustive manual tuning.
  • The attraction to the boundary may persist on other rugged landscapes such as NK models, offering a testable extension.
  • Many real-world complex systems could be operating near this boundary to balance exploration and stability.

Load-bearing premise

The specific combination of memory-dependent reinforcement learning and extremal replacement is sufficient to attract the system to the ergodicity-breaking boundary without extra tuning, and the avalanche statistics and reward gains are produced by proximity to that boundary.

What would settle it

A simulation in which memory dependence is removed or the replacement rule is altered, after which scale-free avalanches disappear and collective rewards fall to or below the level of manually tuned non-evolutionary regimes.
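
The altered-replacement control described above could be wired in as a single switch on the selection step. This is a sketch under assumptions: the rule names `"extremal"` and `"random"` and the helper `pick_replaced` are hypothetical, and the paper may define its controls differently.

```python
import numpy as np


def pick_replaced(fitness, rule, rng):
    """Return the index of the agent to replace under a named (hypothetical) rule."""
    fitness = np.asarray(fitness, dtype=float)
    if rule == "extremal":
        return int(np.argmin(fitness))           # least-fit agent is replaced
    if rule == "random":
        return int(rng.integers(fitness.size))   # control: selection pressure removed
    raise ValueError(f"unknown rule: {rule!r}")
```

Removing memory dependence would be a separate control; in an update of the form Q ← αQ + (1 − α)r, setting α = 0 makes each value estimate depend only on the latest reward.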

Figures

Figures reproduced from arXiv: 2604.15669 by Choy Heng Lai, Kan Chen, Ling Feng, Nixie Sapphira Lesmana.

Figure 1. [image: figures/full_fig_p007_1.png]
Figure 2. [image: figures/full_fig_p009_2.png]
Figure 3. [image: figures/full_fig_p010_3.png]

Original abstract

Self-organized criticality (SOC) is widely proposed as a fundamental mechanism for collective behavior, yet its role in objective-driven, heterogeneous adaptive systems underpinning real complex systems remains less understood. We introduce EvoSK, a minimal evolutionary model in which agents perform memory dependent reinforcement learning on a rugged Sherrington-Kirkpatrick landscape while the population evolves through extremal replacement of the least fit agents. We demonstrate that this coupled dynamics drives the system to a critical state residing on the transition boundary between ergodic and non-ergodic phases. At this boundary, the system exhibits scale-free evolutionary avalanches with a mean-field exponent $\tau \approx -1.5$, while simultaneously achieving collective rewards that surpass those of any manually finetuned, non-evolutionary regime. Our results provide a mechanistic link between the statistical physics of ergodicity breaking and the functional optimality of complex adaptive systems, suggesting that the edge of ergodicity breaking acts as a robust attractor for systems adapting on rugged, high-dimensional landscapes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces EvoSK, a minimal model in which a population of agents performs memory-dependent reinforcement learning on a rugged Sherrington-Kirkpatrick (SK) landscape while the population evolves via extremal replacement of the least-fit agents. It claims that the coupled RL-plus-evolution dynamics self-organizes the system to the boundary between ergodic and non-ergodic phases of the underlying SK landscape, producing scale-free evolutionary avalanches with mean-field exponent τ ≈ −1.5 and collective rewards that exceed those of any manually finetuned non-evolutionary regime.

Significance. If the central claims are substantiated, the work would establish a concrete mechanistic link between the statistical physics of ergodicity breaking in high-dimensional disordered systems and the emergence of optimal collective performance in objective-driven adaptive systems. The model is minimal, combines reinforcement learning with extremal selection in a novel way, and offers a falsifiable prediction that the edge of ergodicity breaking acts as a robust attractor on rugged landscapes.

major comments (3)
  1. [Results] The central claim that the dynamics drives the system precisely to the ergodic/non-ergodic transition boundary is not supported by any quantitative order parameter (e.g., Edwards-Anderson overlap, time-vs-ensemble variance of agent rewards, or replica-symmetry-breaking indicator) whose divergence or jump is shown to coincide with the onset of the reported avalanches.
  2. [Methods / Simulation details] No simulation protocol is described: the abstract and model sections supply neither the number of independent runs, the statistical tests used to locate the ergodic/non-ergodic boundary, nor the procedure for comparing evolutionary versus manually finetuned regimes.
  3. [Results] It is not demonstrated that the observed avalanche exponent τ ≈ −1.5 and the reward superiority are caused by proximity to the SK ergodicity-breaking transition rather than by the extremal replacement rule alone; an ablation or control simulation isolating the selection mechanism is absent.
minor comments (1)
  1. [Abstract / Results] The sign convention for the avalanche exponent τ ≈ −1.5 should be clarified (standard SOC notation is P(s) ∼ s^−τ with τ > 1); the current phrasing risks confusion with the mean-field SOC value.
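
The sign issue in the minor comment can be stated in one line. Under the standard convention the exponent itself is positive and the reported −1.5 would be the slope on log-log axes; reading the paper's value this way is an assumption about the authors' intent.

```latex
P(s) \sim s^{-\tau}, \qquad \tau_{\text{mean-field}} = \tfrac{3}{2}
\quad\Longrightarrow\quad
\frac{d \log P(s)}{d \log s} = -\tau = -1.5 .
```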

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive report. Their comments identify key areas where the manuscript can be strengthened with additional quantitative analysis and methodological details. We respond to each major comment below and will incorporate revisions to address them.

Point-by-point responses
  1. Referee: The central claim that the dynamics drives the system precisely to the ergodic/non-ergodic transition boundary is not supported by any quantitative order parameter (e.g., Edwards-Anderson overlap, time-vs-ensemble variance of agent rewards, or replica-symmetry-breaking indicator) whose divergence or jump is shown to coincide with the onset of the reported avalanches.

    Authors: We agree that a quantitative demonstration using an order parameter would provide stronger evidence for the self-organization to the transition boundary. In the revised manuscript, we will add figures showing the Edwards-Anderson overlap and the variance of rewards across time and ensemble as the system parameters are varied. These will illustrate that the transition signatures align with the regime where scale-free avalanches emerge. This addition will directly support the central claim.
    revision: yes

  2. Referee: No simulation protocol is described: the abstract and model sections supply neither the number of independent runs, the statistical tests used to locate the ergodic/non-ergodic boundary, nor the procedure for comparing evolutionary versus manually finetuned regimes.

    Authors: We appreciate this observation and will rectify the lack of simulation details. The revised Methods section will specify the number of independent simulation runs (we typically use 50-200 runs for averaging), the criteria and statistical methods for locating the ergodic/non-ergodic boundary (such as monitoring the divergence of overlap fluctuations), and the full protocol for benchmarking against non-evolutionary regimes, including how parameters are optimized in the latter case for fair comparison.
    revision: yes

  3. Referee: It is not demonstrated that the observed avalanche exponent τ ≈ −1.5 and the reward superiority are caused by proximity to the SK ergodicity-breaking transition rather than by the extremal replacement rule alone; an ablation or control simulation isolating the selection mechanism is absent.

    Authors: This comment raises an important issue regarding causality. While the manuscript compares evolutionary dynamics to non-evolutionary ones, we acknowledge that an explicit ablation isolating the role of the transition is needed. In the revision, we will include additional simulations: (1) disabling extremal selection (e.g., random replacement) to show loss of criticality, and (2) fixing the system parameters away from the transition point while retaining the selection rule to demonstrate that avalanches and reward superiority diminish. These controls will confirm that the observed phenomena arise from the interplay at the boundary.
    revision: yes
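
The order-parameter diagnostics named in the first exchange could be computed along these lines. This is a sketch, not the authors' analysis: it assumes trajectories are stored as (time, agent) arrays, and the function names are hypothetical.

```python
import numpy as np


def edwards_anderson_overlap(spins):
    """q_EA = average over agents of (time-averaged spin)^2, for a (time, agent) array of +/-1."""
    s = np.asarray(spins, dtype=float)
    return float(np.mean(s.mean(axis=0) ** 2))


def time_vs_ensemble_variance(rewards):
    """Return (variance across agents of time-averaged reward,
    mean across agents of each agent's reward variance over time)."""
    r = np.asarray(rewards, dtype=float)
    var_of_time_means = float(r.mean(axis=0).var())   # spread between agents' long-run averages
    mean_of_time_vars = float(r.var(axis=0).mean())   # typical fluctuation within one history
    return var_of_time_means, mean_of_time_vars
```

A fully frozen configuration gives an overlap of 1, while independent coin-flip spins give a value near 0 for long trajectories. Locating the boundary would then mean scanning these quantities against the control parameter, which is not shown here.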

Circularity Check

0 steps flagged

No circularity: emergent simulation results are independent of inputs

full rationale

The paper defines a concrete agent-based model (memory-dependent RL on SK landscape plus extremal replacement) and reports statistics obtained by direct simulation. The claim that the dynamics reaches the ergodicity-breaking boundary is presented as an observed outcome of those rules, not as a quantity fitted to or redefined by the avalanche exponent or reward values. No equations, self-citations, or ansatzes in the abstract reduce the target result to the model inputs by construction. The skeptic's concern about missing order-parameter diagnostics is an evidentiary gap, not a circularity.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The model is presented as minimal, so its free parameters are the standard tunable quantities in RL and evolutionary algorithms. The central assumptions concern the representativeness of the SK landscape and the sufficiency of the chosen dynamics; no new physical entities are introduced.

free parameters (2)
  • RL memory length or decay rate
    Memory-dependent reinforcement learning requires at least one parameter controlling how far back past actions are remembered.
  • fraction or number of agents replaced per step
    Extremal replacement is a core rule whose rate must be specified.
axioms (2)
  • domain assumption: The Sherrington-Kirkpatrick landscape is a suitable proxy for rugged high-dimensional fitness landscapes encountered by real adaptive systems.
    Invoked directly as the environment on which agents learn.
  • domain assumption: Extremal replacement of the least fit agents, when coupled with individual memory-dependent RL, produces collective self-organization to the ergodicity boundary.
    This is the load-bearing mechanism asserted in the abstract.
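
For the first free parameter in the ledger, a common way to encode memory is an exponential decay factor α. The update rule below is an assumed form, not taken from the paper; under it the effective memory length is 1/(1 − α), so α = 0.5 averages over roughly 2 past rewards and α = 0.9 over roughly 10.

```latex
Q_i(t+1) = \alpha\, Q_i(t) + (1-\alpha)\, r_i(t)
\quad\Longrightarrow\quad
Q_i(t) = (1-\alpha) \sum_{k \ge 0} \alpha^{k}\, r_i(t-1-k),
\qquad
\ell_{\text{eff}} = \sum_{k \ge 0} \alpha^{k} = \frac{1}{1-\alpha}.
```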

pith-pipeline@v0.9.0 · 5481 in / 1579 out tokens · 43827 ms · 2026-05-10T07:56:44.255681+00:00 · methodology


Reference graph

Works this paper leans on

25 extracted references · 1 canonical work pages

  1. For intermediate memory (α = 0.5–0.8), the evolved temperatures consistently lie within the narrow region where ϵ varies most rapidly with perturbation λ, a clear indicator of the edge of ergodicity breaking. At very long memory (α = 0.9), the system settles close to the ergodic phase, with ϵ values slightly larger than 0. These results demonstrate that evolutio...
  2. Phase Diagnostics via Temperature Rescaling. For each memory parameter α, the evolutionary dynamics generate a heterogeneous agent-level temperature set $T(t;\alpha) = \{T_i(t;\alpha)\}_{i=1}^{N}$. Across all realizations, these distributions stabilize after a transient time of $t_{\mathrm{stat}} = 15{,}000$ iterations. We therefore define the emergent temperature ensemble at memory α as T α ...
  3. Avalanche Definition and Scaling Estimation. FIG. S2. Critical Avalanches. Avalanche size distributions P(S) on log-log axes for α = 0.5−0.8, coinciding with the clear critical regime (see Fig. 2). Exponent τ is the flattest with the smallest range (IQR) of slope value at α = 0.75; smaller α = 0.5 attains a steeper τ ≈ −1.85; larger α = 0.8 attains a sufficiently flat...
  4. P. Bak, C. Tang, and K. Wiesenfeld, Self-organized criticality: An explanation of the 1/f noise, Physical Review Letters 59, 381 (1987)
  5. B. Drossel and F. Schwabl, Self-organized critical forest-fire model, Physical Review Letters 69, 1629 (1992)
  6. S. Clar, B. Drossel, K. Schenk, and F. Schwabl, Self-organized criticality in forest-fire models, Physica A: Statistical Mechanics and its Applications 266, 153 (1999)
  7. S. Zapperi, K. B. Lauritsen, and H. E. Stanley, Self-organized branching processes: mean-field theory for avalanches, Physical Review Letters 75, 4071 (1995)
  8. G. Pruessner, Self-organised criticality: theory, models and characterisation (Cambridge University Press, 2012)
  9. A. Cavagna, A. Cimarelli, I. Giardina, G. Parisi, R. Santagati, F. Stefanini, and M. Viale, Scale-free correlations in starling flocks, Proceedings of the National Academy of Sciences 107, 11865 (2010)
  10. P. Bak and K. Sneppen, Punctuated equilibrium and criticality in a simple model of evolution, Physical Review Letters 71, 4083 (1993)
  11. J. M. Beggs and D. Plenz, Neuronal avalanches in neocortical circuits, Journal of Neuroscience 23, 11167 (2003)
  12. J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences 79, 2554 (1982)
  13. L. Feng and C. H. Lai, Optimal machine intelligence near the edge of chaos, arXiv preprint arXiv:1909.05176 (2019)
  14. Z. Qin, F. Khawar, and T. Wan, Collective game behavior learning with probabilistic graphical models, Neurocomputing 194, 74 (2016)
  15. C. Heins, B. Millidge, L. Da Costa, R. P. Mann, K. J. Friston, and I. D. Couzin, Collective behavior from surprise minimization, Proceedings of the National Academy of Sciences 121, e2320239121 (2024)
  16. S. Manna, Critical exponents of the sand pile models in two dimensions, Physica A: Statistical Mechanics and its Applications 179, 249 (1991)
  17. K. Christensen and Z. Olami, Sandpile models with and without an underlying spatial structure, Physical Review E 48, 3361 (1993)
  18. H. Flyvbjerg, K. Sneppen, and P. Bak, Mean field theory for a simple model of evolution, Physical Review Letters 71, 4087 (1993)
  19. H. Nishimori, Statistical physics of spin glasses and information processing: an introduction, 111 (Clarendon Press, 2001)
  20. J. Garnier-Brun, M. Benzaquen, and J.-P. Bouchaud, Unlearnable games and “satisficing” decisions: a simple model for a complex world, Physical Review X 14, 021039 (2024)
  21. C. Kidd, S. T. Piantadosi, and R. N. Aslin, The goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex, PLoS ONE 7, e36399 (2012)
  22. R. C. Wilson, A. Shenhav, M. Straccia, and J. D. Cohen, The eighty five percent rule for optimal learning, Nature Communications 10, 4646 (2019)
  23. E. Seneta, Coefficients of ergodicity: structure and applications, Advances in Applied Probability 11, 576 (1979)
  24. G. Wolfer, Empirical and instance-dependent estimation of Markov chain and mixing time, Scandinavian Journal of Statistics 51, 557 (2024)
  25. https://github.com/nixieslesmana/EvoSKGame