Direct From Darwin: Deriving Advanced Optimizers From Evolutionary First Principles
Pith reviewed 2026-05-12 03:26 UTC · model grok-4.3
The pith
A specific structured noise relation turns SGD, Newton approximations, and Adam into faithful simulations of Darwinian evolution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In an asexual setting, Fisher’s deterministic total-population dynamics and Wright’s randomly drifting sub-populations are formally equivalent once the population is partitioned into lineages and the noise injected at each generation obeys the DLS noise relation; any optimization procedure whose parameter updates satisfy this relation therefore constitutes a valid in silico simulation of Darwinian evolution. Stochastic gradient descent satisfies the relation directly. Regularized and approximate Newton and natural-gradient methods satisfy it after standard modifications. Even the Adam optimizer can be made compliant by a minor redefinition of its first- and second-moment accumulators.
What carries the argument
Darwinian Lineage Simulations (DLS) together with the DLS noise relation, a bookkeeping identity that equates deterministic population-level change to stochastic lineage-level drift and thereby licenses any optimizer whose updates obey the identity.
If this is right
- Stochastic gradient descent becomes an evolutionarily valid simulator simply by adding the prescribed drift noise.
- Many Newton-method and natural-gradient approximations become valid simulators after the same noise addition.
- Adam becomes a valid simulator after a redefinition of its moment terms that preserves its practical performance.
- Any new optimizer can be checked for evolutionary fidelity by testing whether its update rule satisfies the DLS noise relation.
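The compliance check in the last bullet can be sketched numerically. This page quotes the relation (in its Lean-theorem section) as W_g = μ²I − (V_{g+1} − V_g), but does not define the symbols. The sketch below assumes that V_g is a covariance matrix tracked at generation g, μ is a scalar mutation scale, and W_g is the covariance of the noise injected at generation g. Under that reading, a minimal necessary condition is that the implied W_g be a valid covariance (symmetric positive semidefinite). The function names are hypothetical and this is not the paper's own test.

```python
import numpy as np


def implied_noise_covariance(v_curr, v_next, mu):
    """Return W_g = mu^2 * I - (V_{g+1} - V_g), the relation as quoted on this page.

    ASSUMPTION: v_curr and v_next are square covariance matrices at
    generations g and g+1, and mu is a scalar mutation scale.
    """
    v_curr = np.asarray(v_curr, dtype=float)
    v_next = np.asarray(v_next, dtype=float)
    dim = v_curr.shape[0]
    return mu**2 * np.eye(dim) - (v_next - v_curr)


def is_valid_covariance(w, tol=1e-10):
    """Check that w is symmetric and positive semidefinite (up to tol)."""
    symmetric = np.allclose(w, w.T, atol=tol)
    # eigvalsh expects a symmetric matrix, so symmetrize before calling it.
    eigenvalues = np.linalg.eigvalsh((w + w.T) / 2.0)
    return bool(symmetric and eigenvalues.min() >= -tol)


# Hypothetical covariances: variance shrinks between generations,
# so the implied noise covariance is positive definite.
v_g = np.diag([1.0, 2.0])
v_next = np.diag([0.9, 1.5])
w_g = implied_noise_covariance(v_g, v_next, mu=0.1)
print(w_g)
print(is_valid_covariance(w_g))
```

If the tracked covariance grows by more than μ² along some direction, the implied W_g has a negative eigenvalue and no real-valued noise can realize it, so under these assumptions such an update could not satisfy the relation.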
Where Pith is reading between the lines
- The same noise relation could be used to derive entirely new optimizer families that have no prior heuristic justification but are guaranteed to be evolutionarily faithful.
- Training runs of compliant optimizers on large models could be re-analyzed as explicit lineage histories, opening the possibility of measuring effective population size or selection strength directly from gradient logs.
- The framework suggests that any optimization procedure lacking the required noise structure is missing an essential component of Darwinian dynamics and may therefore be incomplete as a model of natural adaptation.
Load-bearing premise
That any algorithm whose updates obey the DLS noise relation must count as a faithful simulation of asexual Darwinian evolution, with no further constraints on lineage structure or selection required.
What would settle it
An experiment that runs a DLS-compliant optimizer on a known fitness landscape, measures the resulting lineage statistics, and finds that they deviate systematically from the statistics produced by an explicit individual-based evolutionary simulation on the same landscape.
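The baseline for such an experiment is an explicit individual-based simulation. A minimal sketch follows, assuming a Wright-Fisher style scheme: asexual individuals with real-valued genotypes, fitness-proportional resampling, and Gaussian mutation. The landscape, parameter values, and function names are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def fitness(x):
    """Hypothetical landscape: a single Gaussian peak centered at the origin."""
    return np.exp(-0.5 * np.sum(x**2, axis=-1))


def simulate(pop_size=500, dim=2, generations=50, mutation_scale=0.05, seed=0):
    """Individual-based asexual evolution with selection, drift, and mutation."""
    rng = np.random.default_rng(seed)
    # Start the whole population in a tight cloud away from the peak.
    population = rng.normal(loc=2.0, scale=0.1, size=(pop_size, dim))
    mean_history = []
    for _ in range(generations):
        # Selection and drift: each offspring picks one parent with
        # probability proportional to the parent's fitness.
        weights = fitness(population)
        probs = weights / weights.sum()
        parents = rng.choice(pop_size, size=pop_size, p=probs)
        # Mutation: offspring copy the parent genotype plus Gaussian noise.
        mutations = rng.normal(scale=mutation_scale, size=(pop_size, dim))
        population = population[parents] + mutations
        mean_history.append(population.mean(axis=0))
    return np.array(mean_history), population


means, final_population = simulate()
print(means[0], means[-1])
print(np.cov(final_population, rowvar=False))
```

The per-generation means and the final covariance are the kind of lineage statistics that would be compared against the trajectory of a compliant optimizer run on the same landscape.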
Original abstract
Evolutionary computation has long promised to deliver both high-performance optimization tools as well as rigorous scientific simulations of Darwinian evolution. However, modern algorithms frequently abandon evolutionary fidelity for physics-inspired heuristics or superficial biological metaphors. This paper derives a suite of advanced gradient-based optimization algorithms directly from evolutionary first principles. We introduce Darwinian Lineage Simulations (DLS) to prove that, in an asexual context, Fisher's and Wright's historically opposed views of evolution are actually formally equivalent; One can partition Fisher's deterministically-evolving total population into Wright's randomly-drifting sub-populations. We prove that proper bookkeeping requires introducing a specific kind of structured noise (the DLS noise relation). Crucially, any bookkeeping choices which satisfy this relation will yield a faithful simulation of evolution. Using this vast representational freedom, we prove that a broad family of battle-tested optimization algorithms are already perfectly compatible with evolutionary dynamics. These include: Stochastic Gradient Descent as well as many regularizations/approximations of Newton's method and Natural Gradient Descent. By simply adding DLS noise (i.e., evolutionarily faithful genetic drift), these algorithms become scientifically valid in silico simulations of Darwinian evolution. Finally, we demonstrate that even the state-of-the-art Adam optimizer can be brought into evolutionary compliance through a minor mathematical surgery.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Darwinian Lineage Simulations (DLS) to reconcile Fisher's deterministic total-population dynamics with Wright's randomly-drifting sub-populations in an asexual setting. It claims that proper bookkeeping requires a specific structured noise (the DLS noise relation), that any update rule satisfying this relation yields a faithful evolutionary simulation, and that this framework shows SGD, regularized/approximated Newton methods, Natural Gradient Descent, and Adam (after minor surgery) are already compatible with evolutionary dynamics.
Significance. If the DLS noise relation can be shown to be independently derived from evolutionary bookkeeping rather than fitted to target optimizers, and if the population-partition equivalence holds without unstated constraints on lineage structure or selection, the work would provide a rigorous evolutionary foundation for widely used gradient-based optimizers. This could enable new biologically faithful optimization algorithms and strengthen the use of these methods as in silico evolutionary simulations.
major comments (3)
- [DLS framework and noise relation] The definition and derivation of the DLS noise relation (central to all compatibility claims) are not supplied with sufficient detail or lemmas to verify independence from the target algorithms; without this, the assertion that any bookkeeping satisfying the relation produces a faithful simulation cannot be assessed.
- [Equivalence of Fisher's and Wright's views] The proof that partitioning Fisher's total population into Wright's drifting sub-populations equates the two views relies on the DLS noise relation being necessary and sufficient, yet no explicit constraints are given on how sub-populations are chosen, whether selection is uniform across partitions, or how finite-population effects interact with the noise.
- [Adam optimizer section] The 'minor mathematical surgery' asserted to bring Adam into evolutionary compliance is not accompanied by the specific modified update equations or verification that the resulting rule satisfies the DLS noise relation while preserving Adam's convergence properties.
minor comments (2)
- [Abstract] The abstract repeatedly states 'we prove' without cross-references to the specific theorems or sections containing the proofs; add explicit theorem numbering and forward references.
- [Notation and definitions] Notation for the DLS noise relation and population partitions should be introduced with a clear glossary or table to aid readability, especially when relating to standard optimizer update rules.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive suggestions. The comments identify opportunities to strengthen the exposition of the DLS framework, the equivalence proof, and the Adam modification. We respond to each major comment below and will incorporate the indicated revisions to improve verifiability and completeness.
Point-by-point responses
-
Referee: The definition and derivation of the DLS noise relation (central to all compatibility claims) are not supplied with sufficient detail or lemmas to verify independence from the target algorithms; without this, the assertion that any bookkeeping satisfying the relation produces a faithful simulation cannot be assessed.
Authors: We agree that the current manuscript would benefit from expanded intermediate steps in the derivation. The DLS noise relation is obtained directly from the requirement that the expected change in lineage frequencies matches the deterministic Fisher dynamics while preserving the Wrightian partition into independent sub-populations; it does not presuppose any particular optimizer. In the revision we will insert a new subsection containing (i) the explicit bookkeeping identity that forces the noise covariance, (ii) two lemmas establishing necessity and sufficiency of the relation for equivalence, and (iii) a short proof that any update rule obeying the relation yields a faithful simulation regardless of the functional form of the deterministic drift term.
Revision: yes
-
Referee: The proof that partitioning Fisher's total population into Wright's drifting sub-populations equates the two views relies on the DLS noise relation being necessary and sufficient, yet no explicit constraints are given on how sub-populations are chosen, whether selection is uniform across partitions, or how finite-population effects interact with the noise.
Authors: The referee is correct that the constraints on partitioning and selection must be stated explicitly. The equivalence holds when sub-populations are defined as lineages whose members share identical genotypes at the moment of partition and thereafter evolve under independent realizations of the DLS noise; selection is required to be uniform within each lineage but may differ across lineages. Finite-population corrections appear as higher-order terms that vanish in the large-population limit used throughout the paper. We will add a dedicated paragraph spelling out these assumptions together with a brief remark on the O(1/N) finite-size bias.
Revision: yes
-
Referee: The 'minor mathematical surgery' asserted to bring Adam into evolutionary compliance is not accompanied by the specific modified update equations or verification that the resulting rule satisfies the DLS noise relation while preserving Adam's convergence properties.
Authors: We will supply the precise modified Adam equations in the revision. The change consists of replacing the raw gradient g_t with g_t + η_t where η_t is drawn from the DLS noise distribution whose covariance is fixed by the current second-moment estimates; the bias-correction terms remain unchanged. Algebraic substitution immediately shows that the resulting update satisfies the DLS noise relation. Because the added noise is zero-mean and uncorrelated with the gradient estimate, the convergence analysis of Adam carries over with only a modified effective learning-rate schedule that remains bounded; we will include a short paragraph confirming this preservation.
Revision: yes
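The modification described in the last response can be sketched as follows. The rebuttal says only that g_t is replaced by g_t + η_t, with the covariance of η_t "fixed by the current second-moment estimates" and the bias correction unchanged. It does not give the covariance itself. The sketch therefore assumes a diagonal covariance with per-coordinate standard deviation `noise_scale * sqrt(v)`, where `v` is the second-moment accumulator before the update. Both `noise_scale` and that functional form are hypothetical placeholders, not the paper's equations.

```python
import numpy as np


def noisy_adam_step(theta, grad, m, v, t, rng,
                    lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                    noise_scale=0.1):
    """One Adam step with zero-mean noise added to the raw gradient.

    ASSUMPTION: noise is diagonal Gaussian with std noise_scale * sqrt(v),
    using the second-moment accumulator from before this step.
    """
    noise_std = noise_scale * np.sqrt(v)
    eta = rng.normal(size=theta.shape) * noise_std
    g = grad + eta  # replace g_t with g_t + eta_t

    # Standard Adam accumulators and bias correction, left unchanged.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g**2
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Usage on a hypothetical quadratic loss 0.5 * ||theta||^2, whose gradient is theta.
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 201):
    theta, m, v = noisy_adam_step(theta, theta, m, v, t, rng, lr=0.05)
print(theta)
```

Setting `noise_scale=0` recovers the ordinary Adam step, which makes it easy to compare the two variants on the same problem. Whether the noisy variant satisfies the DLS noise relation depends on the actual covariance the authors specify.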
Circularity Check
DLS noise relation introduced as necessary bookkeeping then used to accommodate optimizer updates by construction
specific steps
-
self definitional
[Abstract]
"We prove that proper bookkeeping requires introducing a specific kind of structured noise (the DLS noise relation). Crucially, any bookkeeping choices which satisfy this relation will yield a faithful simulation of evolution. Using this vast representational freedom, we prove that a broad family of battle-tested optimization algorithms are already perfectly compatible with evolutionary dynamics. These include: Stochastic Gradient Descent as well as many regularizations/approximations of Newton's method and Natural Gradient Descent. By simply adding DLS noise (i.e., evolutionarily faithful genetic drift)"
The DLS noise relation is first presented as the necessary bookkeeping condition derived from the Fisher-Wright equivalence. The same relation is then used to grant 'vast representational freedom' that directly licenses the update rules of the listed optimizers. This makes the claim that the optimizers are 'already perfectly compatible' tautological once the relation is accepted, rather than a non-trivial prediction tested against an independently derived noise structure.
full rationale
The paper derives an equivalence between Fisher and Wright views via population partitioning, then asserts that any update rule satisfying the newly introduced DLS noise relation constitutes a faithful evolutionary simulation. This relation is then invoked to reinterpret SGD, Newton approximations, NGD, and a surgically modified Adam as already compatible. Because the relation is defined precisely to license the target algorithms' update forms under the partitioning, the compatibility result reduces to the definitional freedom granted by the relation itself rather than an independent first-principles constraint that would have been falsifiable outside the chosen optimizers.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: In an asexual context, Fisher's deterministically-evolving total population can be partitioned into Wright's randomly-drifting sub-populations.
invented entities (2)
- Darwinian Lineage Simulations (DLS): no independent evidence
- DLS noise relation: no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: DLS noise relation W_g = μ²I − (V_{g+1} − V_g) and Gaussian lineage updates from Price equation covariance
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.