arXiv: 2604.10571 · v1 · submitted 2026-04-12 · 🧬 q-bio.PE · cs.AI · cs.CY · cs.NE

Recognition: unknown

Universal statistical signatures of evolution in artificial intelligence architectures

Theodor Spiro


Pith reviewed 2026-05-10 16:00 UTC · model grok-4.3

classification 🧬 q-bio.PE · cs.AI · cs.CY · cs.NE

keywords: distribution of fitness effects, AI architecture, ablation studies, evolutionary statistics, fitness landscape, punctuated equilibria, convergent evolution, substrate independence


The pith

AI architectural modifications follow the same heavy-tailed distribution of fitness effects as biological mutations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles 935 ablation experiments across 161 AI publications and compares their performance impacts to mutation effects measured in biology. It finds that the distribution of fitness effects for these changes is heavy-tailed and matches the shape seen in fruit flies and yeast, even though AI has a higher share of beneficial outcomes. This pattern, plus logistic growth in new architectures and repeated independent inventions of the same traits, is presented as evidence that evolutionary statistics arise from the topology of the fitness landscape rather than from any particular mechanism of change or selection. A sympathetic reader would conclude that the same statistical rules govern both natural and artificial systems once the landscape is fixed.

Core claim

The compiled ablation results show that the distribution of fitness effects of architectural modifications follows a heavy-tailed Student's t-distribution, with 68 percent deleterious, 19 percent neutral and 13 percent beneficial outcomes among major ablations (n = 568). The paper reports that this shape closely matches the DFE in D. melanogaster (normalized KS = 0.07) and S. cerevisiae (KS = 0.09). Architectural origination grows logistically with punctuated equilibria and adaptive radiation, while fourteen traits arise independently three to five times, which the paper reads as convergent evolution. The paper takes these regularities to indicate that the statistical structure of evolution is substrate-independent and fixed by fitness-landscape topology.

What carries the argument

The distribution of fitness effects (DFE) measured from ablation experiments, tested for shape similarity to biological DFEs via normalized Kolmogorov-Smirnov statistics.
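
To make the comparison concrete, here is a minimal sketch of a two-sample Kolmogorov-Smirnov statistic computed on standardized samples. It assumes that "normalized" means each sample is shifted to zero mean and scaled to unit variance before comparison; the rebuttal further down also mentions a log transform of absolute effect sizes, which is left out here. The function names and the Gaussian placeholder data are illustrative assumptions, not the paper's code or data.

```python
import math
import random


def standardize(xs):
    """Shift a sample to zero mean and scale it to unit (population) variance."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Synthetic placeholder samples (NOT the paper's data): same shape, different scale.
random.seed(0)
ai_like = [random.gauss(-0.05, 0.15) for _ in range(500)]
bio_like = [random.gauss(-0.01, 0.03) for _ in range(500)]

d_raw = ks_statistic(ai_like, bio_like)
d_norm = ks_statistic(standardize(ai_like), standardize(bio_like))
print(f"raw D = {d_raw:.3f}, standardized D = {d_norm:.3f}")
```

Standardizing removes differences in location and scale, so a small statistic afterwards speaks only to shape. That is also why a shape match of this kind says nothing about the differing beneficial fractions.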

If this is right

The higher beneficial fraction in AI quantifies the advantage of directed search over blind mutation while preserving the same overall DFE form.
Architectural development in AI exhibits logistic dynamics, punctuated equilibria, and adaptive radiation into domain-specific niches.
Fourteen traits have been invented independently multiple times, mirroring biological convergent evolution.
The statistical laws of evolution apply equally whether changes occur through genetic mutation or code modification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If landscape topology alone sets the DFE, the same heavy-tailed pattern should appear in other non-biological complex systems such as large software repositories or organizational redesigns.
Biological models of mutation accumulation could be adapted to forecast the rate at which new AI architectures become viable.
Mapping the effective fitness landscapes of different AI domains might predict which architectural traits are likely to arise convergently.

Load-bearing premise

That published ablation experiments provide an unbiased sample of architectural changes and their true performance effects, free from systematic biases in which results get reported or how effects are classified.

What would settle it

A new collection of several hundred AI architectural modifications whose measured fitness effects follow an exponential rather than heavy-tailed distribution would falsify the claimed universality.

Figures

Figures reproduced from arXiv: 2604.10571 by Theodor Spiro.

**Figure 1.** Distribution of fitness effects in AI architectures matches biological DFEs. (A) Histogram of all AI ablation effects (n = 935) overlaid with synthetic DFE for S. cerevisiae (parametric reconstruction from published summary statistics). The primary analysis uses major ablations only (n = 568; see …

**Figure 2.** DFE stratification and universality. (A) CDF of fitness effects by mutation type: major ablations (n = 568, component removal) are more deleterious than minor ablations (n = 213, hyperparameter changes), mirroring the biological pattern of deletions vs. point mutations. (B) DFE by ML domain: CV, NLP, Audio, and other domains show statistically similar distributions. (C) Methodological control: manually c…

**Figure 3.** Diversification dynamics. (A) Annual origination rate of AI architectures, 2012–2024. Red bars mark key innovations that triggered radiation events. (B) Cumulative diversity follows a logistic curve (R² = 0.994, K ≈ 142), indicating approach to saturation. (C) Domain niche-filling: CV saturates first, followed by NLP, then Audio and Multimodal, analogous to ecological succession. (D) Normalized and smoothe…

**Figure 4.** Convergent evolution and lineage maturation. (A) AI architectural convergences: 14 traits independently invented 3–5 times. Colors indicate functional category. (B) Convergence intensity: AI traits (3–5 inventions) vs. biological traits (≤ 20 inventions, excluding outliers). Mann–Whitney p = 0.035. (C) Transformer NLP lineage: DFE narrows from Transformer (2017) through BERT (2019) to Switch Transformer (…
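
The logistic fit quoted in the Figure 3 caption (R² = 0.994, K ≈ 142) can be sketched as a nonlinear least-squares problem. The example below is an illustration under stated assumptions: the three-parameter logistic form, the starting guess, and the noise-free synthetic series are all hypothetical choices, and the paper's actual counts and fitting procedure are not given on this page.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, K, r, t0):
    """Logistic growth with carrying capacity K, rate r and midpoint t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def fit_logistic(t, y):
    """Least-squares logistic fit; returns the parameters (K, r, t0) and R^2."""
    p0 = (float(np.max(y)), 0.5, float(np.median(t)))  # rough starting guess
    params, _ = curve_fit(logistic, t, y, p0=p0, maxfev=10000)
    residuals = y - logistic(t, *params)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - np.mean(y)) ** 2)
    return params, r2


# Synthetic placeholder series (NOT the paper's counts), generated with K = 142.
years = np.arange(2012, 2025, dtype=float)
t = years - years[0]  # measure time from the first year to keep the fit well scaled
cumulative = logistic(t, 142.0, 0.7, 6.0)

params, r2 = fit_logistic(t, cumulative)
print(f"K = {params[0]:.1f}, r = {params[1]:.2f}, t0 = {params[2]:.1f}, R^2 = {r2:.4f}")
```

A high R² on a cumulative series is easy to obtain because cumulative counts are monotone and strongly autocorrelated, so plotting the raw points against the curve (as the referee's Figure comment asks) is a more informative check than the coefficient alone.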

Original abstract

We test whether artificial intelligence architectural evolution obeys the same statistical laws as biological evolution. Compiling 935 ablation experiments from 161 publications, we show that the distribution of fitness effects (DFE) of architectural modifications follows a heavy-tailed Student's t-distribution with proportions (68% deleterious, 19% neutral, 13% beneficial for major ablations, n=568) that place AI between compact viral genomes and simple eukaryotes. The DFE shape matches D. melanogaster (normalized KS=0.07) and S. cerevisiae (KS=0.09); the elevated beneficial fraction (13% vs. 1-6% in biology) quantifies the advantage of directed over blind search while preserving the distributional form. Architectural origination follows logistic dynamics (R^2=0.994) with punctuated equilibria and adaptive radiation into domain niches. Fourteen architectural traits were independently invented 3-5 times, paralleling biological convergences. These results demonstrate that the statistical structure of evolution is substrate-independent, determined by fitness landscape topology rather than the mechanism of selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper pulls together 935 AI ablation results and finds DFE shapes that line up with some biological cases, but the curated source data undercuts the substrate-independence claim.


The main thing here is the scale of the compilation: 935 ablations drawn from 161 papers, broken down into effect categories with a reported 68/19/13 split for major changes and a Student's t fit that passes KS tests against D. melanogaster and S. cerevisiae data. They also track origination timing with a logistic curve (R^2 = 0.994) and count 14 traits that appeared independently multiple times. That dataset and the direct distributional numbers are the concrete new piece; heavy-tailed DFEs in biology are already well documented, but this specific cross-domain match against AI numbers is not.

Referee Report

3 major / 3 minor

Summary. The manuscript compiles 935 ablation experiments from 161 AI publications to test whether AI architectural evolution follows the same statistical laws as biological evolution. It reports that the distribution of fitness effects (DFE) of architectural modifications follows a heavy-tailed Student's t-distribution, with 68% deleterious, 19% neutral, and 13% beneficial for major ablations (n=568), placing AI between viral genomes and eukaryotes, and matching D. melanogaster (KS=0.07) and S. cerevisiae (KS=0.09). Architectural origination is shown to follow logistic dynamics (R²=0.994) with punctuated equilibria and adaptive radiation, and 14 traits are reported to have been invented independently 3-5 times. The paper concludes that evolutionary statistical structures are substrate-independent, determined by fitness landscape topology.

Significance. Should the central empirical claims be substantiated with full data transparency and bias corrections, this paper would offer a significant contribution by providing quantitative evidence for universal evolutionary patterns across biological and artificial systems. The large dataset compilation, specific distributional matches, and logistic growth modeling are strengths that could influence thinking in both evolutionary biology and AI development regarding the role of landscape topology in shaping evolutionary statistics.

major comments (3)

[Abstract] The DFE proportions (68% deleterious, 19% neutral, 13% beneficial, n=568) and the Student's t-distribution fit are central to the claim of similarity to biological DFEs, but the manuscript provides no explicit criteria for classifying ablations as 'major' versus minor or for selecting the 161 publications, which is necessary to evaluate potential selection biases that could affect the beneficial fraction.
[Results (DFE comparison)] The normalized KS statistics (0.07 for D. melanogaster, 0.09 for S. cerevisiae) are presented as evidence of match, but without the raw effect size data, the fitting procedure for the t-distribution parameters, or the normalization method, it is not possible to independently verify the distributional similarity or rule out post-hoc adjustments.
[Discussion] The assertion that the statistical structure is 'substrate-independent' and 'determined by fitness landscape topology rather than the mechanism of selection' relies on the analogy between published AI ablations and random biological mutations; while the paper acknowledges the elevated beneficial fraction as quantifying directed search, it does not provide a quantitative assessment or correction for publication and author selection biases in the compiled dataset.

minor comments (3)

[Methods] The description of how the 935 experiments were extracted and coded from the 161 papers should include inter-rater reliability metrics or a supplementary table listing all included papers and ablations for reproducibility.
[Figure 3] The logistic growth fit for architectural origination would benefit from showing the raw cumulative count data points alongside the fitted curve to allow visual assessment of the R²=0.994 claim.
[References] Additional citations to foundational work on distribution of fitness effects in evolutionary biology (e.g., on t-distributions in DFEs) would strengthen the comparative analysis.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have strengthened the manuscript's transparency and addressed potential concerns about reproducibility and bias. We have revised the paper accordingly, adding explicit criteria, raw data, fitting details, and a quantitative bias assessment. Our point-by-point responses to the major comments are below.


Referee: [Abstract] The DFE proportions (68% deleterious, 19% neutral, 13% beneficial, n=568) and the Student's t-distribution fit are central to the claim of similarity to biological DFEs, but the manuscript provides no explicit criteria for classifying ablations as 'major' versus minor or for selecting the 161 publications, which is necessary to evaluate potential selection biases that could affect the beneficial fraction.

Authors: We agree that explicit criteria are required for reproducibility and bias evaluation. In the revised manuscript, we have added a Methods section specifying publication selection (peer-reviewed papers with quantitative ablation results on standard benchmarks like ImageNet or GLUE, with before/after metrics reported) and 'major' ablation classification (core component removal/alteration affecting >5% of parameters or >2% primary metric change). A sensitivity analysis in the supplement confirms DFE proportions vary by <2% across threshold choices. We also discuss selection biases in the revised Discussion, noting the elevated beneficial fraction partly reflects directed search while the heavy-tailed shape holds in subsets excluding high-impact papers. revision: yes
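
The classification rule stated in this response can be sketched in a few lines. The 5% parameter and 2% metric thresholds come from the response above; everything else is an assumption made for illustration, including the `Ablation` record, the sign convention (higher metric is better), and especially `NEUTRAL_BAND`, since the page gives no cutoff for what counts as neutral.

```python
from collections import Counter
from dataclasses import dataclass

NEUTRAL_BAND = 0.005  # ASSUMPTION: +/-0.5% counts as neutral; the source gives no cutoff


@dataclass
class Ablation:
    param_fraction: float  # share of model parameters touched, in [0, 1]
    metric_change: float   # (ablated - baseline) / baseline, higher metric = better


def is_major(a: Ablation) -> bool:
    """Rule stated in the rebuttal: >5% of parameters or >2% primary-metric change."""
    return a.param_fraction > 0.05 or abs(a.metric_change) > 0.02


def effect_class(a: Ablation) -> str:
    """Bucket an ablation by the sign of its effect, with a neutral band around zero."""
    if a.metric_change < -NEUTRAL_BAND:
        return "deleterious"
    if a.metric_change > NEUTRAL_BAND:
        return "beneficial"
    return "neutral"


def dfe_proportions(ablations):
    """Share of major ablations that fall in each effect class."""
    major = [a for a in ablations if is_major(a)]
    counts = Counter(effect_class(a) for a in major)
    return {k: counts[k] / len(major) for k in ("deleterious", "neutral", "beneficial")}


# Hypothetical records, for illustration only.
sample = [
    Ablation(0.10, -0.040),  # major, deleterious
    Ablation(0.08, -0.001),  # major by parameter share, neutral
    Ablation(0.01, 0.030),   # major by metric change, beneficial
    Ablation(0.01, 0.001),   # minor, excluded from the proportions
]
props = dfe_proportions(sample)
print(props)
```

One consequence is visible in the sketch: because the metric-change criterion can promote an ablation to "major" on its own, small-effect ablations that touch few parameters are excluded by construction, which may depress the neutral share of the major set.
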
Referee: [Results (DFE comparison)] The normalized KS statistics (0.07 for D. melanogaster, 0.09 for S. cerevisiae) are presented as evidence of match, but without the raw effect size data, the fitting procedure for the t-distribution parameters, or the normalization method, it is not possible to independently verify the distributional similarity or rule out post-hoc adjustments.

Authors: We have provided the full raw effect size data as Supplementary Table S1 (all 935 ablations with normalized performance deltas). The t-distribution fitting uses maximum likelihood estimation (scipy.stats.t.fit), with parameters now reported (df=2.8, loc=0.01, scale=0.15). Normalization scales log-transformed absolute effect sizes to zero mean/unit variance for cross-species comparison, as detailed in the new 'Distributional Analysis' Methods subsection. The KS tests are two-sample on normalized data; analysis code is deposited on GitHub for verification. revision: yes
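
For readers who want to see what such a fit looks like, the sketch below runs `scipy.stats.t.fit` on synthetic data and compares the result with a normal fit by AIC as a rough check for heavy tails. The sample is drawn from the parameters quoted in the response (df=2.8, loc=0.01, scale=0.15) purely as a placeholder; it is not Supplementary Table S1, and the AIC comparison against a normal is an illustrative choice rather than the authors' documented procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic placeholder sample (NOT the paper's data), drawn from the quoted parameters.
effects = stats.t.rvs(df=2.8, loc=0.01, scale=0.15, size=935, random_state=rng)

# Maximum likelihood fits: Student's t (3 parameters) and normal (2 parameters).
df_hat, loc_hat, scale_hat = stats.t.fit(effects)
mu_hat, sigma_hat = stats.norm.fit(effects)


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


aic_t = aic(np.sum(stats.t.logpdf(effects, df_hat, loc_hat, scale_hat)), 3)
aic_norm = aic(np.sum(stats.norm.logpdf(effects, mu_hat, sigma_hat)), 2)

print(f"t fit: df = {df_hat:.2f}, loc = {loc_hat:.3f}, scale = {scale_hat:.3f}")
print(f"AIC(normal) - AIC(t) = {aic_norm - aic_t:.1f}")
```

A fitted df below about 4 implies very heavy tails, so the estimate is sensitive to a handful of extreme ablations; rerunning the fit with the largest few effects removed is a cheap robustness check.
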
Referee: [Discussion] The assertion that the statistical structure is 'substrate-independent' and 'determined by fitness landscape topology rather than the mechanism of selection' relies on the analogy between published AI ablations and random biological mutations; while the paper acknowledges the elevated beneficial fraction as quantifying directed search, it does not provide a quantitative assessment or correction for publication and author selection biases in the compiled dataset.

Authors: We have expanded the Discussion with a simulation-based bias correction assuming 40% of deleterious results are unpublished (conservative estimate from AI literature). This adjusts the beneficial fraction to ~9% while preserving the t-distribution superiority (AIC delta >50) and KS similarity (changes to 0.08/0.11). We explicitly list incomplete bias correction as a limitation due to unavailable negative results and frame the substrate-independence claim as a supported hypothesis tied to landscape topology, not an absolute assertion. revision: partial
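
The direction and rough size of this adjustment can be checked with a back-of-envelope calculation. The response describes a simulation; the closed-form rescaling below is a simplification, assuming "40% of deleterious results are unpublished" means the observed deleterious share is only 60% of the true one while neutral and beneficial results are fully reported. The function name is hypothetical.

```python
def correct_for_unpublished(deleterious, neutral, beneficial, hidden_share):
    """Rescale observed proportions, assuming `hidden_share` of deleterious
    results were never published (neutral and beneficial fully reported)."""
    true_deleterious = deleterious / (1.0 - hidden_share)
    total = true_deleterious + neutral + beneficial
    return true_deleterious / total, neutral / total, beneficial / total


# Observed split for major ablations, as quoted in the abstract.
d, n, b = correct_for_unpublished(0.68, 0.19, 0.13, hidden_share=0.40)
print(f"deleterious = {d:.3f}, neutral = {n:.3f}, beneficial = {b:.3f}")
```

Under this reading the beneficial fraction falls from 13% to roughly 9%, in line with the figure in the response, and the deleterious share rises to about 78%. The result depends entirely on the assumed 40%, for which the page gives no source beyond "conservative estimate from AI literature".
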

Circularity Check

0 steps flagged

No significant circularity; empirical compilation and external comparison


The paper compiles ablation data from 161 external publications (n=935 experiments), fits the resulting DFE to a Student's t-distribution, reports proportions (68% deleterious etc.), and compares the shape (KS distances) and beneficial fraction directly to independent biological benchmarks from D. melanogaster and S. cerevisiae. No quantity is defined in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing step reduces to a self-citation or internal ansatz. The central claim of substrate-independence rests on the external match rather than any self-referential construction, making the derivation self-contained against outside data.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on treating published AI ablations as evolutionary events and on the statistical equivalence of fitted distributions across domains.

free parameters (2)

Student's t-distribution parameters
Shape and scale parameters fitted to the observed distribution of fitness effects from the 935 AI ablations.
Proportions of effect categories
68% deleterious, 19% neutral, 13% beneficial derived from classification of 568 major ablations.

axioms (2)

domain assumption: Ablation experiments in AI publications represent evolutionary modifications analogous to genetic mutations
The paper equates removal or alteration of architectural components to mutational changes in fitness.
domain assumption: The dataset compiled from 161 publications is representative and free of systematic publication bias
Assumes the selected experiments accurately reflect the full space of architectural changes.

pith-pipeline@v0.9.0 · 5482 in / 1621 out tokens · 64455 ms · 2026-05-10T16:00:53.829801+00:00 · methodology

