pith. machine review for the scientific record.

arxiv: 2605.12394 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory

Charles H. Martin, Hari K. Prakash

Pith reviewed 2026-05-13 05:20 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords overfitting detection · random matrix theory · grokking · neural networks · spectral distribution · correlation traps · generalization · anti-grokking

The pith

Overfitting during long-horizon grokking appears as growing Correlation Traps in layer weights that random matrix theory can detect without any data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that neural networks enter an anti-grokking phase after initial generalization, where train accuracy remains high but test accuracy declines because of accumulating correlations in the weights. It detects these changes by randomizing each weight matrix element by element, fitting the resulting eigenvalue distribution to the Marchenko-Pastur law, and flagging large spectral outliers as Correlation Traps. The number and size of these traps increase precisely when generalization worsens. A secondary check passes random inputs through the model and measures Jensen-Shannon divergence on the output logits to separate traps that are merely present from those that actively harm performance. The same spectral outliers appear in some large language models, suggesting the phenomenon is not limited to small networks.
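
As a rough illustration of the pipeline just described, the sketch below shuffles a weight matrix element-wise, builds the randomized ESD, fits a Marchenko-Pastur bulk, and counts eigenvalues that escape it. The function names, the moment-based parameter estimates, and the k-sigma outlier rule are editorial assumptions, not the paper's exact implementation (which may rely on tooling such as WeightWatcher).

```python
# Minimal sketch of the trap-detection pipeline (editorial reconstruction;
# estimators and threshold are assumptions, not the authors' exact recipe).
import numpy as np

def randomized_esd(W, rng):
    """Shuffle the entries of W element-wise and return the eigenvalues of
    the resulting correlation matrix, i.e. the randomized ESD."""
    flat = W.flatten()
    rng.shuffle(flat)
    W_rand = flat.reshape(W.shape)
    n, m = W_rand.shape
    X = W_rand.T @ W_rand / n          # m x m empirical correlation matrix
    return np.linalg.eigvalsh(X)

def count_correlation_traps(W, k=3.0, seed=0):
    """Count randomized-ESD eigenvalues far above the Marchenko-Pastur bulk."""
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    evals = randomized_esd(W, rng)
    n, m = W.shape
    q = m / n                          # aspect ratio
    sigma2 = evals.mean()              # trace identity: mean eigenvalue ~ element variance
    lam_plus = sigma2 * (1.0 + np.sqrt(q)) ** 2   # MP bulk upper edge
    bulk = evals[evals <= lam_plus]
    threshold = lam_plus + k * bulk.std()         # assumed outlier rule
    return int(np.sum(evals > threshold))
```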

Core claim

During the anti-grokking phase of long-horizon grokking—where training accuracy stays high while test accuracy falls—element-wise randomization of each layer weight matrix produces empirical spectral distributions that develop large outliers relative to a Marchenko-Pastur fit; these outliers, termed Correlation Traps, increase in number and magnitude as generalization degrades. The traps are structurally absent in the earlier pre-grokking phase. Passing random data through the trained model and computing Jensen-Shannon divergence between output logit distributions distinguishes benign traps from those that degrade test performance. Foundation-scale LLMs exhibit analogous outliers.

What carries the argument

Correlation Traps, defined as large outliers in the empirical spectral distribution of an element-wise randomized weight matrix that violate the self-averaging property of a Marchenko-Pastur fit; these outliers track the buildup of correlations responsible for overfitting.
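
For reference, the standard Marchenko-Pastur bulk against which the randomized spectra are compared (the paper's exact aspect-ratio and normalization conventions may differ):

$$\rho_{\mathrm{MP}}(\lambda) = \frac{1}{2\pi\,\sigma^{2} q\,\lambda}\,\sqrt{(\lambda_{+}-\lambda)(\lambda-\lambda_{-})}, \qquad \lambda_{\pm} = \sigma^{2}\bigl(1 \pm \sqrt{q}\bigr)^{2},$$

where $q$ is the layer aspect ratio and $\sigma^{2}$ the element variance of $\mathbf{W}_{\mathrm{rand}}$. On this reading, a Correlation Trap is an eigenvalue of the randomized ESD sitting well above $\lambda_{+}$: it survives element-wise shuffling instead of self-averaging into the bulk.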

If this is right

  • Anti-grokking constitutes a distinct training phase separate from pre-grokking, identifiable solely from weight spectra.
  • Overfitting can be monitored layer by layer during training without ever accessing train or test examples.
  • Jensen-Shannon divergence on random inputs provides an empirical test to decide whether detected traps are harmful.
  • The same spectral signature appears in some foundation-scale LLMs, indicating the mechanism may operate at large scale.
  • Structural absence of traps before grokking and their growth afterward distinguishes benign from harmful weight evolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Periodic checks for trap formation could be inserted into training loops to trigger early stopping or regularization before test accuracy collapses (see the sketch after this list).
  • If traps prove causal, weight-update rules that penalize large outliers in the randomized spectrum might shorten or eliminate the anti-grokking interval.
  • The method offers a route to compare generalization dynamics across architectures by counting trap statistics rather than relying on held-out data.
  • Detecting traps in already-deployed models might flag latent overfitting even when current test sets no longer reveal it.
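
A hedged sketch of the first extension above, reusing the hypothetical count_correlation_traps helper from the earlier sketch; model, train_step, and weight_matrices are placeholders, and the stopping rule is an editorial extrapolation rather than anything the paper prescribes.

```python
# Editorial sketch: stop (or regularize) when the layer-wise trap count
# has grown for several consecutive checks. All names are placeholders.
def train_with_trap_monitor(model, train_step, weight_matrices, n_steps,
                            check_every=1000, patience=3):
    history, growing = [], 0
    for step in range(1, n_steps + 1):
        train_step(model)                                  # one optimization step
        if step % check_every == 0:
            traps = sum(count_correlation_traps(W) for W in weight_matrices(model))
            history.append(traps)
            growing = growing + 1 if len(history) > 1 and history[-1] > history[-2] else 0
            if growing >= patience:                        # sustained trap growth
                return step, history                       # early stop before collapse
    return n_steps, history
```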

Load-bearing premise

Large spectral outliers relative to a Marchenko-Pastur fit on randomized weights reliably mark the formation of correlations that cause overfitting rather than arising for unrelated reasons.

What would settle it

Train a network through the anti-grokking regime, confirm that test accuracy has declined, yet find that the randomized weight spectra show no outliers above the Marchenko-Pastur bulk edge.

Figures

Figures reproduced from arXiv: 2605.12394 by Charles H. Martin, Hari K. Prakash.

Figure 1
Figure 1. Grokking dynamics and Correlation Traps. Reproduction of known grokking experiments, extended to long-horizon training. Train accuracy (red), test accuracy (purple), and Correlation Traps (blue) are shown for (a) an MLP on MNIST, (b) Modular Addition (MA), and (c) GPT2. Shaded regions denote pre-grokking (gray), grokking (yellow), and anti-grokking (green). Correlation Traps arise at the onset of late-sta… view at source ↗
Figure 2
Figure 2. Randomized ESDs, MP fits, and emergence of Correlation Traps. (a) Example of the ESD of Xrand compared with a Marchenko–Pastur (MP) fit. In a self-averaging randomized layer, the spectrum is described by the MP bulk and no large right-edge outliers remain. (b,c) Examples of trapped randomized spectra from Layer 2 of the MLP. Right before collapse, the randomized spectrum already shows a dominant spike near… view at source ↗
Figure 3
Figure 3. Grokking dynamics and test loss. Train accuracy (red), test accuracy (purple), and test loss (blue) are shown for (a) an MLP on MNIST, (b) Modular Addition (MA), and (c) GPT2. Trap count separates this regime from the earlier phases. During pre-grokking, when the model has already fit the training set but has not yet generalized, the average number of detected traps is effectively zero. During grokking, … view at source ↗
Figure 4
Figure 4. JSD diagnostic ablation test for selected MLP, MA, and GPT2 anti-grokking checkpoints. Trap removal score Jk(T = 1) vs. delta test error for individual traps. Benign traps have Jk(T = 1) ≈ 0 while harmful traps have relatively larger Jk(T = 1) scores. view at source ↗
Figure 5
Figure 5. Layer-wise Correlation Trap profiles in frontier-scale open-weight LLMs. Number of detected Correlation Traps per layer for the OpenAI gpt-oss-20b and gpt-oss-120b models. view at source ↗
Figure 6
Figure 6. Evidence for norm-based prototype collapse in the anti-grokking MLP. (a) The original… view at source ↗
Figure 7
Figure 7. Prototype-like directions associated with trapped layers. Largest right singular vector v(1) of W1 in pixel space. The corresponding direction evolves from unstructured noise in pre-grokking, to a smooth global template during grokking, to a localized prototype-like image in anti-grokking. By mapping the trap back to the FC1 layer, one can observe the overfit sector W1… view at source ↗
Figure 8
Figure 8. Trap intervention in the anti-grokking phase. (a) Confusion matrix for the full model in anti-grokking. (b) Confusion matrix after replacing the FC1 trap direction with a matched Gaussian random vector. The strong bias toward one prototype class (i.e., 5) is substantially reduced. view at source ↗
Figure 9
Figure 9. Rows of W1 selected by leading-vector localization. The selected receptive fields are noise-like in pre-grokking, remain diffuse around peak generalization, and become recognizable digit templates in anti-grokking. view at source ↗
Figure 10
Figure 10. Weight distributions with extreme coordinates across checkpoints. Anti-grokking introduces structural outliers into the weight values of all three layers. These extreme coordinates are consistent with the shuffled-spectrum outliers counted as Correlation Traps. view at source ↗
read the original abstract

Training Neural Networks (NNs) without overfitting is difficult; detecting that overfitting is difficult as well. We present a novel Random Matrix Theory method that detects the onset of overfitting in deep learning models without access to train or test data. For each model layer, we randomize each weight matrix element-wise, $\mathbf{W} \to \mathbf{W}_{\mathrm{rand}}$, fit the randomized empirical spectral distribution with a Marchenko-Pastur distribution, and identify large outliers that violate self-averaging. We call these outliers Correlation Traps. During the onset of overfitting, which we call the "anti-grokking" phase in long-horizon grokking, Correlation Traps form and grow in number and scale as test accuracy decreases while train accuracy remains high. Traps may be benign or may harm generalization; we provide an empirical approach to distinguish between them by passing random data through the trained model and evaluating the JS divergence of output logits. Our findings show that anti-grokking is an additional grokking phase with high train accuracy and decreasing test accuracy, structurally distinct from pre-grokking through its Correlation Traps. More broadly, we find that some foundation-scale LLMs exhibit the same Correlation Traps, indicating potentially harmful overfitting.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The paper proposes a Random Matrix Theory method to detect overfitting in neural networks during long-horizon grokking without access to train or test data. For each layer, weights are randomized element-wise to W_rand, the empirical spectral distribution is fit to a Marchenko-Pastur law, and large outliers violating self-averaging are identified as 'Correlation Traps.' These traps are claimed to form and grow during an 'anti-grokking' phase (high train accuracy, falling test accuracy), with JS divergence on random inputs used to label them benign or harmful; similar traps are reported in some foundation-scale LLMs.

Significance. If the empirical claims are substantiated, the work could supply a data-free diagnostic for overfitting and a new lens on grokking phases via spectral outliers. The RMT framing and the proposed distinction between benign/harmful traps are conceptually interesting, but the absence of quantitative validation, baselines, or causal tests limits immediate impact.

major comments (4)
  1. [Abstract and Experimental Results] The abstract and method description supply no quantitative results, baselines, error bars, or validation experiments, rendering it impossible to judge whether the observed growth of outliers actually tracks the anti-grokking phase or supports the central claim.
  2. [Method (randomization and MP fitting procedure)] The Marchenko-Pastur fit is performed on the same randomized matrices whose outliers are then declared 'Correlation Traps'; without explicit details on parameter estimation (variance, aspect ratio) or threshold choice, the diagnostic risks circularity and being partly defined by the data it diagnoses.
  3. [Results and Discussion] The manuscript reports that outliers appear in tandem with the accuracy curves and that JS divergence on random inputs can label traps, yet no intervention (e.g., targeted regularization of the outlier subspace) is shown to alter the test-accuracy trajectory, leaving the causal status of the traps unestablished.
  4. [JS divergence classification subsection] The claim that JS divergence on unstructured random inputs reliably distinguishes harmful from benign traps assumes that such inputs expose the relevant feature correlations; this is plausible but unproven for the structured data on which generalization actually fails.
minor comments (2)
  1. [Method] Notation for W_rand and the precise definition of 'large outliers' should be formalized with an equation or algorithm box for reproducibility (an illustrative formalization is sketched after this list).
  2. [LLM experiments] The extension to LLMs would benefit from specifying model names, layer indices, and quantitative outlier counts rather than the qualitative statement 'some foundation-scale LLMs exhibit the same Correlation Traps.'
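
An illustrative formalization of the kind requested in the first minor comment (editorial, not the authors' own definitions; the estimators and the constant $c$ are assumptions): for an $n \times m$ layer matrix $\mathbf{W}$ with element-wise shuffle $\mathbf{W}_{\mathrm{rand}} = \Pi(\mathbf{W})$ and randomized-ESD eigenvalues $\{\lambda_k\}$ of $\tfrac{1}{n}\mathbf{W}_{\mathrm{rand}}^{\top}\mathbf{W}_{\mathrm{rand}}$,

$$\hat{q} = m/n, \qquad \hat{\sigma}^{2} = \tfrac{1}{m}\textstyle\sum_{k}\lambda_{k}, \qquad \hat{\lambda}_{+} = \hat{\sigma}^{2}\bigl(1+\sqrt{\hat{q}}\bigr)^{2},$$

and $\lambda_k$ would be flagged as a Correlation Trap when $\lambda_k > \hat{\lambda}_{+} + c\, s_{\mathrm{bulk}}$, with $s_{\mathrm{bulk}}$ the standard deviation of the eigenvalues below $\hat{\lambda}_{+}$ and $c$ a fixed constant.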

Simulated Authors' Rebuttal

4 responses · 1 unresolved

We thank the referee for their detailed and constructive feedback. We address each major comment below, clarifying aspects of the work and indicating planned revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract and Experimental Results] The abstract and method description supply no quantitative results, baselines, error bars, or validation experiments, rendering it impossible to judge whether the observed growth of outliers actually tracks the anti-grokking phase or supports the central claim.

    Authors: We acknowledge that the abstract is high-level and omits specific numbers. The full manuscript contains figures and descriptions documenting the growth of Correlation Traps alongside accuracy curves across multiple models and runs. In revision we will expand the abstract with key quantitative observations (e.g., average increase in outlier count and magnitude during anti-grokking), add error bars from repeated trials to the experimental figures, and include a brief baseline comparison against standard overfitting indicators. revision: yes

  2. Referee: [Method (randomization and MP fitting procedure)] The Marchenko-Pastur fit is performed on the same randomized matrices whose outliers are then declared 'Correlation Traps'; without explicit details on parameter estimation (variance, aspect ratio) or threshold choice, the diagnostic risks circularity and being partly defined by the data it diagnoses.

    Authors: We appreciate the referee noting the risk of circularity. The procedure randomizes weights element-wise and fits the Marchenko-Pastur law to the bulk of the randomized spectrum (estimating aspect ratio q and variance σ² from that bulk); Correlation Traps are then the eigenvalues of the randomized spectrum that remain far above the fitted bulk edge, i.e., that violate self-averaging. Because the fit is governed by the bulk rather than by the flagged outliers, the diagnostic is not defined by the quantity it diagnoses. We will revise the method section to provide explicit formulas for parameter estimation and the precise outlier threshold rule (e.g., MP edge plus a fixed multiple of the bulk standard deviation). revision: yes

  3. Referee: [Results and Discussion] The manuscript reports that outliers appear in tandem with the accuracy curves and that JS divergence on random inputs can label traps, yet no intervention (e.g., targeted regularization of the outlier subspace) is shown to alter the test-accuracy trajectory, leaving the causal status of the traps unestablished.

    Authors: The manuscript presents the alignment between trap emergence and accuracy degradation as an observational diagnostic signal rather than a causal claim. We agree that intervention experiments would be needed to establish causality and will add an explicit limitations paragraph in the discussion stating the correlational nature of the current evidence while outlining targeted regularization of the outlier subspace as a concrete direction for future work. revision: partial

  4. Referee: [JS divergence classification subsection] The claim that JS divergence on unstructured random inputs reliably distinguishes harmful from benign traps assumes that such inputs expose the relevant feature correlations; this is plausible but unproven for the structured data on which generalization actually fails.

    Authors: This is a valid point about the proxy. Random inputs are used to obtain a data-free probe of output statistics. We will revise the subsection to include a clearer justification of the choice, discuss its limitations as a proxy for structured data, and add a short sensitivity analysis using mildly perturbed structured inputs where feasible. revision: yes
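
A hedged sketch of the random-input JS-divergence probe discussed in this exchange; model and model_without_trap are placeholder callables returning logits (the paper's actual trap-ablation step is not reproduced here), and the temperature argument mirrors the Jk(T) notation of Figure 4.

```python
# Editorial sketch of the random-input JSD diagnostic; the trap-ablated
# model is a placeholder for whatever removal procedure is used.
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def trap_removal_score(model, model_without_trap, input_shape, n=512, T=1.0, seed=0):
    """Mean JSD between full and trap-ablated output distributions on
    unstructured Gaussian inputs (assumed form of the diagnostic)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, *input_shape))
    p = softmax(np.asarray(model(x)), T)
    q = softmax(np.asarray(model_without_trap(x)), T)
    return float(np.mean([js_divergence(pi, qi) for pi, qi in zip(p, q)]))
```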

standing simulated objections not resolved
  • Establishing causality via intervention experiments (e.g., regularizing the outlier subspace and measuring resulting test-accuracy changes) would require new empirical work outside the scope of the present submission.

Circularity Check

0 steps flagged

No significant circularity in the derivation or detection method.

full rationale

The paper's core procedure randomizes weight matrices element-wise to obtain W_rand, fits a Marchenko-Pastur law to its empirical spectral distribution, and flags large outliers relative to that fit as Correlation Traps. These outliers are then observed to appear and grow in tandem with the anti-grokking phase (high train accuracy, falling test accuracy). This is an empirical correlation between an RMT-derived statistic on the weights and separately measured accuracy curves; the definition of the traps is not constructed from the accuracy labels, nor is any claimed prediction equivalent to the fitted parameters by construction. The JS-divergence test on random inputs is likewise an independent empirical probe. No load-bearing self-citation, uniqueness theorem, or ansatz imported from prior author work appears in the provided text, and the method remains self-contained against external RMT benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The approach rests on standard random-matrix assumptions for shuffled weights and introduces new diagnostic entities whose interpretation as overfitting signals lacks independent evidence outside the paper.

free parameters (1)
  • Marchenko-Pastur fit parameters (variance, aspect ratio)
    Estimated from the empirical spectral distribution of each randomized weight matrix to define the bulk edge and identify outliers.
axioms (1)
  • domain assumption Element-wise randomized weight matrices obey the Marchenko-Pastur law
    Standard RMT assumption invoked to establish the expected bulk spectrum against which trained-matrix outliers are measured.
invented entities (2)
  • Correlation Traps no independent evidence
    purpose: Spectral outliers indicating harmful learned correlations during overfitting
    Defined as large outliers in the spectrum of an element-wise randomized weight matrix that remain above the fitted Marchenko-Pastur bulk edge.
  • anti-grokking phase no independent evidence
    purpose: Late training regime with high train accuracy and falling test accuracy
    Observationally defined during long-horizon grokking experiments.

pith-pipeline@v0.9.0 · 5521 in / 1415 out tokens · 93586 ms · 2026-05-13T05:20:19.634735+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

  1. [1] Daniel J. Amit, Hanoch Gutfreund, and Haim Sompolinsky. Spin-glass models of neural networks. Physical Review A, 32(2):1007–1018, 1985. doi: 10.1103/PhysRevA.32.1007
  2. [2] Philip W. Anderson. Absence of diffusion in certain random lattices. Physical Review, 109(5):1492–1505, 1958. doi: 10.1103/PhysRev.109.1492
  3. [3] Jinho Baik, Gérard Ben Arous, and Sandrine Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005
  4. [4] Siegfried Bös. Statistical mechanics approach to early stopping and weight decay. Physical Review E, 58(1):833–847, 1998. doi: 10.1103/PhysRevE.58.833
  5. [5] Satyendra Nath Bose. Plancks Gesetz und Lichtquantenhypothese. Zeitschrift für Physik, 26(1):178–181, 1924. doi: 10.1007/BF01327326
  6. [6] Albert Einstein. Quantentheorie des einatomigen idealen Gases. Sitzungsberichte der Preussischen Akademie der Wissenschaften, pages 3–14, 1925
  7. [7] E. Gardner. The space of interactions in neural network models. Journal of Physics A: Mathematical and General, 21(1):257–270, 1988. doi: 10.1088/0305-4470/21/1/030
  8. [8] John J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982. doi: 10.1073/pnas.79.8.2554
  9. [9] Jaerin Lee, Bong Gyun Kang, Kihoon Kim, and Kyoung Mu Lee. Grokfast: Accelerated grokking by amplifying slow gradients, 2024. URL https://arxiv.org/abs/2405.20233
  10. [10] Jiping Li and Rishi Sonthalia. Risk phase transitions in spiked regression: Alignment driven benign and catastrophic overfitting. arXiv preprint arXiv:2510.01414, 2025
  11. [11] Ziming Liu, Ouail Kitouni, Niklas S. Nolte, Eric J. Michaud, Max Tegmark, and Mike Williams. Towards understanding grokking: An effective theory of representation learning. In Advances in Neural Information Processing Systems, volume 35, pages 34651–34663. Curran Associates, Inc., 2022. URL https://arxiv.org/abs/2205.10343
  12. [12] Vladimir A. Marchenko and Leonid Andreevich Pastur. Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 72(114)(4):507–536, 1967
  13. [13] Charles H. Martin. WeightWatcher: Analyze Deep Learning Models without Training or Data. https://github.com/CalculatedContent/WeightWatcher, 2018-2024. Version 0.7.5.5 used in this study. Accessed May 12, 2025
  14. [14] Charles H. Martin and Christopher Hinrichs. SETOL: A semi-empirical theory of (deep) learning. arXiv preprint arXiv:2507.17912, 2025. URL https://arxiv.org/abs/2507.17912
  15. [15] Charles H. Martin and Michael W. Mahoney. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. Journal of Machine Learning Research, 22(1):165, January 2021. URL http://jmlr.org/papers/v22/20-410.html
  16. [16] Charles H. Martin, Tian Peng, and Michael W. Mahoney. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications, 12:4122, July 2021. doi: 10.1038/s41467-021-24025-8. URL https://doi.org/10.1038/s41467-021-24025-8
  17. [17] Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability, 2023. URL https://arxiv.org/abs/2301.05217
  18. [18] OpenAI. Introducing gpt-oss. https://openai.com/index/introducing-gpt-oss/, August 2025. Accessed: 2026-03-28
  19. [19] OpenAI. gpt-oss-120b & gpt-oss-20b model card. https://openai.com/index/gpt-oss-model-card/, August 2025. Accessed: 2026-03-28
  20. [20] C. E. Porter and R. G. Thomas. Fluctuations of nuclear reaction widths. Physical Review, 104(2):483–491, 1956
  21. [21] Marc Potters and Jean-Philippe Bouchaud. A First Course in Random Matrix Theory: For Physicists, Engineers and Data Scientists. Cambridge University Press, 2021. ISBN 9781108768900
  22. [22] Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. Grokking: Generalization beyond overfitting on small algorithmic datasets, 2022. URL https://arxiv.org/abs/2201.02177
  23. [23] H. Sebastian Seung, Haim Sompolinsky, and Naftali Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056–6091, 1992. doi: 10.1103/PhysRevA.45.6056
  24. [24] Boshi Wang, Xiang Yue, Yu Su, and Huan Sun. Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://arxiv.org/pdf/2405.15071
  25. [25] Pierre Weiss. L'hypothèse du champ moléculaire et la propriété ferromagnétique. Journal de Physique Théorique et Appliquée, 6(1):661–690, 1907. doi: 10.1051/jphystap:019070060066100