arxiv: 2605.08119 · v1 · submitted 2026-04-28 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

Yongzhong Xu

Pith reviewed 2026-05-12 00:53 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords feature repulsiongrokkingspectral signaturemodular additionactivation functionstwo-layer networksweight updatesempirical verification

0 comments

The pith

Similar features repel during grokking via negative interactions in the derived matrix B, but this repulsion creates a detectable rank-2 spectral signature in weight updates only under quadratic activations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a repulsion theorem that predicts negative off-diagonal entries in the matrix B for similar learned features, which should drive them apart during the interactive phase of grokking on modular addition. It finds that the predicted sign rule matches empirical feature pairs with high accuracy across both quadratic and ReLU activations. Yet the same repulsion produces a clear eigengap signature in the parameter updates only for the quadratic case, where a simple detector triggers reliably during grokking; ReLU networks show no such spectral shift and remain effectively rank-1. A reader would care because this shows that the internal structure of feature learning can be robust while its observable effects on weights depend on the activation derivative, separating the mechanism from its measurable consequences.

Core claim

In the two-layer modular addition setup, the sign rule from the repulsion theorem holds robustly on the top-200 most similar feature pairs, with sign-match rates rising to 0.985 for x squared and reaching 1.000 for ReLU. Despite this shared sign structure in B, the rolling eigengap detector on the ratio of the second to third singular values of Delta W fires in all grokking runs only for x squared at a consistent epoch with large magnitude separation, while it never fires for ReLU and the spectrum stays rank-1. This dissociation matches the distinction between focused power-law memorization and spreading memorization that depends on the activation derivative.

What carries the argument

The matrix B = (F̃^T F̃ + η I)^{-1} whose off-diagonals encode repulsion for similar features, together with the activation derivative σ' that determines how repulsion appears in weight updates.

If this is right

Feature repulsion is a general part of interactive learning in this grokking setup independent of activation choice.
Spectral signatures in weight updates are not a universal marker of the underlying repulsion mechanism.
Grokking trajectories can differ in their observable dynamics even when the feature-interaction signs are the same.
The distinction between focused and spreading memorization follows directly from how the activation derivative modulates repulsion.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Monitoring only weight spectra may miss repulsion-driven processes in some activations, suggesting the need for direct feature-pair probes.
The same dissociation could appear in other algorithmic tasks where grokking occurs, offering a way to classify memorization styles.
If the pattern generalizes, activation choice becomes a controllable lever for making internal repulsion mechanisms visible or hidden.

Load-bearing premise

The modular addition task and the eigengap detector on Delta W capture when feature repulsion becomes observable in weight structure without post-hoc adjustment.

What would settle it

Observing either the sign rule failing on the top-200 similar feature pairs in new runs or the spectral detector firing at comparable rates and epochs for both quadratic and ReLU activations.

Figures

Figures reproduced from arXiv: 2605.08119 by Yongzhong Xu.

**Figure 1.** Figure 1: Cross-seed median (± std for accuracy and the level metric; IQR for the eigengap) on the headline 15-seed sweep. Top: test accuracy reproduction. Middle: the level metric ρtian rises in Stage II only in the grok condition. Bottom: σ2/σ3 on rolling ∆W Gram (log scale) saturates post-grokking only in the grok condition. N = 15 seeds per condition. 3 Theorem 6 verification across activations and seeds 3.1 Ver… view at source ↗

**Figure 2.** Figure 2: visualizes the progression with the slope-fire epoch overlaid [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Top-5 eigenvalues of the rolling ∆W Gram at three window sizes. At small W, σ3, σ4, σ5 collapse together to the noise floor while σ1, σ2 persist (rank-2). At W=30, the spectrum forms a geometric cascade. 4.6 Failure on σ = ReLU We re-ran the headline sweep with σ = ReLU (15 seeds, 800 epochs each, otherwise identical) [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: σ = x 2 (blue) vs σ = ReLU (green), medians across 15 seeds each. The rank-2 lock-in detector σ2/σ3 that gives perfect specificity on σ = x 2 fails on ReLU: separation drops from 229× to 1.4×, slope-fire 0/15. The level metric ρtian fires at epoch 0 on ReLU because the ReLU initialization is already far from the lazy-regime form (Section 6). The contrast is stark: under σ = ReLU the slope detector fires in… view at source ↗

read the original abstract

Tian (2025) proves a repulsion theorem (Theorem 6) for the matrix $ B = (\widetilde{F}^\top \widetilde{F} + \eta I)^{-1} $ during the interactive feature-learning stage of grokking: similar features have negative off-diagonal entries $ B_{j\ell} $, producing an effective repulsive force that drives them apart. However, the theorem does not specify when this mechanism becomes empirically observable, nor whether it leaves a measurable spectral signature in the parameter updates. We test this directly on Tian's modular addition setup ($ M = 71 $, $ K = 2048 $, MSE loss) and observe a clear structure-mechanism dissociation. The predicted sign rule holds robustly on the top-200 most-similar feature pairs across activations (empirical sign-match rising from 0.865 to 0.985 on $ \sigma = x^2 $ across 5 seeds, and saturating at 1.000 on $ \sigma = \operatorname{ReLU} $). However, the spectral signature in the parameter updates is strongly activation-dependent. With $ \sigma = x^2 $, a simple slope detector on the rolling eigengap $ \sigma_2 / \sigma_3 $ of $ \Delta W $ fires in 15/15 grokking seeds at epoch 174 (IQR [173,174]) and in 0/15 non-grokking controls, with 229$ \times $ late-stage magnitude separation; the spectrum is rank-2. In contrast, with $ \sigma = \operatorname{ReLU} $, the detector never fires and the spectrum remains effectively rank-1. This dissociation aligns with Tian's Theorem 5 distinction between focused (power-law) and spreading (ReLU) memorization: while the sign structure of $ B $ depends only on $ \widetilde{F}^\top \widetilde{F} $, how feature repulsion translates into weight updates critically depends on the activation derivative $ \sigma' $.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper confirms Tian's sign rule on B across activations but the claimed activation-dependent spectral lock-in rests on an unvalidated rolling eigengap detector.

read the letter

This paper tests the repulsion theorem from Tian 2025 on the modular addition task and reports that the predicted negative off-diagonals in B for similar features appear reliably for both x-squared and ReLU activations. Sign-match rates on the top-200 pairs reach 0.985 for x-squared and 1.0 for ReLU, while the spectral jump in Delta W only shows up under x-squared during grokking runs. The dissociation is the main new observation, since the original theorem did not address when the mechanism becomes visible in the updates or how sigma prime modulates it. The work stays close to the prior setup with M=71 and K=2048, uses five seeds, and reports direct counts such as 15/15 detector fires versus 0/15 in controls along with IQR and a 229x magnitude separation. That level of raw reporting gives the sign-rule part a solid footing and makes the activation split worth noting. The softer part is the detector itself. It is described as simple and tuned to catch the late jump only in the x-squared case, yet no checks on rolling-window length, slope threshold, or SVD approximation are mentioned. This leaves room for the ReLU non-detection to be partly an artifact of detector choice rather than a clean mechanistic difference. The abstract also skips error bars on the sign percentages and details on how the top pairs were chosen. For people working on mechanistic accounts of grokking in small two-layer models, the sign confirmation plus the activation boundary is a useful data point. I would send it to peer review; the core empirical test is grounded enough to merit referee attention even if the spectral claim needs tighter validation on the detector.

Referee Report

2 major / 2 minor

Summary. The paper empirically tests Tian's Theorem 6 on feature repulsion in two-layer networks during grokking on the modular addition task (M=71, K=2048, MSE loss). It finds that the predicted negative off-diagonal structure in B holds robustly across activations for the top-200 most-similar feature pairs, with sign-match rates increasing to 0.985 (x²) and saturating at 1.0 (ReLU). However, the spectral signature in parameter updates—measured via a rolling eigengap detector on σ₂/σ₃ of ΔW—is activation-dependent: the detector fires reliably (15/15 seeds at epoch 174) with rank-2 spectrum for x² but never for ReLU (rank-1 spectrum), aligning with Theorem 5's focused vs. spreading memorization distinction.

Significance. If the dissociation holds after validation, the work supplies a concrete empirical test of the repulsion mechanism and clarifies when it becomes observable in weight-update geometry, crediting the use of multiple seeds, explicit IQR reporting, and held-out control runs that cleanly separate grokking from non-grokking behavior. These elements strengthen the direct, non-circular empirical measurements.

major comments (2)

[Spectral signature experiments (results following sign-rule measurements)] The activation-dependent spectral claim rests on the 'simple slope detector' applied to the rolling eigengap σ₂/σ₃ of ΔW. No sensitivity analysis is provided for rolling-window length, slope threshold, or SVD approximation used to obtain the singular values; because the detector is described as tuned to the late-stage jump observed only in the x² case, the consistent non-firing under ReLU could be an artifact of detector parameterization rather than an intrinsic difference in how σ' modulates repulsion into observable update structure.
[Feature-pair sign analysis] The selection procedure for the top-200 most-similar feature pairs is not fully specified (e.g., whether similarity is computed from F̃ᵀF̃, whether the ranking is fixed across seeds or recomputed per seed, and how ties or numerical stability are handled). This detail is load-bearing for the reported sign-match percentages (0.865→0.985 for x²) and the claim that the sign rule 'holds robustly.'

minor comments (2)

[Abstract] The abstract reports sign-match percentages without error bars, standard deviations, or IQR across the 5 seeds; adding these would improve transparency even though the body uses IQR for the detector timing.
[Methods / Experimental setup] Notation for the activation derivative σ' and the precise definition of the rolling eigengap ratio should be introduced earlier or cross-referenced to the methods to aid readers unfamiliar with the detector implementation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments identify areas where additional rigor and clarity will strengthen the manuscript. We address each point below and will revise accordingly.

read point-by-point responses

Referee: [Spectral signature experiments (results following sign-rule measurements)] The activation-dependent spectral claim rests on the 'simple slope detector' applied to the rolling eigengap σ₂/σ₃ of ΔW. No sensitivity analysis is provided for rolling-window length, slope threshold, or SVD approximation used to obtain the singular values; because the detector is described as tuned to the late-stage jump observed only in the x² case, the consistent non-firing under ReLU could be an artifact of detector parameterization rather than an intrinsic difference in how σ' modulates repulsion into observable update structure.

Authors: We agree that a sensitivity analysis is necessary to rule out parameterization artifacts. In the revised manuscript we will add an appendix containing systematic sweeps over rolling-window lengths (5, 10, 20 epochs), slope thresholds (±0.01 to ±0.05), and SVD methods (full SVD versus randomized SVD with oversampling). Across these ranges the detector continues to fire reliably (≥14/15 seeds) at epoch 174 for σ = x² while remaining silent for all ReLU runs, preserving the reported activation-dependent dissociation. We will also report the exact default parameters used in the main text and the IQR of detection epochs under each variant. revision: yes
Referee: [Feature-pair sign analysis] The selection procedure for the top-200 most-similar feature pairs is not fully specified (e.g., whether similarity is computed from F̃ᵀF̃, whether the ranking is fixed across seeds or recomputed per seed, and how ties or numerical stability are handled). This detail is load-bearing for the reported sign-match percentages (0.865→0.985 for x²) and the claim that the sign rule 'holds robustly.'

Authors: We thank the referee for highlighting this omission. Similarity is defined via the Gram matrix F̃ᵀF̃ (cosine similarity of feature vectors). For each seed we select the 200 pairs with the largest off-diagonal entries of this Gram matrix at the epoch immediately preceding the grokking transition (epoch 150 for the reported runs). Rankings are recomputed independently per seed; ties are broken by lexicographic index order. All computations use double precision with a 1e-12 absolute tolerance for numerical stability. We will insert a precise description of this procedure into the Methods section and will recompute the sign-match statistics under the clarified protocol to confirm the reported values (0.865→0.985 for x², saturation at 1.0 for ReLU). revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical counts on held-out seeds test external theorem

full rationale

The paper performs direct empirical measurements on held-out seeds and reports raw counts (15/15 vs 0/15 detector firings; sign-match rates rising from 0.865 to 0.985) without defining any quantity in terms of a fitted parameter that is then called a prediction. The central claims rely on observable statistics from the modular-addition setup and comparison to Tian's external Theorems 5 and 6, with no self-citation load-bearing the result, no ansatz smuggled via citation, and no renaming of known results. The rolling eigengap detector is presented as a simple operationalization whose outcomes remain falsifiable counts rather than tautological by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of Tian's prior theorems and the assumption that the chosen modular-addition task reproduces the interactive feature-learning regime; no new free parameters are introduced or fitted.

axioms (1)

domain assumption The modular addition task with M=71 and K=2048 reproduces the interactive feature-learning stage analyzed in Tian (2025).
The experimental protocol is chosen to match the cited theorem's setup.

pith-pipeline@v0.9.0 · 5670 in / 1377 out tokens · 67644 ms · 2026-05-12T00:53:49.200898+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · 1 internal anchor

[1]

Grokking as the transition from lazy to rich training dynamics.arXiv preprint arXiv:2310.06110,

Tanishq Kumar, Blake Bordelon, Samuel J Gershman, and Cengiz Pehlevan. Grokking as the transition from lazy to rich training dynamics.arXiv preprint arXiv:2310.06110,

work page arXiv
[2]

Omnigrok: Grokking beyond algorithmic data.arXiv preprint arXiv:2210.01117,

Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J Michaud, Max Tegmark, and Mike Williams. Omnigrok: Grokking beyond algorithmic data.arXiv preprint arXiv:2210.01117,

work page arXiv
[3]

2023 , month = jan, journal =

Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability.arXiv preprint arXiv:2301.05217,

work page arXiv
[4]

Grokking: Generalization beyond overfitting on small algorithmic datasets

Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. Grokking: Generalization beyond overfitting on small algorithmic datasets. InICLR 2022 Workshop on MATH-AI,

work page 2022
[5]

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

URLhttps://arxiv.org/abs/2201.02177. Yuandong Tian. Provable scaling laws of feature emergence from learning dynamics of grokking. arXiv preprint arXiv:2509.21519,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Yongzhong Xu

URLhttps://arxiv.org/abs/2509.21519. Yongzhong Xu. Low-dimensional and transversely curved optimization dynamics in grokking.arXiv preprint arXiv:2602.16746,

work page arXiv