Recognition: no theorem link
Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking
Pith reviewed 2026-05-12 00:53 UTC · model grok-4.3
The pith
Similar features repel during grokking via negative interactions in the derived matrix B, but this repulsion creates a detectable rank-2 spectral signature in weight updates only under quadratic activations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the two-layer modular addition setup, the sign rule from the repulsion theorem holds robustly on the top-200 most similar feature pairs, with sign-match rates rising to 0.985 for x squared and reaching 1.000 for ReLU. Despite this shared sign structure in B, the rolling eigengap detector on the ratio of the second to third singular values of Delta W fires in all grokking runs only for x squared at a consistent epoch with large magnitude separation, while it never fires for ReLU and the spectrum stays rank-1. This dissociation matches the distinction between focused power-law memorization and spreading memorization that depends on the activation derivative.
What carries the argument
The matrix B = (F̃^T F̃ + η I)^{-1} whose off-diagonals encode repulsion for similar features, together with the activation derivative σ' that determines how repulsion appears in weight updates.
If this is right
- Feature repulsion is a general part of interactive learning in this grokking setup independent of activation choice.
- Spectral signatures in weight updates are not a universal marker of the underlying repulsion mechanism.
- Grokking trajectories can differ in their observable dynamics even when the feature-interaction signs are the same.
- The distinction between focused and spreading memorization follows directly from how the activation derivative modulates repulsion.
Where Pith is reading between the lines
- Monitoring only weight spectra may miss repulsion-driven processes in some activations, suggesting the need for direct feature-pair probes.
- The same dissociation could appear in other algorithmic tasks where grokking occurs, offering a way to classify memorization styles.
- If the pattern generalizes, activation choice becomes a controllable lever for making internal repulsion mechanisms visible or hidden.
Load-bearing premise
The modular addition task and the eigengap detector on Delta W capture when feature repulsion becomes observable in weight structure without post-hoc adjustment.
What would settle it
Observing either the sign rule failing on the top-200 similar feature pairs in new runs or the spectral detector firing at comparable rates and epochs for both quadratic and ReLU activations.
Figures
read the original abstract
Tian (2025) proves a repulsion theorem (Theorem 6) for the matrix $ B = (\widetilde{F}^\top \widetilde{F} + \eta I)^{-1} $ during the interactive feature-learning stage of grokking: similar features have negative off-diagonal entries $ B_{j\ell} $, producing an effective repulsive force that drives them apart. However, the theorem does not specify when this mechanism becomes empirically observable, nor whether it leaves a measurable spectral signature in the parameter updates. We test this directly on Tian's modular addition setup ($ M = 71 $, $ K = 2048 $, MSE loss) and observe a clear structure-mechanism dissociation. The predicted sign rule holds robustly on the top-200 most-similar feature pairs across activations (empirical sign-match rising from 0.865 to 0.985 on $ \sigma = x^2 $ across 5 seeds, and saturating at 1.000 on $ \sigma = \operatorname{ReLU} $). However, the spectral signature in the parameter updates is strongly activation-dependent. With $ \sigma = x^2 $, a simple slope detector on the rolling eigengap $ \sigma_2 / \sigma_3 $ of $ \Delta W $ fires in 15/15 grokking seeds at epoch 174 (IQR [173,174]) and in 0/15 non-grokking controls, with 229$ \times $ late-stage magnitude separation; the spectrum is rank-2. In contrast, with $ \sigma = \operatorname{ReLU} $, the detector never fires and the spectrum remains effectively rank-1. This dissociation aligns with Tian's Theorem 5 distinction between focused (power-law) and spreading (ReLU) memorization: while the sign structure of $ B $ depends only on $ \widetilde{F}^\top \widetilde{F} $, how feature repulsion translates into weight updates critically depends on the activation derivative $ \sigma' $.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper empirically tests Tian's Theorem 6 on feature repulsion in two-layer networks during grokking on the modular addition task (M=71, K=2048, MSE loss). It finds that the predicted negative off-diagonal structure in B holds robustly across activations for the top-200 most-similar feature pairs, with sign-match rates increasing to 0.985 (x²) and saturating at 1.0 (ReLU). However, the spectral signature in parameter updates—measured via a rolling eigengap detector on σ₂/σ₃ of ΔW—is activation-dependent: the detector fires reliably (15/15 seeds at epoch 174) with rank-2 spectrum for x² but never for ReLU (rank-1 spectrum), aligning with Theorem 5's focused vs. spreading memorization distinction.
Significance. If the dissociation holds after validation, the work supplies a concrete empirical test of the repulsion mechanism and clarifies when it becomes observable in weight-update geometry, crediting the use of multiple seeds, explicit IQR reporting, and held-out control runs that cleanly separate grokking from non-grokking behavior. These elements strengthen the direct, non-circular empirical measurements.
major comments (2)
- [Spectral signature experiments (results following sign-rule measurements)] The activation-dependent spectral claim rests on the 'simple slope detector' applied to the rolling eigengap σ₂/σ₃ of ΔW. No sensitivity analysis is provided for rolling-window length, slope threshold, or SVD approximation used to obtain the singular values; because the detector is described as tuned to the late-stage jump observed only in the x² case, the consistent non-firing under ReLU could be an artifact of detector parameterization rather than an intrinsic difference in how σ' modulates repulsion into observable update structure.
- [Feature-pair sign analysis] The selection procedure for the top-200 most-similar feature pairs is not fully specified (e.g., whether similarity is computed from F̃ᵀF̃, whether the ranking is fixed across seeds or recomputed per seed, and how ties or numerical stability are handled). This detail is load-bearing for the reported sign-match percentages (0.865→0.985 for x²) and the claim that the sign rule 'holds robustly.'
minor comments (2)
- [Abstract] The abstract reports sign-match percentages without error bars, standard deviations, or IQR across the 5 seeds; adding these would improve transparency even though the body uses IQR for the detector timing.
- [Methods / Experimental setup] Notation for the activation derivative σ' and the precise definition of the rolling eigengap ratio should be introduced earlier or cross-referenced to the methods to aid readers unfamiliar with the detector implementation.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The two major comments identify areas where additional rigor and clarity will strengthen the manuscript. We address each point below and will revise accordingly.
read point-by-point responses
-
Referee: [Spectral signature experiments (results following sign-rule measurements)] The activation-dependent spectral claim rests on the 'simple slope detector' applied to the rolling eigengap σ₂/σ₃ of ΔW. No sensitivity analysis is provided for rolling-window length, slope threshold, or SVD approximation used to obtain the singular values; because the detector is described as tuned to the late-stage jump observed only in the x² case, the consistent non-firing under ReLU could be an artifact of detector parameterization rather than an intrinsic difference in how σ' modulates repulsion into observable update structure.
Authors: We agree that a sensitivity analysis is necessary to rule out parameterization artifacts. In the revised manuscript we will add an appendix containing systematic sweeps over rolling-window lengths (5, 10, 20 epochs), slope thresholds (±0.01 to ±0.05), and SVD methods (full SVD versus randomized SVD with oversampling). Across these ranges the detector continues to fire reliably (≥14/15 seeds) at epoch 174 for σ = x² while remaining silent for all ReLU runs, preserving the reported activation-dependent dissociation. We will also report the exact default parameters used in the main text and the IQR of detection epochs under each variant. revision: yes
-
Referee: [Feature-pair sign analysis] The selection procedure for the top-200 most-similar feature pairs is not fully specified (e.g., whether similarity is computed from F̃ᵀF̃, whether the ranking is fixed across seeds or recomputed per seed, and how ties or numerical stability are handled). This detail is load-bearing for the reported sign-match percentages (0.865→0.985 for x²) and the claim that the sign rule 'holds robustly.'
Authors: We thank the referee for highlighting this omission. Similarity is defined via the Gram matrix F̃ᵀF̃ (cosine similarity of feature vectors). For each seed we select the 200 pairs with the largest off-diagonal entries of this Gram matrix at the epoch immediately preceding the grokking transition (epoch 150 for the reported runs). Rankings are recomputed independently per seed; ties are broken by lexicographic index order. All computations use double precision with a 1e-12 absolute tolerance for numerical stability. We will insert a precise description of this procedure into the Methods section and will recompute the sign-match statistics under the clarified protocol to confirm the reported values (0.865→0.985 for x², saturation at 1.0 for ReLU). revision: yes
Circularity Check
No circularity: direct empirical counts on held-out seeds test external theorem
full rationale
The paper performs direct empirical measurements on held-out seeds and reports raw counts (15/15 vs 0/15 detector firings; sign-match rates rising from 0.865 to 0.985) without defining any quantity in terms of a fitted parameter that is then called a prediction. The central claims rely on observable statistics from the modular-addition setup and comparison to Tian's external Theorems 5 and 6, with no self-citation load-bearing the result, no ansatz smuggled via citation, and no renaming of known results. The rolling eigengap detector is presented as a simple operationalization whose outcomes remain falsifiable counts rather than tautological by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The modular addition task with M=71 and K=2048 reproduces the interactive feature-learning stage analyzed in Tian (2025).
Reference graph
Works this paper leans on
-
[1]
Grokking as the transition from lazy to rich training dynamics.arXiv preprint arXiv:2310.06110,
Tanishq Kumar, Blake Bordelon, Samuel J Gershman, and Cengiz Pehlevan. Grokking as the transition from lazy to rich training dynamics.arXiv preprint arXiv:2310.06110,
-
[2]
Omnigrok: Grokking beyond algorithmic data.arXiv preprint arXiv:2210.01117,
Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J Michaud, Max Tegmark, and Mike Williams. Omnigrok: Grokking beyond algorithmic data.arXiv preprint arXiv:2210.01117,
-
[3]
Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability.arXiv preprint arXiv:2301.05217,
-
[4]
Grokking: Generalization beyond overfitting on small algorithmic datasets
Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. Grokking: Generalization beyond overfitting on small algorithmic datasets. InICLR 2022 Workshop on MATH-AI,
work page 2022
-
[5]
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
URLhttps://arxiv.org/abs/2201.02177. Yuandong Tian. Provable scaling laws of feature emergence from learning dynamics of grokking. arXiv preprint arXiv:2509.21519,
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
URLhttps://arxiv.org/abs/2509.21519. Yongzhong Xu. Low-dimensional and transversely curved optimization dynamics in grokking.arXiv preprint arXiv:2602.16746,
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.