Grokking as the transition from lazy to rich training dynamics. arXiv preprint arXiv:2310.06110

2024 · arXiv 2310.06110

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

cs.LG · 2026-02-18 · unverdicted · novelty 8.0

Grokking reflects escape from a metastable low-dimensional regime where transverse curvature accumulates before generalization, with subspace motion necessary but curvature boost insufficient.

Learning as Observable Matrix Dynamics: Diffusive Relaxations versus Phase Transitions

cs.LG · 2026-06-29 · unverdicted · novelty 7.0

Observable Matrix Dynamics (OMD) is a new diagnostic framework that uses random matrix theory on distance matrices to distinguish diffusive relaxations from phase-transition-like reorganizations during neural network training.

The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure

cs.LG · 2026-02-19 · unverdicted · novelty 7.0

Multi-task grokking in Transformers produces staggered generalization, low-dimensional manifolds, weight-decay phase structure, holographic solutions, and transverse redundancy.

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

cs.HC · 2026-05-26 · unverdicted · novelty 6.0

Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

cs.LG · 2026-04-28 · unverdicted · novelty 4.0

Empirical tests confirm robust feature repulsion signs but reveal activation-dependent spectral lock-in in grokking, with x^2 yielding rank-2 updates at epoch ~174 and ReLU remaining rank-1.

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

cs.LG · 2026-03-30

citing papers explorer

Showing 6 of 6 citing papers.

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking · cs.LG · 2026-02-18 · unverdicted · none · ref 3
Learning as Observable Matrix Dynamics: Diffusive Relaxations versus Phase Transitions · cs.LG · 2026-06-29 · unverdicted · none · ref 31
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure · cs.LG · 2026-02-19 · unverdicted · none · ref 4
A Systematic Study of Behavioral Cloning for Scientific Data Annotation · cs.HC · 2026-05-26 · unverdicted · none · ref 186
Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking · cs.LG · 2026-04-28 · unverdicted · none · ref 1
The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior · cs.LG · 2026-03-30 · unreviewed · ref 11
