arXiv preprint arXiv:2502.17340

Kuzborskij, Ilja and Abbasi-Yadkori, Yasin · arXiv 2502.17340

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

cs.LG · 2026-06-04 · unverdicted · novelty 6.0

Deep linear network theory derives logarithmic decay for cross-entropy loss under gap-growth conditions versus polynomial closure for Schatten-regularized structural energy under late-time KL tails, separating fitting from simplification; conditional reductions extend this to ReLU MLPs with fixed ac

citing papers explorer

Showing 1 of 1 citing paper.

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

cs.LG · 2026-06-04 · unverdicted · none · ref 60
