pith. machine review for the scientific record.

citation dossier

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Saxe, Andrew M · 2013 · arXiv:1312.6120

17 Pith papers citing it
18 reference links
cs.LG top field · 12 papers
UNVERDICTED top verdict bucket · 12 papers

This arXiv-backed work is queued for full Pith review once it crosses the high-inbound sweep. That review pipeline runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work cited in 17 reviewed papers. Its strongest current cluster is cs.LG (12 papers), and the largest review-status bucket among citing papers is UNVERDICTED (12 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
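For background on the dossier's subject paper: Saxe et al. (2013) analyze gradient descent in deep linear networks, where training dynamics can be solved exactly and small balanced initializations produce plateau-then-escape learning curves. A minimal sketch of that setting (all sizes, learning rate, and step counts below are illustrative choices, not values from the paper):

```python
import numpy as np

# Two-layer linear network y = W2 @ W1 @ x trained by gradient descent
# on a linear teacher -- the setting analyzed by Saxe et al. (2013).
rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((d, 200))            # inputs
T = rng.standard_normal((d, d)) @ X          # linear teacher targets
W1 = 1e-3 * rng.standard_normal((d, d))      # small init: slow-then-fast regime
W2 = 1e-3 * rng.standard_normal((d, d))
lr = 1e-2
losses = []
for _ in range(2000):
    Y = W2 @ W1 @ X
    E = Y - T
    losses.append(0.5 * np.mean(E ** 2))
    gW2 = (E @ (W1 @ X).T) / X.shape[1]      # dL/dW2
    gW1 = (W2.T @ E @ X.T) / X.shape[1]      # dL/dW1
    W2 -= lr * gW2
    W1 -= lr * gW1
print(losses[0], "->", losses[-1])           # loss falls sharply after a plateau
```

Tracking `losses` over training makes the characteristic stage-like drops visible: each singular mode of the teacher is learned on a timescale roughly inversely proportional to its singular value, which is the structure the exact solutions describe.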

years

2026: 15 papers · 2015: 2 papers

representative citing papers

Deep Residual Learning for Image Recognition

cs.CV · 2015-12-10 · accept · novelty 8.0

Residual networks reformulate layers to learn residual functions, enabling effective training of up to 152-layer models that achieve 3.57% error on ImageNet and win ILSVRC 2015.

How Much is Brain Data Worth for Machine Learning?

cs.AI · 2026-05-10 · conditional · novelty 7.0

Brain data is worth a variable number of task samples depending on task-brain alignment, noise levels, and latent dimension, with conditions under which it also improves robustness to test distribution shift.

Learning reveals invisible structure in low-rank RNNs

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.

A Theory of Saddle Escape in Deep Nonlinear Networks

cs.LG · 2026-05-02 · conditional · novelty 7.0 · 2 refs

An exact norm-imbalance identity classifies activations into four classes and reduces deep nonlinear training flow to a scalar ODE that predicts saddle-escape time scaling as ε^-(r-2) for r bottleneck layers.

Dimensional Criticality at Grokking Across MLPs and Transformers

cs.LG · 2026-04-06 · unverdicted · novelty 7.0

Effective cascade dimension D(t) crosses D=1 at the grokking transition in MLPs and Transformers, with opposite directions for modular addition versus XOR, consistent with attraction to a shared critical manifold.

Grokking as Dimensional Phase Transition in Neural Networks

cs.LG · 2026-04-06 · unverdicted · novelty 6.0

Grokking occurs as the effective dimensionality of the gradient field transitions from sub-diffusive to super-diffusive at the onset of generalization, exhibiting self-organized criticality.

citing papers explorer

Showing 17 of 17 citing papers.