{"paper":{"title":"Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Spectral outliers in wide neural networks evolve predictably during gradient descent, with one scaling regime producing width-independent dynamics and hyperparameter transfer.","cross_cats":["cs.AI","stat.ML"],"primary_cat":"cond-mat.dis-nn","authors_text":"Blake Bordelon, Cengiz Pehlevan, Clarissa Lauditi","submitted_at":"2026-05-08T15:28:01Z","abstract_excerpt":"We study the evolution of hidden-weight spectra in wide neural networks trained by (stochastic) gradient descent. We develop a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. We apply this framework to two settings: (1) infinite-width nonlinear networks in mean-field/$\\mu$P scaling and (2) deep linear networks in the proportional high-dimensional limit, where width, input dimension, and sample size diverge with fixed ratios. Our theory predicts how "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, μP yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS).","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The two-level DMFT remains accurate when spike directions stay statistically dependent on the random bulk and when the infinite-width or proportional limits faithfully represent finite practical networks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A two-level DMFT predicts width-consistent outlier escape and hyperparameter transfer under μP in deep networks, with bulk restructuring dominating for tasks with many outputs.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Spectral outliers in wide neural networks evolve predictably during gradient descent, with one scaling regime producing width-independent dynamics and hyperparameter transfer.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"73e5f3306ef4d0d7671f582c3dce92baa2afb26e21a47ca2f3cc49247a1abe7c"},"source":{"id":"2605.07870","kind":"arxiv","version":2},"verdict":{"id":"0ba49fc0-8255-417b-af3b-f52bd854564a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-11T02:58:35.783381Z","strongest_claim":"Our theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, μP yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS).","one_line_summary":"A two-level DMFT predicts width-consistent outlier escape and hyperparameter transfer under μP in deep networks, with bulk restructuring dominating for tasks with many outputs.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The two-level DMFT remains accurate when spike directions stay statistically dependent on the random bulk and when the infinite-width or proportional limits faithfully represent finite practical networks.","pith_extraction_headline":"Spectral outliers in wide neural networks evolve predictably during gradient descent, with one scaling regime producing width-independent dynamics and hyperparameter transfer."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.07870/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"claim_evidence","ran_at":"2026-05-20T10:02:13.707951Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-20T04:48:08.634595Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T15:31:18.538346Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T11:27:57.788373Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"d8f55df6540432d8c3b663c7855e808eb8124f4de06ed95684a69321a2f80413"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"1418a09f7bc0ad14dd5e4dcd17f5bb05990a9277fd4e9b966741eeac521eaa05"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}