arXiv preprint arXiv:2201.04753

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with ⌊α₂/(1/2 − α₁)⌋ outliers, each a learned direction. Batch reuse allows capturing directions with information exponent exceeding one.

Bayesian Inference with Shaped Deep Non-linear MLPs

math.ST · 2026-05-29 · unverdicted · novelty 5.0

In the LP/N = Θ(1) regime, Bayesian predictive posteriors for deep MLPs equal those of data-dependent kernels to first order, with a criterion identifying data processes that benefit from larger effective depth.

citing papers explorer

Showing 2 of 2 citing papers.

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent
stat.ML · 2026-05-18 · unverdicted · none · ref 154 · 2 links
Bayesian Inference with Shaped Deep Non-linear MLPs
math.ST · 2026-05-29 · unverdicted · none · ref 5
