Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks

In extensive-width networks, features are recovered sequentially through sharp phase transitions, yielding an effective width k_c in terms of which the Bayes-optimal generalization error scales as Θ(k_c d / n).
Let us denote α_⋆ = lim_{d→∞} α_c(k) (Eq. 58), where we recall that α_c(k) is the threshold value of the sample ratio α above which feature k becomes learnable.
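To make the staircase picture concrete, here is a minimal numeric sketch. Everything specific in it is a hypothetical stand-in, not the paper's actual formulas: the threshold values in alpha_c, the assumed definition of the sample ratio as alpha = n/d, and the prefactor C hiding the Θ(·) constant. The sketch only illustrates how an effective width k_c that jumps at sharp thresholds α_c(k) feeds into an error of order k_c d / n.

```python
import numpy as np

# Hypothetical learnability thresholds alpha_c(k) for k = 1..5 (made-up
# values, chosen only to produce a visible staircase of transitions).
alpha_c = np.array([0.5, 1.2, 2.1, 3.5, 5.0])

def effective_width(alpha: float) -> int:
    """Count of features whose sharp threshold alpha_c(k) is crossed at ratio alpha."""
    return int(np.sum(alpha >= alpha_c))

def error_scale(n: int, d: int, C: float = 0.1) -> float:
    """Illustrative Theta(k_c * d / n) generalization-error scale.

    Assumes the sample ratio is alpha = n / d and an arbitrary prefactor C;
    both are stand-ins for the paper's actual (unspecified here) definitions.
    """
    k_c = effective_width(n / d)
    # Before the first transition (k_c = 0) no feature is recovered, so we
    # report a trivial baseline error of 1.0 instead of the k_c * d / n form.
    return 1.0 if k_c == 0 else C * k_c * d / n

d = 1000
for n in (400, 800, 1500, 2500, 4000, 6000):
    print(f"n={n:5d}  alpha={n / d:4.1f}  k_c={effective_width(n / d)}  "
          f"error ~ {error_scale(n, d):.3f}")
```

Running the sketch shows k_c stepping up by one each time alpha crosses a threshold, with the error scale shrinking like 1/n within each plateau.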