Neural LoFi models deep learning as layer-wise spectral filtering that selects maximal low-degree correlations, yielding a tractable surrogate for hierarchical representation learning beyond the lazy regime.
arXiv preprint arXiv:2602.05846
6 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (6)
roles: background (2)
polarities: background (2)
citing papers explorer
- Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning
  Neural LoFi models deep learning as layer-wise spectral filtering that selects maximal low-degree correlations, yielding a tractable surrogate for hierarchical representation learning beyond the lazy regime.
- Learn from your own latents and not from tokens: A sample-complexity theory
  Latent prediction SSL recovers latent trees from PCFG data with sample complexity constant in hierarchy depth L (up to logs), unlike exponential for token-level or supervised methods.
- Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model
  A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.
- Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks
  In extensive-width networks, features are recovered sequentially through sharp phase transitions, yielding an effective width k_c that unifies Bayes-optimal generalization error scaling as Θ(k_c d / n).
- How Width and Data Shape Generalization Scaling Laws in Quadratic Neural Networks
  Quadratic two-layer networks exhibit data-dependent power-law generalization scaling with distinct regimes in width and sample size, including an interpolation transition whose location depends on target spectrum.
- Learning Sparse Compositional Functions with Norm-Constrained Neural Networks
  Derives approximation rates and excess risk bounds for Frobenius norm-constrained DNNs learning sparse compositional functions on DAGs, applicable to multi-index models and binary trees while avoiding the curse of dimensionality.