Learning curves for SGD on structured features. arXiv preprint arXiv:2106.02713, 2021
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- Homogenization of $\ell_2$-Adversarial Training in High-Dimensions: Exact Dynamics under Stochastic Gradient Descent
  Derives deterministic ODE equivalents and an adversarial homogenized SDE for SGD iterates in high-dimensional ℓ2-adversarial training, showing that no constant learning rate ensures monotone descent for single-class adversarial least squares, and establishing an equivalence to standard SGD with adaptive regularization.
- Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients
  Position paper arguing that the fixed exponents in scaling laws arise from generic mechanisms, while the coefficients vary with data and architecture, making the coefficients the focus for improvements.