Position paper claims fixed exponents in scaling laws arise from generic mechanisms while coefficients vary with data and architecture, making the latter the focus for improvements.
When attention collapses: How degenerate layers in llms enable smaller, stronger models.arXiv preprint arXiv:2404.08634, 2024
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients
Position paper claims fixed exponents in scaling laws arise from generic mechanisms while coefficients vary with data and architecture, making the latter the focus for improvements.