When attention collapses: How degenerate layers in LLMs enable smaller, stronger models. arXiv preprint arXiv:2404.08634, 2024

Sunny Sanyal, Ravid Shwartz-Ziv, Alexandros G Dimakis, Sujay Sanghavi · 2024 · arXiv 2404.08634

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients

cs.LG · 2026-06-23 · unverdicted · novelty 4.0

Position paper claims fixed exponents in scaling laws arise from generic mechanisms while coefficients vary with data and architecture, making the latter the focus for improvements.

citing papers explorer

Showing 1 of 1 citing paper.

Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients · cs.LG · 2026-06-23 · unverdicted · none · ref 23
