Repeating smaller datasets speeds up training via sampling biases that enable appropriate layer-wise growth, leading to compute savings over larger datasets across tasks and architectures.
Annual Conference on Computational Learning Theory
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.LG 3
years: 2026 3
roles: background 1
polarities: background 1
representative citing papers
Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.