Prescriptive Scaling Laws for Data Constrained Training

· 2026 · cs.LG · arXiv 2605.01640

2 Pith papers cite this work.

abstract

Training compute is increasingly outpacing the availability of high-quality data. This shifts the central challenge from optimal compute allocation to extracting maximum value from limited data. The widely adopted Chinchilla scaling law assumes every training token is unique. This limits its ability to guide pretraining decisions in data-constrained regimes. We model the excess loss under repetition with a simple additive overfitting penalty and find that it accurately describes model behavior. Our scaling law yields qualitatively new compute-optimal allocation advice. Beyond a point, further repetition is counterproductive and compute is better spent on model capacity. We show that following our law's recommended configuration improves performance in data-constrained regimes. Finally, because our one-parameter form isolates overfitting in a single coefficient, it enables direct comparison across training configurations. As a case study, we show that strong weight decay ($\lambda=1.0$) reduces this coefficient by approximately 70%, providing a scaling-law explanation for recent findings that optimal weight decay in data-constrained regimes is an order of magnitude larger than standard practice.
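The abstract does not give the functional form of the overfitting penalty, so the following is only an illustrative sketch under stated assumptions, not the paper's law. It assumes three things. First, a Chinchilla-style baseline that uses the published Hoffmann et al. (2022) parametric fit. Second, a fixed budget C ~= 6 * N * D, so every extra epoch is paid for with a smaller model. Third, a hypothetical penalty kappa * (epochs - 1) that stands in for the paper's single overfitting coefficient. The budget, unique-token count, and kappa values are invented for illustration. The epoch sweep reproduces the qualitative behavior the abstract describes. Past an interior optimum, extra repetition raises predicted loss. A larger kappa pushes the optimum toward fewer epochs and a larger model. Cutting kappa by 70%, the reduction the abstract attributes to weight decay of 1.0, moves the optimum back toward more repetition.

```python
"""Illustrative data-constrained allocation sweep (hypothetical penalty form)."""

# Chinchilla parametric fit (Hoffmann et al., 2022), used as the baseline.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def predicted_loss(compute: float, unique_tokens: float, epochs: int, kappa: float) -> float:
    """Loss for a run that repeats `unique_tokens` for `epochs` passes.

    Model size is whatever the fixed budget allows under C ~= 6 * N * D,
    so each extra epoch shrinks the model.
    ASSUMPTION: kappa * (epochs - 1) is a hypothetical stand-in for the
    paper's one-coefficient overfitting penalty, not its fitted form.
    """
    total_tokens = unique_tokens * epochs
    n_params = compute / (6.0 * total_tokens)
    baseline = E + A / n_params**ALPHA + B / total_tokens**BETA
    return baseline + kappa * (epochs - 1)


def best_epochs(compute: float, unique_tokens: float, kappa: float, max_epochs: int = 100) -> int:
    """Integer epoch count that minimizes predicted loss at fixed compute."""
    return min(
        range(1, max_epochs + 1),
        key=lambda e: predicted_loss(compute, unique_tokens, e, kappa),
    )


if __name__ == "__main__":
    C, U = 1e22, 1e10  # hypothetical budget: 1e22 FLOPs, 10B unique tokens
    for kappa in (0.0, 0.005, 0.0015):  # 0.0015 is 0.005 reduced by 70%
        e = best_epochs(C, U, kappa)
        n = C / (6.0 * U * e)
        print(f"kappa={kappa:<7} epochs={e:<3} params={n:.2e} "
              f"loss={predicted_loss(C, U, e, kappa):.4f}")
```

With these made-up numbers the unpenalized optimum lands near 32 epochs and the kappa = 0.005 optimum around 10. The exact figures depend entirely on the assumed penalty form and should not be read as results from the paper.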

representative citing papers

When Does Generating More Help? Disentangling Fixed-Source Synthesis from Source Expansion in Synthetic Data Scaling

cs.CL · 2026-07-02 · unverdicted · novelty 6.0

Fixed-source synthesis is bounded; a derived scaling law predicts high-budget performance from low-budget fits, and source expansion outperforms fixed-source at large matched budgets.

q0: Primitives for Hyper-Epoch Pretraining

cs.LG · 2026-06-02 · unverdicted · novelty 5.0

q0 turns multi-epoch budgets into diverse model populations using three primitives that outperform single-model training and strong ensembles with fewer epochs on a 1.8B model.