Repeating 0.1% of training data 100 times degrades an 800M parameter model's performance to that of a 400M model by damaging copying mechanisms and induction heads associated with generalization.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
years
2022 2representative citing papers
Category theory proves prompt-based learning on perfect foundation models works only for representable tasks, fine-tuning solves tasks in the pretext category, and models can represent unseen target-category objects using source-category structure.
citing papers explorer
-
Scaling Laws and Interpretability of Learning from Repeated Data
Repeating 0.1% of training data 100 times degrades an 800M parameter model's performance to that of a 400M model by damaging copying mechanisms and induction heads associated with generalization.
-
On the Power of Foundation Models
Category theory proves prompt-based learning on perfect foundation models works only for representable tasks, fine-tuning solves tasks in the pretext category, and models can represent unseen target-category objects using source-category structure.