Ethan Gotlieb Wilcox, Tiago Pimentel, Clara Meister, Ryan Cotterell, and Roger P
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs
  LLMs show statistical preemption for 120 verb-construction pairs, with surprisal driven by competing-form frequency rather than verb frequency, scaling as a power law with size, and causally shifted by controlled fine-tuning.
- Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns
  Transformers on synthetic grammar acquire abstract global statistical knowledge first, then local dependencies, showing initial over-generalizations that are later constrained.
- Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt
  Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.