Curriculum post-training on reasoning trees yields polynomial sample complexity for accurate Chain-of-Thought generation in transformers, unlike exponential requirements without curriculum.
Let $\tilde{\alpha}_l(j) := \exp(s_l(j)) / \sum_{q \neq i_l} \exp(s_l(q))$ be the softmax normalized only over competitors, evaluated at $W^{(t-1)}$
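The competitor-only normalization in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes the scores s_l(q) are plain scalars held in a list, that i_l is the single index excluded from the denominator, and that the quantity is only needed for competitors j ≠ i_l. The names `competitor_softmax`, `scores`, and `excluded_index` are hypothetical.

```python
import math


def competitor_softmax(scores, excluded_index):
    """Softmax over every index except `excluded_index`.

    Hypothetical helper: `scores` stands in for s_l(.) and
    `excluded_index` for i_l in the excerpt's notation.
    Returns a dict mapping each competitor index j to
    exp(s(j)) / sum_{q != excluded_index} exp(s(q)).
    """
    competitors = [q for q in range(len(scores)) if q != excluded_index]
    if not competitors:
        raise ValueError("need at least one competitor index")
    # Subtract the largest competitor score for numerical stability;
    # the shift cancels between numerator and denominator.
    shift = max(scores[q] for q in competitors)
    denom = sum(math.exp(scores[q] - shift) for q in competitors)
    return {j: math.exp(scores[j] - shift) / denom for j in competitors}


# Two competitors with equal scores share the mass evenly,
# regardless of how large the excluded score is.
weights = competitor_softmax([2.0, 0.0, 0.0], excluded_index=0)
print(weights)  # {1: 0.5, 2: 0.5}
```

Because the excluded index never enters the denominator, the returned weights always sum to 1 over the competitors, which is what distinguishes this quantity from the ordinary softmax over all positions.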
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
- Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training