In a stochastic k-ary tree, a two-head transformer learns randomized DFS via policy gradient under depth-wise curriculum, generalizes to deeper trees, and adapts to imbalanced goals via discounting.
arXiv preprint arXiv:2501.12997
2 Pith papers cite this work.
fields
cs.LG (2)
representative citing papers
Curriculum post-training on reasoning trees yields polynomial sample complexity for accurate Chain-of-Thought generation in transformers, unlike exponential requirements without curriculum.
citing papers explorer
- Agentic Transformers Provably Learn to Search via Reinforcement Learning
  In a stochastic k-ary tree, a two-head transformer learns randomized DFS via policy gradient under depth-wise curriculum, generalizes to deeper trees, and adapts to imbalanced goals via discounting.
- Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training
  Curriculum post-training on reasoning trees yields polynomial sample complexity for accurate Chain-of-Thought generation in transformers, unlike exponential requirements without curriculum.