arXiv preprint arXiv:2406.06893

Transformers provably learn sparse token selection while fully-connected nets cannot · arXiv 2406.06893

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Agentic Transformers Provably Learn to Search via Reinforcement Learning

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

In a stochastic k-ary tree, a two-head transformer learns randomized DFS via policy gradient under depth-wise curriculum, generalizes to deeper trees, and adapts to imbalanced goals via discounting.

citing papers explorer

Showing 1 of 1 citing paper.

Agentic Transformers Provably Learn to Search via Reinforcement Learning · cs.LG · 2026-05-29 · unverdicted · none · ref 38
In a stochastic k-ary tree, a two-head transformer learns randomized DFS via policy gradient under depth-wise curriculum, generalizes to deeper trees, and adapts to imbalanced goals via discounting.
