Transformers learn to implement preconditioned gradient descent for in-context learning

Ahn, Kwangjun; Cheng, Xiang; Daneshmand, Hadi; Sra, Suvrit

1 Pith paper cites this work. Polarity classification is still being indexed.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning

cs.LG · 2026-05-08 · unverdicted · novelty 8.0 · 2 refs · cites this work as ref 23

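Shows that softmax Transformers can implement in-context reinforcement learning (RL): their forward pass is equivalent to weighted softmax temporal-difference (TD) updates, the error decays under a contraction condition, and the implementing parameters are global minimizers of the pretraining loss.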
