This allows us to replace $y^{(1)}_n = x^{(0)}_n + \mathrm{Att}^{(1)}\big(x^{(0)}_{\le n}\big) \;\longrightarrow\; y^{(1)}_n = x^{(0)}_n \oplus x^{(0)}_{n-1}$ (XII7), where $\oplus$ denotes concatenation into orthogonal subspaces

Minimal Network Construction. First, we assume that $\mathrm{Att}^{(1)}$ extracts the preceding state, maps it to a subspace orthogonal to that of the current state
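
As a rough illustration of the replacement in (XII7), the sketch below builds the concatenated representation directly. It assumes one-hot state embeddings and an attention layer that attends only to the immediately preceding position; the function names `one_hot` and `concat_with_previous` are hypothetical and do not come from the quoted paper.

```python
def one_hot(state, num_states):
    """Embed a state index as a one-hot vector (an assumed embedding)."""
    return [1.0 if i == state else 0.0 for i in range(num_states)]


def concat_with_previous(states, num_states):
    """Return y_n = x_n (+) x_{n-1} for every position n.

    The current state occupies the first num_states coordinates and the
    preceding state the last num_states coordinates, so the two parts lie
    in orthogonal subspaces. Position 0 has no predecessor; its second
    block is left as zeros (an assumption of this sketch).
    """
    outputs = []
    for n, state in enumerate(states):
        current = one_hot(state, num_states)
        if n > 0:
            previous = one_hot(states[n - 1], num_states)
        else:
            previous = [0.0] * num_states
        # List concatenation places the two blocks in disjoint coordinates.
        outputs.append(current + previous)
    return outputs


ys = concat_with_previous([2, 0, 1], num_states=3)
```

Because the two blocks occupy disjoint coordinates, a later layer can read the current and preceding states independently, which is the property the orthogonal-subspace concatenation is meant to provide.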

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Distinct mechanisms underlying in-context learning in transformers

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Transformers develop four algorithmic phases of in-context learning on Markov chains via two distinct multi-layer subcircuit mechanisms, with phase boundaries set by data diversity K.

Distinct mechanisms underlying in-context learning in transformers cs.LG · 2026-04-14 · unverdicted · none · ref 34
