Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
In the low-temperature regime, the token distribution in mean-field transformers concentrates onto the push-forward under a key-query-value projection with Wasserstein distance scaling as √(log(β+1)/β) exp(Ct) + exp(-ct).
Polaris separates semantic meaning from hierarchical structure in embeddings via angular geometry and radius on a hypersphere, yielding up to 19-point gains in taxonomy expansion retrieval over baselines.
DyT improves validation loss 27% at 64M params/1M tokens but worsens it 19% at 118M tokens, with saturation levels predicting the sign of the effect.
citing papers explorer
- Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning