4 Pith papers cite this work.
- Stability and Generalization in Looped Transformers
  Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.
- Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime
  In the low-temperature regime, the token distribution in mean-field transformers concentrates onto the push-forward under a key-query-value projection, with Wasserstein distance scaling as √(log(β+1)/β)·exp(Ct) + exp(-ct).
- Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning
  Polaris learns hierarchical concepts via coupled orbital polar embeddings on hyperspheres, separating meaning from structure with tangent projections, exponential maps, and asymmetric objectives; it yields gains of up to 19 points in top-K retrieval.
- When Does Removing LayerNorm Help? Activation Bounding as a Regime-Dependent Implicit Regularizer
  DyT improves validation loss by 27% at 64M parameters and 1M tokens but worsens it by 19% at 118M tokens, with saturation levels predicting the sign of the effect.
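For readability, the concentration rate quoted in the mean-field transformer summary above can be set as a display equation. The notation here is assumed, not taken from the paper: W_2 for the 2-Wasserstein distance, μ_t for the token distribution at time t, ν for the push-forward measure, β for inverse temperature, and C, c > 0 for unspecified constants.

```latex
% Hypothetical notation (not the paper's own):
% \mu_t = token distribution at time t,
% \nu   = push-forward of the initial distribution under the
%         key-query-value projection,
% \beta = inverse temperature, C, c > 0 constants.
W_2(\mu_t, \nu) \;\lesssim\; \sqrt{\frac{\log(\beta + 1)}{\beta}}\, e^{Ct} \;+\; e^{-ct}
```

The two terms suggest a trade-off: the first grows with the time horizon t and shrinks as β increases (low temperature), while the second decays in t, matching the summary's claim of concentration in the low-temperature regime.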