Mind the gap: a spectral analysis of rank collapse and signal propagation in transformers. arXiv preprint arXiv:2410.07799

2024 · arXiv 2410.07799

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Sinks are equivalent to hard attention switches that zero out outputs and are cheaper than diagonal patterns when self-communication is allowed, closing the gap between oversmoothing prevention needs and what sinks provide.

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.

Analogies between Transformer Layers and Power Method

cs.LG · 2026-05-25 · unverdicted · novelty 6.0

Transformer layers are analogous to power method steps, tilting tokens toward the principal eigenvector of the output-value weight product, with stronger analytical and empirical alignment in shared-weight models and a proposed steering method.

Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers

cs.LG · 2026-04-26 · unverdicted · novelty 6.0

Residual connections prevent rank collapse in Transformers without needing the MLP, which instead creates new feature directions; head-channel non-identifiability is a distinct mixing problem fixed by a low-cost position-gated projection, all unified via symmetry breaking.

Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation

stat.ML · 2025-05-30 · unverdicted · novelty 6.0

An analytical theory of signal propagation in deep transformers at initialization yields, via a Random Energy Model analogy, quantitative prescriptions for weights and residuals that avoid rank and entropy collapse.

citing papers explorer

Showing 4 of 4 citing papers after filters.

Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention

cs.LG · 2026-05-08 · unverdicted · none · ref 33

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

cs.LG · 2026-05-29 · unverdicted · none · ref 92

Analogies between Transformer Layers and Power Method

cs.LG · 2026-05-25 · unverdicted · none · ref 26

Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers

cs.LG · 2026-04-26 · unverdicted · none · ref 18
