Mind the gap: a spectral analysis of rank collapse and signal propagation in attention layers
2 Pith papers cite this work.
Fields: cs.LG · Years: 2026 · Verdicts: UNVERDICTED
2 representative citing papers
Citing papers explorer
-
Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention
Attention sinks are equivalent to hard switches that zero out a head's output, and they are cheaper than diagonal attention patterns when self-communication is allowed, closing the gap between what oversmoothing prevention requires and what sinks actually provide.
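The switch mechanism in this summary can be sketched numerically. In the toy example below (our illustration under assumed shapes, not the paper's construction), a "sink" token carrying a zero value vector absorbs nearly all softmax mass, so the head's output is driven toward zero without the token ever attending to itself, as a diagonal pattern would require:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4
V_real = rng.normal(size=(3, d))          # value vectors of 3 real tokens
V_sink = np.zeros((1, d))                 # the sink contributes nothing
logits = np.array([10.0, 0.1, 0.2, 0.3])  # sink logit (first entry) dominates

attn = softmax(logits)                    # almost all mass lands on the sink
out = attn @ np.vstack([V_sink, V_real])  # head output is driven toward zero
print(np.linalg.norm(out))                # prints a value near 0
```

Because the sink soaks up attention for every query at once, one shared token suffices, whereas a diagonal pattern must spend a logit per position on self-attention.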
-
Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers
Residual connections prevent rank collapse in Transformers on their own, without the MLP, which instead creates new feature directions; head-channel non-identifiability is a distinct mixing problem, fixed by a low-cost position-gated projection; both phenomena are unified through symmetry breaking.
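The rank-collapse claim above admits a minimal numerical sketch (our illustration, not the paper's proof): uniform attention averages every token into the same vector, collapsing the representation to rank one in a single layer, while a residual branch makes the layer map `I + A` invertible so the full rank survives.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 6
A = np.full((n, n), 1.0 / n)   # row-stochastic "oversmoothing" attention
X = rng.normal(size=(n, d))    # full-rank token representations

print(np.linalg.matrix_rank(A @ X))      # 1: pure attention collapses
print(np.linalg.matrix_rank(X + A @ X))  # 6: the residual preserves rank
```

The residual output is `(I + A) X`, and since `I + A` is invertible here, no directions of `X` are lost; without the residual, `A X` keeps only the mean token.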