Energy-Gated Attention improves language model validation loss by gating attention according to spectral energy of key embeddings discovered by a learned projection, with consistent gains on TinyShakespeare and Penn Treebank using under 0.26% extra parameters.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
EGA and MoPE together yield a 0.119 validation loss improvement on TinyShakespeare that exceeds the sum of their individual effects, indicating complementary inductive biases for salience and locality.
citing papers explorer
-
Energy-Gated Attention: Spectral Salience as an Inductive Bias for Transformer Attention
Energy-Gated Attention improves language model validation loss by gating attention according to spectral energy of key embeddings discovered by a learned projection, with consistent gains on TinyShakespeare and Penn Treebank using under 0.26% extra parameters.
-
Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention
EGA and MoPE together yield a 0.119 validation loss improvement on TinyShakespeare that exceeds the sum of their individual effects, indicating complementary inductive biases for salience and locality.