MoPE replaces fixed sinusoidal or rotary positional encodings with per-dimension learned Morlet wavelets that recover prior methods as limits and add a Gaussian locality kernel, yielding a 0.119 validation-loss improvement on TinyShakespeare when paired with energy-gated attention.
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
Applies multiscale POD with Morlet scalograms to transformer attention fields to extract dominant modes per scale and reports layer-dependent scale organisation.
citing papers explorer
- Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding
  MoPE replaces fixed sinusoidal or rotary positional encodings with per-dimension learned Morlet wavelets that recover prior methods as limits and add a Gaussian locality kernel, yielding a 0.119 validation-loss improvement on TinyShakespeare when paired with energy-gated attention.
- Energy-Gated Attention: Spectral Salience as an Inductive Bias for Transformer Attention
  Energy-Gated Attention improves language model validation loss by gating attention according to spectral energy of key embeddings discovered by a learned projection, with consistent gains on TinyShakespeare and Penn Treebank using under 0.26% extra parameters.
- Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention
  EGA and MoPE together yield a 0.119 validation loss improvement on TinyShakespeare that exceeds the sum of their individual effects, indicating complementary inductive biases for salience and locality.
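
The Morlet-wavelet idea in the MoPE summary above can be illustrated with a minimal sketch. It assumes the textbook real Morlet form (a cosine carrier multiplied by a Gaussian envelope) applied to a position offset; the function names, the parameter values, and the choice of offset-based input are hypothetical and are not taken from the cited papers. In a learned setting the per-dimension frequency `omega` and width `sigma` would be trainable; here they are fixed constants.

```python
import math


def morlet_encoding(offset, omega, sigma):
    """Real Morlet wavelet: Gaussian envelope times a cosine carrier.

    offset: position offset (assumed input, not confirmed by the source)
    omega:  carrier frequency for this dimension
    sigma:  width of the Gaussian locality kernel for this dimension
    """
    envelope = math.exp(-(offset ** 2) / (2.0 * sigma ** 2))
    return envelope * math.cos(omega * offset)


def encode(offset, omegas, sigmas):
    """Evaluate one wavelet per embedding dimension."""
    return [morlet_encoding(offset, w, s) for w, s in zip(omegas, sigmas)]


# Hypothetical per-dimension parameters, chosen only for illustration.
omegas = [1.0, 0.1, 0.01]
sigmas = [4.0, 16.0, 64.0]

near = encode(1, omegas, sigmas)   # small offset: envelope close to 1
far = encode(50, omegas, sigmas)   # large offset: envelope suppresses the signal

# With a very wide envelope the Gaussian factor approaches 1, so the
# wavelet reduces to a plain cosine, i.e. a sinusoidal encoding.
wide = morlet_encoding(5, 1.0, 1e9)
```

In this sketch the Gaussian factor plays the role of the locality kernel: distant offsets are damped, while letting `sigma` grow without bound leaves only the sinusoid. This is one way to read "recover prior methods as limits"; the papers may define the limits differently.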