naacl-main.403/
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- Revisiting Anisotropy in Language Transformers: The Geometry of Learning Dynamics
Anisotropy in language transformers arises because training amplifies tangent directions, with activation-based low-rank proxies capturing unusually large gradient energy and anisotropy share compared to controls.