Geometry of Lightning Self-Attention: Identifiability and Dimension

Giovanni Luca Marchetti; Kathlén Kohn; Nathan W. Henry

arxiv: 2408.17221 · v3 · pith:YY4C52G7new · submitted 2024-08-30 · 💻 cs.LG · math.AG



keywords geometry · networks · self-attention · deep · dimension · function · identifiability · additionally


Abstract

We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
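The claim that self-attention without normalization is polynomial, and that its parameters are not uniquely determined by the function, can be illustrated with a small numerical sketch. The layer form used below, X ↦ (V X)(K X)ᵀ(Q X) with tokens stored as columns of X, is an assumption about the convention and may differ from the paper's; the function name `lightning_attention` and all dimensions are hypothetical.

```python
import numpy as np

def lightning_attention(X, Q, K, V):
    """Self-attention with the softmax removed (assumed convention).

    X: (d, t) input, one token per column.
    Q, K: (a, d) query and key matrices; V: (d_out, d) value matrix.
    """
    scores = (K @ X).T @ (Q @ X)   # (t, t) unnormalized attention scores
    return (V @ X) @ scores        # (d_out, t) output tokens

rng = np.random.default_rng(0)
d, a, d_out, t = 4, 3, 5, 6        # hypothetical sizes
X = rng.standard_normal((d, t))
Q = rng.standard_normal((a, d))
K = rng.standard_normal((a, d))
V = rng.standard_normal((d_out, d))

Y = lightning_attention(X, Q, K, V)

# The layer is a homogeneous cubic polynomial in the input:
# scaling X by c scales the output by c**3.
c = 2.0
cubic_ok = np.allclose(lightning_attention(c * X, Q, K, V), c**3 * Y)

# Rescaling symmetry: (V, Q) -> (s * V, Q / s) gives the same function.
s = 3.0
scale_ok = np.allclose(lightning_attention(X, Q / s, K, s * V), Y)

# Change of basis in the attention space: K -> G^{-T} K, Q -> G Q
# leaves the product K^T Q, and hence the function, unchanged.
G = np.eye(a) + 0.1 * rng.standard_normal((a, a))  # near-identity, invertible here
gauge_ok = np.allclose(
    lightning_attention(X, G @ Q, np.linalg.inv(G).T @ K, V), Y
)

print(cubic_ok, scale_ok, gauge_ok)
```

The two symmetries checked here are only simple examples of distinct parameters producing the same function; the sketch does not reproduce the paper's description of the generic fibers or its dimension computation.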



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Conservation Laws from Data Symmetry in Neural Networks
cs.LG 2026-06 unverdicted novelty 7.0

Data symmetries generically do not induce conserved quantities in NN training for analytic non-polynomial losses, but can for MSE with tensorizable networks.