Identifiable Equivariant Networks are Layerwise Equivariant
We investigate the relation between end-to-end equivariance and layerwise equivariance in deep neural networks. We prove the following: for a network whose end-to-end function is equivariant with respect to group actions on the input and output spaces, there is a parameter choice yielding the same end-to-end function such that its layers are equivariant with respect to some group actions on the latent spaces. Our result assumes that the parameters of the model are identifiable in an appropriate sense. This identifiability property has been established in the literature for a large class of networks, to which our results apply immediately, while it is conjectural for others. The theory we develop is grounded in an abstract formalism, and is therefore architecture-agnostic. Overall, our results provide a mathematical explanation for the emergence of equivariant structures in the weights of neural networks during training, a phenomenon that is consistently observed in practice.
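The two notions in the abstract can be made concrete with a small numerical sketch. It is not the paper's construction or proof, and everything in it is an assumption chosen for illustration: a two-layer ReLU network, the cyclic shift acting on inputs and outputs, and the hypothetical helpers `perm_matrix` and `equivariant_layer`. It only checks the easy direction (layerwise equivariance implies end-to-end equivariance) and shows that reparametrizing by a hidden-neuron permutation keeps the function fixed while changing which latent action the layers respect, which is why the statement speaks of "some" group actions on the latent spaces.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # illustrative width shared by input, hidden, and output layers


def perm_matrix(perm):
    """Return P such that (P @ x)[i] == x[perm[i]]."""
    return np.eye(len(perm))[perm]


def relu(z):
    return np.maximum(z, 0.0)


def network(x, W1, W2):
    """Two-layer ReLU network without biases (an assumed toy architecture)."""
    return W2 @ relu(W1 @ x)


def equivariant_layer(a, b):
    """a*I + b*ones commutes with every permutation matrix."""
    return a * np.eye(n) + b * np.ones((n, n))


# Group element: a cyclic shift acting on both input and output space.
g = perm_matrix(np.roll(np.arange(n), 1))

W1 = equivariant_layer(*rng.normal(size=2))
W2 = equivariant_layer(*rng.normal(size=2))
x = rng.normal(size=n)

# End-to-end equivariance: f(g x) == g f(x).
end_to_end_ok = np.allclose(network(g @ x, W1, W2), g @ network(x, W1, W2))

# Reparametrize by permuting hidden neurons: the function is unchanged.
Q = perm_matrix(rng.permutation(n))
V1, V2 = Q @ W1, W2 @ Q.T
same_function = np.allclose(network(x, V1, V2), network(x, W1, W2))

# The new layers are equivariant for the conjugated latent action h = Q g Q^T.
h = Q @ g @ Q.T
layer1_ok = np.allclose(V1 @ g, h @ V1)
layer2_ok = np.allclose(V2 @ h, g @ V2)

print(end_to_end_ok, same_function, layer1_ok, layer2_ok)
```

The converse direction, recovering equivariant layers from an equivariant end-to-end function under an identifiability assumption, is the content of the paper's result and is not demonstrated by this sketch.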
Forward citations
Cited by 4 Pith papers
- Most ReLU Networks Admit Identifiable Parameters
  For ReLU networks with input and hidden widths at least 2, most parameters are identifiable up to symmetry, so the functional dimension equals the parameter count minus the number of hidden neurons.
- Most ReLU Networks Admit Identifiable Parameters
  For ReLU networks with width at least two in input and hidden layers, an open set of parameters is identifiable, implying functional dimension equals parameter count minus hidden neurons.
- Conservation Laws from Data Symmetry in Neural Networks
  Data symmetries generically do not induce conserved quantities in NN training for analytic non-polynomial losses, but can for MSE with tensorizable networks.
- Algebraic Networks and Architectural Degenerations
  Develops algebraic geometry tools for monomial neural networks and proves the singular locus of neurovarieties is contained in the architectural degeneracy locus for fully connected networks with non-increasing widths...