Most ReLU Networks Admit Identifiable Parameters
3 Pith papers cite this work. Polarity classification is still indexing.
Abstract
We study the realization map of deep ReLU networks, focusing on when a function determines its parameters up to scaling and permutation. To analyze hidden redundancies beyond these standard symmetries, we introduce a framework based on weighted polyhedral complexes. Our main result shows that for every architecture whose input and hidden layers have width at least two, there exists an open set of identifiable parameters. This implies that the functional dimension of every such architecture is exactly the number of parameters minus the number of hidden neurons. We further show that minimal functional representations can still have non-trivial parameter redundancies. Finally, we establish a generic depth hierarchy, whereby for an open set of parameters the realized function cannot be represented generically by any shallower network.
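The scaling symmetry and the dimension count mentioned in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper. It assumes fully connected layers with biases and an affine output layer, and the helper names (`forward`, `rescale_neuron`, `count_parameters`, `functional_dimension`) are hypothetical. The sketch checks that rescaling one hidden neuron by a positive factor leaves the realized function unchanged at a sample input, then evaluates "number of parameters minus number of hidden neurons" for an example architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(params, x):
    """Evaluate a fully connected ReLU network; the last layer is affine (assumption)."""
    *hidden, (W_out, b_out) = params
    for W, b in hidden:
        x = relu(W @ x + b)
    return W_out @ x + b_out

def init_params(widths, rng):
    """Random weights and biases for layer widths [n_0, n_1, ..., n_L]."""
    return [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def rescale_neuron(params, layer, neuron, c):
    """Scale a hidden neuron's incoming weights and bias by c > 0
    and its outgoing weights by 1/c."""
    new = [(W.copy(), b.copy()) for W, b in params]
    new[layer][0][neuron, :] *= c
    new[layer][1][neuron] *= c
    new[layer + 1][0][:, neuron] /= c
    return new

def count_parameters(widths):
    """Weights plus biases (biases are an assumption of this sketch)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))

def functional_dimension(widths):
    """Parameter count minus hidden neurons, as stated in the abstract for
    architectures whose input and hidden widths are at least two."""
    return count_parameters(widths) - sum(widths[1:-1])

widths = [2, 3, 2, 1]  # example architecture, chosen for illustration
rng = np.random.default_rng(0)
params = init_params(widths, rng)
scaled = rescale_neuron(params, layer=0, neuron=1, c=2.5)

x = rng.standard_normal(widths[0])
same_output = bool(np.allclose(forward(params, x), forward(scaled, x)))
print(same_output, count_parameters(widths), functional_dimension(widths))
```

Each hidden neuron contributes one positive scaling factor that does not change the function, which is why the count subtracts one dimension per hidden neuron. Permutations of neurons within a layer are discrete symmetries and do not lower the dimension.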
Citing papers by year: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- Conservation Laws from Data Symmetry in Neural Networks
  Data symmetries generically do not induce conserved quantities in NN training for analytic non-polynomial losses, but can for MSE with tensorizable networks.
- On the fibers and semi-algebraicity of ReLU neuromanifolds
  ReLU neuromanifolds are not semi-algebraic quotients of weight spaces; honest opens are conjectured semi-algebraic and proven Zariski in the shallow case.
- Singular Learning and Occam's Razor in Deep Monomial Networks
  For large monomial activation degree, critical points in deep fully-connected networks coincide exactly with subnetwork configurations where neurons are inactive or redundant.