Applying this formula independently over several fixed groups covers LayerNorm and the same epsilon-stabilized groupwise normalization pattern

A stabilized normalization block acts on a feature group of sizerby Nγ,β,ϵ(x) = Γ P xp r−1∥P x∥2 2 +ϵ +β, P=I− 1 r 11⊤, ϵ >0, whereΓ = diag(γ)

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

Most neural architectures admit a GSVD representation making the nonlinear portion left-invertible and norm-preserving before the final linear layer.

Showing 1 of 1 citing paper.

A Generalized Singular Value Theory for Neural Networks cs.LG · 2026-05-07 · unverdicted · none · ref 7
Most neural architectures admit a GSVD representation making the nonlinear portion left-invertible and norm-preserving before the final linear layer.