For random 2-layer ReLU networks the dominant eigenspaces of the Fisher information matrix are spanned by spherical harmonics of degree ≤2 and capture 97.7% of the trace independently of parameter count.
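where $\mathbb{E}\left[\left(\frac{3\pi}{8}\,\|\hat{Z}_{\alpha\beta}\|^{3} C_1 - \frac{(d-5)\,C_2}{12} + R_{\alpha\beta}\right)\mathbf{1}\{\|\hat{Z}_{\alpha\beta}\|^{2} \ge 1 - r_{\alpha\beta}\}\right] = O(r_{\alpha\beta}^{2})$ follows from direct integration, similar to the proof of Theorem 3.3 for $d \ge 6$.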
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Approximating Simple ReLU Networks based on Spectral Decomposition of Fisher Information