pith. machine review for the scientific record.

arxiv: 1704.08863 · v2 · submitted 2017-04-28 · 💻 cs.LG

Recognition: unknown

On weight initialization in deep neural networks

classification 💻 cs.LG
keywords initialization, weight, neural, activation, activations, derive, functions, initializations

A proper initialization of the weights in a neural network is critical to its convergence. Current insights into weight initialization come primarily from linear activation functions. In this paper, I develop a theory for weight initializations with non-linear activations. First, I derive a general weight initialization strategy for any neural network using activation functions differentiable at 0. Next, I derive the weight initialization strategy for the Rectified Linear Unit (RELU), and provide theoretical insights into why the Xavier initialization is a poor choice with RELU activations. My analysis provides a clear demonstration of the role of non-linearities in determining the proper weight initializations.
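The abstract's claim about Xavier initialization and RELU can be illustrated numerically. The sketch below is not taken from the paper: it assumes a fully connected network with square layers, Gaussian weights, and no biases, and it compares a weight variance of 1/fan_in (what Xavier initialization reduces to when fan_in equals fan_out) with a variance of 2/fan_in (a commonly used RELU-oriented scaling, assumed here for illustration and not necessarily the exact formula derived in the paper). The function name `final_activation_scale` and all of its parameters are hypothetical.

```python
import numpy as np


def final_activation_scale(variance_factor, depth=30, width=256, seed=0):
    """Return the RMS activation after `depth` RELU layers.

    Weights are drawn from N(0, variance_factor / width).
    variance_factor=1.0 matches Xavier for square layers (assumption);
    variance_factor=2.0 is a common RELU-oriented scaling (assumption).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((512, width))  # batch of random inputs
    std = np.sqrt(variance_factor / width)
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * std
        x = np.maximum(x @ w, 0.0)  # RELU activation
    return float(np.sqrt(np.mean(x ** 2)))


xavier_like = final_activation_scale(1.0)
relu_scaled = final_activation_scale(2.0)
print(f"RMS activation, variance 1/n: {xavier_like:.3e}")
print(f"RMS activation, variance 2/n: {relu_scaled:.3e}")
```

Under these assumptions, RELU zeroes roughly half of each layer's pre-activations, so with variance 1/n the second moment of the signal is about halved at every layer and the activations shrink toward zero with depth, while variance 2/n keeps the scale roughly constant.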

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Efficient Unlearning through Maximizing Relearning Convergence Delay

    cs.LG 2026-04 unverdicted novelty 7.0

    The Influence Eliminating Unlearning framework maximizes relearning convergence delay via weight decay and noise injection to remove the influence of a forgetting set while preserving accuracy on retained data.