On the Number of Linear Regions of Deep Neural Networks
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep networks are able to sequentially map portions of each layer's input-space to the same output. In this way, deep models compute functions that react equally to complicated patterns of different inputs. The compositional structure of these functions enables them to re-use pieces of computation exponentially often in terms of the network's depth. This paper investigates the complexity of such compositional maps and contributes new theoretical results regarding the advantage of depth for neural networks with piecewise linear activation functions. In particular, our analysis is not specific to a single family of models, and as an example, we employ it for rectifier and maxout networks. We improve complexity bounds from pre-existing work and investigate the behavior of units in higher layers.
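The claim that depth lets a network re-use computation exponentially often can be illustrated with a small one-dimensional sketch. This is a standard textbook-style illustration and is not taken from the paper: it assumes a hypothetical "tent" layer built from two rectifier units that folds the interval [0, 1] onto itself, so that both halves of the input are mapped to the same output. The names `tent_layer`, `deep_tent`, and `count_linear_pieces` are invented for this example.

```python
def relu(x):
    """Rectifier activation."""
    return max(0.0, x)


def tent_layer(x):
    """One hidden layer with two ReLU units followed by a linear readout.

    On [0, 1] this equals 2x for x < 0.5 and 2 - 2x for x >= 0.5,
    so the two halves of the interval are folded onto the same output range.
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_tent(x, depth):
    """Compose the same folding layer `depth` times."""
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_linear_pieces(f, n_grid=4096):
    """Count linear pieces of f on [0, 1] by detecting slope changes.

    Assumes every breakpoint of f lies on the uniform grid, which holds
    here because the breakpoints are multiples of 1 / 2**depth and
    n_grid is a power of two (valid for depth <= 12).
    """
    xs = [i / n_grid for i in range(n_grid + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * n_grid for i in range(n_grid)]
    pieces = 1
    for left, right in zip(slopes, slopes[1:]):
        if abs(left - right) > 1e-6:
            pieces += 1
    return pieces


if __name__ == "__main__":
    for depth in range(1, 6):
        pieces = count_linear_pieces(lambda x: deep_tent(x, depth))
        print(f"depth={depth}  relu_units={2 * depth}  linear_pieces={pieces}")
```

In this toy setting the number of rectifier units grows linearly with depth (two per layer), while the number of linear pieces doubles with each added layer. The bounds in the paper concern general rectifier and maxout networks with multi-dimensional inputs, which this sketch does not attempt to reproduce.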
Forward citations
Cited by 2 Pith papers
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
  Applying a head-specific sigmoid gate after SDPA in LLMs boosts performance and stability by adding non-linearity and query-dependent sparse modulation while reducing attention sinks.
- A Transfer Learning Evaluation of Deep Neural Networks for Image Classification
  Empirical comparison of transfer learning performance across eleven pre-trained models on five image datasets using accuracy, time, and size metrics.