Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Abstract
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
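To make the abstract's two contributions concrete, here is a minimal NumPy sketch (not the paper's code; the names prelu and he_init are illustrative): PReLU keeps a learnable negative-side slope a, which the paper initializes at 0.25, and the rectifier-aware initialization draws weights from a zero-mean Gaussian with standard deviation sqrt(2 / fan_in) so that activation variance is preserved through rectified layers.

```python
import numpy as np

def prelu(x, a):
    # PReLU: f(y) = y for y > 0 and a * y otherwise, where the slope a is
    # learned (one per channel in the paper) rather than fixed as in Leaky ReLU.
    return np.where(x > 0, x, a * x)

def he_init(fan_in, fan_out, rng=None):
    # Rectifier-aware ("He") initialization: zero-mean Gaussian with
    # std = sqrt(2 / fan_in), chosen so signal variance is preserved through
    # layers followed by rectifier nonlinearities.
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Hypothetical usage: one fully connected layer followed by PReLU.
x = np.random.randn(8, 256)      # a batch of 8 input vectors
W = he_init(256, 512)
a = np.full(512, 0.25)           # the paper initializes all slopes at 0.25
h = prelu(x @ W, a)              # shape (8, 512)
```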
This paper has not been read by Pith yet.
Forward citations
Cited by 12 Pith papers
- U-Net: Convolutional Networks for Biomedical Image Segmentation
  A u-shaped fully-convolutional encoder-decoder with skip connections trained with elastic-deformation augmentation produces accurate biomedical image segmentations from very small training sets.
- Determining star formation histories and age-metallicity relations with convolutional neural networks
  A CNN with attention and shared latent space recovers SFHs and metallicities from spectro-photometric data with ~0.12 dex age and ~0.03 dex metallicity dispersion while running thousands of times faster than full spec...
- Criticality and Saturation in Orthogonal Neural Networks
  Derives layer-wise recursions for finite-width tensors under orthogonal initialization that reproduce the observed large-depth stability of nonlinear networks.
- A Theory of Saddle Escape in Deep Nonlinear Networks
  Derives an exact norm-imbalance identity for deep nonlinear nets, classifying activations into four classes and yielding the escape-time law τ★ = Θ(ε^{-(r-2)}) governed by bottleneck depth r.
- A Theory of Saddle Escape in Deep Nonlinear Networks
  An exact norm-imbalance identity classifies activations into four classes and reduces the deep nonlinear training flow to a scalar ODE that predicts saddle escape time scaling as ε^{-(r-2)} for r bottle...
- Isotropic Activation Functions Enable Deindividuated Neurons and Adaptive Topologies
  Isotropic activation functions derived from reparameterisation symmetries and SVD diagonalisation enable function-preserving neuron removal and addition in dense networks, supporting up to 50% sparsification and real-...
- Progressive Growing of GANs for Improved Quality, Stability, and Variation
  Progressive growing stabilizes GAN training to produce high-resolution images of unprecedented quality and achieves a record unsupervised inception score of 8.80 on CIFAR10.
- Wide Residual Networks
  Wide residual networks achieve higher accuracy and faster training than very deep thin residual networks by increasing width and decreasing depth, setting new state-of-the-art results on CIFAR, SVHN, and ImageNet.
- LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
  The LSUN dataset of one million images per category across 30 classes is constructed via iterative human-in-the-loop deep learning labeling.
- Similarity-Based Bike Station Expansion via Hybrid Denoising Autoencoders
  A hybrid denoising autoencoder with a supervised head learns latent urban features to select bike-station expansion candidates via latent-space similarity, producing 32 consensus high-confidence zones in Trondheim.
- MONAI: An open-source framework for deep learning in healthcare
  MONAI is a community-supported PyTorch framework that extends deep learning to medical data with domain-specific architectures, transforms, and deployment tools.
- Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models
  Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.