
Improving neural networks by preventing co-adaptation of feature detectors

31 Pith papers cite this work.

abstract

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
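The mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a layer's hidden activations arrive as a plain Python list, and the function names `dropout_train` and `dropout_test` are hypothetical. During training each unit is zeroed independently with probability `p_drop` (0.5 reproduces "omitting half of the feature detectors"). At test time every unit is kept and its output is scaled by the retention probability `1 - p_drop`; for a following linear layer this is equivalent to the paper's "mean network", which keeps all hidden units but halves their outgoing weights.

```python
import random
from typing import List, Optional


def dropout_train(activations: List[float], p_drop: float = 0.5,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Training-time dropout (hypothetical helper): zero each activation
    independently with probability p_drop. With p_drop = 0.5, half of the
    feature detectors are omitted on average for each training case."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    rng = rng or random.Random()
    # A fresh random mask is drawn on every call (i.e. every training case),
    # so no unit can rely on a fixed set of other units being present.
    return [0.0 if rng.random() < p_drop else a for a in activations]


def dropout_test(activations: List[float], p_drop: float = 0.5) -> List[float]:
    """Test-time 'mean network' (hypothetical helper): keep every unit but
    scale its output by the retention probability (1 - p_drop), matching the
    unit's expected contribution during training."""
    keep = 1.0 - p_drop
    return [a * keep for a in activations]
```

For example, `dropout_test([2.0, 4.0])` returns `[1.0, 2.0]`. Many modern libraries instead use "inverted" dropout, dividing the kept activations by `1 - p_drop` during training so that no rescaling is needed at test time; the two schemes agree in expectation.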

citation-role summary: method (3)

citation-polarity summary: use method (3)

representative citing papers

Generative Adversarial Networks

stat.ML · 2014-06-10 · accept · novelty 9.0

A generative model is trained to match a data distribution by competing in a minimax game against a discriminator, reaching an equilibrium where the generator recovers the true distribution and the discriminator outputs 1/2 everywhere.

Deep Residual Learning for Image Recognition

cs.CV · 2015-12-10 · accept · novelty 8.0

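Residual networks reformulate layers to learn residual functions with reference to the layer inputs, enabling effective training of models up to 152 layers deep on ImageNet; an ensemble of these networks achieves 3.57% top-5 error on the ImageNet test set and won first place in the ILSVRC 2015 classification task.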

Conditional Generative Adversarial Nets

cs.LG · 2014-11-06 · accept · novelty 8.0

Conditional GANs generate samples matching a given condition by supplying the condition as an additional input to both the generator and the discriminator.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

RayDer is a unified transformer backbone for self-supervised static-scene novel view synthesis that absorbs dynamic content as a nuisance factor and shows power-law scaling with data and compute while matching supervised methods in zero-shot settings.

Rotary Masked Autoencoders are Versatile Learners

cs.LG · 2025-05-26 · unverdicted · novelty 6.0

RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.

Open DNN Box by Power Side-Channel Attack

cs.CR · 2019-07-21 · unverdicted · novelty 6.0

Power side-channel analysis recovers DNN architecture and parameters at 96.5% average accuracy on real embedded devices.

Defending Adversarial Attacks by Correcting logits

cs.LG · 2019-06-26 · unverdicted · novelty 5.0

A two-layer network trained on mixed clean and perturbed logits recovers original predictions for a range of adversarial attacks without needing image data.
