Improving neural networks by preventing co-adaptation of feature detectors

2012 · cs.NE · arXiv 1207.0580

31 Pith papers cite this work. Polarity classification is still indexing.

abstract

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
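The masking step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `dropout_forward`, the `drop_prob` parameter, and the array shapes are hypothetical, chosen only for illustration. The test-time branch, which scales activations by the keep probability, is not stated in the abstract; it is included as an assumption about how the expected activation is typically kept consistent between training and evaluation.

```python
import numpy as np

def dropout_forward(h, rng, drop_prob=0.5, train=True):
    """Apply dropout to hidden activations h of shape (cases, units).

    Hypothetical helper for illustration only.
    """
    if not train:
        # Assumption (not in the abstract): at test time keep every unit
        # and scale by the keep probability so expected activations match.
        return h * (1.0 - drop_prob)
    # Draw an independent keep/drop mask for every training case and unit,
    # so each case sees a different random subset of feature detectors.
    mask = rng.random(h.shape) >= drop_prob
    return h * mask

rng = np.random.default_rng(0)
h = np.ones((4, 1000))           # 4 training cases, 1000 feature detectors
out = dropout_forward(h, rng)    # roughly half the units are zeroed per case
print("fraction kept per case:", out.mean(axis=1))
```

Because the mask is redrawn for every training case, no unit can rely on a specific set of other units being present, which is the co-adaptation the abstract says dropout prevents.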

citation-role summary

method 3

citation-polarity summary

use method 3

representative citing papers

Generative Adversarial Networks

stat.ML · 2014-06-10 · accept · novelty 9.0

A generative model is trained to match a data distribution by competing in a minimax game against a discriminator, reaching an equilibrium where the generator recovers the true distribution and the discriminator outputs 1/2 everywhere.

Deep Residual Learning for Image Recognition

cs.CV · 2015-12-10 · accept · novelty 8.0

Residual networks reformulate layers to learn residual functions, enabling effective training of up to 152-layer models that achieve 3.57% error on ImageNet and win ILSVRC 2015.

Conditional Generative Adversarial Nets

cs.LG · 2014-11-06 · accept · novelty 8.0

Conditional GANs generate samples matching a given condition by supplying the condition to both generator and discriminator.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

A First-Order Mean Field Control Analysis of Transformer Layers under Cross-Entropy Training

math.OC · 2026-06-22 · unverdicted · novelty 7.0

Transformer residual layers are approximated as an explicit Euler scheme for a controlled hidden-state flow whose mean-field limit is a first-order transport control problem with Pontryagin terminal condition given by the softmax residual.

A Spectral Approach for Learning Spatiotemporal Neural Differential Equations

cs.LG · 2023-09-28 · unverdicted · novelty 7.0

A spectral neural differential equation learning method is proposed that handles nonlocal spatial interactions on unbounded domains without discretization.

FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms

cs.LG · 2019-06-28 · unverdicted · novelty 7.0

FIESTA uses bandit algorithms to adaptively decide how many seeds and splits to run for each candidate model, focusing effort on promising ones while providing guarantees on selecting the optimal model.

Simultaneous measurements of $N$-subjettiness observables in jets from gluons and light-flavour quarks, and in decays of boosted W bosons and top quarks

hep-ex · 2026-04-28 · unverdicted · novelty 7.0

CMS reports a simultaneous measurement of 25 N-subjettiness observables in 1-, 2-, and 3-prong jets, unfolded to stable particles with particle-level correlations for QCD modeling.

Improved Regularization of Convolutional Neural Networks with Cutout

cs.CV · 2017-08-15 · accept · novelty 7.0

Randomly masking square regions of input images during CNN training yields new state-of-the-art test errors of 2.56% on CIFAR-10, 15.20% on CIFAR-100, and 1.30% on SVHN.

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

cs.LG · 2013-08-15 · conditional · novelty 7.0

The paper introduces and compares gradient estimators for stochastic binary neurons, notably a decomposition approach and the straight-through estimator, to support sparse conditional computation in deep networks.

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

RayDer is a unified transformer backbone for self-supervised static-scene novel view synthesis that absorbs dynamic content as a nuisance factor and shows power-law scaling with data and compute while matching supervised methods in zero-shot settings.

OmniISR: A Unified Framework for Centralized and Federated Learning via Intermediate Supervision and Regularization

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

OmniISR unifies centralized, federated, and hybrid learning by injecting mutual-information supervision and negative-entropy regularization at multiple hidden layers, with supporting convergence and drift bounds.

Rotary Masked Autoencoders are Versatile Learners

cs.LG · 2025-05-26 · unverdicted · novelty 6.0

RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.

Open DNN Box by Power Side-Channel Attack

cs.CR · 2019-07-21 · unverdicted · novelty 6.0

Power side-channel analysis recovers DNN architecture and parameters at 96.5% average accuracy on real embedded devices.

Adaptive Weighting Depth-variant Deconvolution of Fluorescence Microscopy Images with Convolutional Neural Network

eess.IV · 2019-07-07 · unverdicted · novelty 6.0

A CNN predicts depth-variant PSFs for patch-wise deconvolution of fluorescence microscopy images, with adaptive weighting to reduce artifacts, claiming 98.2% accuracy and up to 6.6 dB PSNR gain.

Explicit Dropout: Deterministic Regularization for Transformer Architectures

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

Explicit dropout reformulates stochastic dropout as deterministic loss penalties for Transformers, matching or exceeding standard performance with independent control per component.

Language models recognize dropout and Gaussian noise applied to their activations

cs.AI · 2026-04-19 · unverdicted · novelty 6.0

Language models detect, localize, and distinguish dropout from Gaussian noise applied to their activations, often with high accuracy.

Improving Neutrino Oscillation Measurements through Event Classification

hep-ph · 2025-11-14 · unverdicted · novelty 5.0

Supervised ML classification of neutrino events by interaction channel prior to energy reconstruction improves accuracy and sensitivity by 10-20% in simulated DUNE analyses while remaining robust to generator mismodeling.

Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning

cs.LG · 2019-07-16 · unverdicted · novelty 5.0

Graph Laplacian interpolating activation replaces softmax in DNNs and improves natural accuracy, robust accuracy, and data efficiency.

Defending Adversarial Attacks by Correcting logits

cs.LG · 2019-06-26 · unverdicted · novelty 5.0

A two-layer network trained on mixed clean and perturbed logits recovers original predictions for a range of adversarial attacks without needing image data.

Optimized Sharing of Coefficients in Parallel Filter Banks

eess.SP · 2019-07-11 · unverdicted · novelty 4.0

A two-stage coefficient grouping algorithm for parallel filter banks that increases sharing and reduces registers, LUTs, and DSP48s by up to 50% on FPGAs.

Simple vs complex temporal recurrences for video saliency prediction

cs.CV · 2019-07-03 · unverdicted · novelty 4.0

Both ConvLSTM and exponential moving average modifications to a static saliency model achieve state-of-the-art video saliency prediction on DHF1K after SALICON pre-training and yield similar maps.

Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles

cs.CV · 2026-04-28 · unverdicted · novelty 4.0

A multi-stream ensemble using DINOv2 and CLIP backbones trained with extreme degradations achieves stable deepfake detection and fourth place in the NTIRE 2026 challenge.

Quantum memory and scrambling from the perspective of a classical neural network

quant-ph · 2026-04-28 · unverdicted · novelty 4.0

Time-dependent quantum memory oscillates faster than OTOC, does not equilibrate, and is more sensitive to symmetry breaking, as shown by neural-network predictions on helical spin chains.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Generative Adversarial Networks

stat.ML · 2014-06-10 · accept · none · ref 17

Deep Residual Learning for Image Recognition

cs.CV · 2015-12-10 · accept · none · ref 14

Simultaneous measurements of $N$-subjettiness observables in jets from gluons and light-flavour quarks, and in decays of boosted W bosons and top quarks

hep-ex · 2026-04-28 · unverdicted · none · ref 100
