Recognition: unknown
Deep Learning is Robust to Massive Label Noise
Original abstract
Deep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks. However, well-annotated datasets can be time-consuming and expensive to collect, lending increased interest to larger but noisy datasets that are more easily obtained. In this paper, we show that deep neural networks are capable of generalizing from training data for which true labels are massively outnumbered by incorrect labels. We demonstrate remarkably high test performance after training on corrupted data from MNIST, CIFAR, and ImageNet. For example, on MNIST we obtain test accuracy above 90 percent even after each clean training example has been diluted with 100 randomly-labeled examples. Such behavior holds across multiple patterns of label noise, even when erroneous labels are biased towards confusing classes. We show that training in this regime requires a significant but manageable increase in dataset size that is related to the factor by which correct labels have been diluted. Finally, we provide an analysis of our results that shows how increasing noise decreases the effective batch size.
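To make the dilution protocol concrete, here is a minimal NumPy sketch. It is an illustration, not the authors' code: it assumes the alpha randomly-labeled examples per clean example are produced by reusing training images with labels drawn uniformly over the classes, and the function name dilute_with_noise and its signature are hypothetical.

import numpy as np

def dilute_with_noise(x, y, alpha, num_classes, rng=None):
    # Hypothetical helper (not from the paper): dilute a cleanly-labeled
    # dataset with `alpha` randomly-labeled examples per clean example.
    # Assumes the noisy portion reuses training images, sampled with
    # replacement, and assigns labels uniformly at random.
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    idx = rng.integers(0, n, size=alpha * n)
    x_noisy = x[idx]
    y_noisy = rng.integers(0, num_classes, size=alpha * n)
    x_out = np.concatenate([x, x_noisy])
    y_out = np.concatenate([y, y_noisy])
    # Shuffle so clean and noisy examples mix within every minibatch;
    # the dataset grows by a factor of (1 + alpha), e.g. 60,000 MNIST
    # examples become about 6.06 million at alpha = 100.
    perm = rng.permutation(len(y_out))
    return x_out[perm], y_out[perm]

# With alpha = 100, only 1/(1 + alpha) of each batch is clean on average,
# so a batch of 128 contains roughly 1.27 cleanly-labeled examples --
# one intuition for the abstract's claim that increasing noise decreases
# the effective batch size.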
This paper has not been read by Pith yet.
Forward citations
Cited by 5 Pith papers
- FB-NLL: A Feature-Based Approach to Tackle Noisy Labels in Personalized Federated Learning
  FB-NLL decouples user clustering from training dynamics by using subspace similarity on feature covariances and corrects noisy labels via directional alignment in learned feature space.
- BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature
  BioMiner introduces a multi-modal extraction system and the BioVista benchmark, achieves an F1 of 0.32 on bioactivity triplets, and demonstrates utility in scaling datasets and improving QSAR models.
- Multi-Block Attention for Efficient Channel Estimation in IRS-Assisted mmWave MIMO
  A multi-block attention neural network reduces pilot overhead by 87% and NMSE by 51% at 10 dB SNR for cascaded channel estimation in IRS-assisted mmWave MIMO-OFDM systems.
- Inferring Asteroseismic Parameters from Short Observations Using Deep Learning: Application to TESS and K2 Red Giants
  Deep learning infers Δν and ν_max from one-month TESS and K2 observations of red giants, with reliable results for ~50% of Kepler/K2 samples and ~23% of TESS stars, plus ΔΠ1 for ~200 K2 young red giants that match know...
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
Discussion (0)