Pith · machine review for the scientific record

arxiv: 1903.12261 · v1 · submitted 2019-03-28 · 💻 cs.LG · cs.CV · stat.ML

Recognition: 1 theorem link

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations


Pith reviewed 2026-05-13 04:56 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords neural network robustness · image corruptions · image perturbations · benchmark datasets · ImageNet-C · ImageNet-P · common degradations · safety-critical classification

The pith

Benchmarks for common corruptions show negligible relative robustness gains from AlexNet to ResNet.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ImageNet-C and ImageNet-P as standardized benchmarks to evaluate image classifier robustness to everyday degradations such as noise, blur, weather effects, and small perturbations. It reports that relative performance on these tests shows little improvement as networks advance from AlexNet to deeper ResNet architectures, even though clean-image accuracy rises substantially. This matters for safety-critical applications where classifiers encounter these common issues rather than only artificial adversarial attacks. The benchmarks also identify training approaches that can lift robustness on the new tests, including repurposed adversarial defenses. Overall the work supplies tools and evidence to guide development of networks that handle real-world variations more reliably.

Core claim

We establish ImageNet-C as a benchmark consisting of 15 corruption types applied at five severity levels to ImageNet images, and ImageNet-P as a benchmark of perturbation sequences such as rotations and translations. These measure average-case robustness to common, realistic image degradations instead of worst-case adversarial examples. We find negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. We further show that certain training methods improve performance on both benchmarks and that a bypassed adversarial defense yields substantial robustness to common perturbations.
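The benchmark's headline comparison rests on the mean corruption error (mCE): average a model's top-1 error over the five severities of each corruption, normalize by AlexNet's corresponding error, then average across corruption types. A minimal sketch, with hypothetical function and variable names and toy error rates rather than the paper's numbers:

```python
def mean_corruption_error(errors, alexnet_errors):
    """Mean corruption error (mCE), normalized against the AlexNet baseline.

    errors / alexnet_errors: dict mapping each corruption name to a list of
    top-1 error rates at severities 1..5 (values in [0, 1]).
    """
    ces = []
    for corruption, per_severity in errors.items():
        baseline = alexnet_errors[corruption]
        # Normalized corruption error: error summed over severities,
        # divided by the AlexNet sum for the same corruption.
        ces.append(sum(per_severity) / sum(baseline))
    # Average over corruption types (15 in the real ImageNet-C).
    return sum(ces) / len(ces)


# Toy example with two corruption types.
model = {"gaussian_noise": [0.4, 0.5, 0.6, 0.7, 0.8],
         "motion_blur": [0.3, 0.4, 0.5, 0.6, 0.7]}
alexnet = {"gaussian_noise": [0.5, 0.6, 0.7, 0.8, 0.9],
           "motion_blur": [0.5, 0.6, 0.7, 0.8, 0.9]}
print(mean_corruption_error(model, alexnet))  # ≈ 0.79; below 1.0 means better than AlexNet
```

Because the metric is a ratio against AlexNet, a model can lower its absolute errors substantially while its relative mCE barely moves, which is exactly the pattern the paper reports.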

What carries the argument

ImageNet-C and ImageNet-P datasets, which apply standardized sequences of common corruptions and perturbations to measure classifier accuracy under realistic degradations rather than adversarial attacks.
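For ImageNet-P, stability is scored by how often a classifier's top-1 prediction flips between consecutive frames of a perturbation sequence (the paper's flip rate, which mFR then normalizes against AlexNet). A minimal sketch, assuming per-frame predictions have already been extracted; the names are illustrative, not the paper's code:

```python
def flip_rate(prediction_sequences):
    """Fraction of consecutive-frame pairs where the top-1 prediction changes.

    prediction_sequences: one list of per-frame predicted labels per
    perturbation sequence (e.g. an image under gradually increasing rotation).
    """
    flips, pairs = 0, 0
    for seq in prediction_sequences:
        for prev, cur in zip(seq, seq[1:]):
            flips += prev != cur  # count a flip when the label changes
            pairs += 1
    return flips / pairs


# A stable classifier flips rarely; an unstable one flips often.
stable = [["dog", "dog", "dog", "dog"], ["cat", "cat", "cat", "cat"]]
unstable = [["dog", "cat", "dog", "cat"], ["cat", "dog", "cat", "cat"]]
print(flip_rate(stable))    # 0.0
print(flip_rate(unstable))  # 5 flips in 6 pairs ≈ 0.83
```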

If this is right

  • Safety-critical systems can use ImageNet-C and ImageNet-P scores to select among candidate classifiers.
  • Training procedures that improve scores on these benchmarks will produce networks better suited to real deployment conditions.
  • Some existing adversarial defenses can be adapted to increase robustness against natural perturbations without new design work.
  • Future architecture search and training should include explicit targets for corruption and perturbation robustness to achieve better generalization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Clean accuracy gains have not automatically produced better handling of the variations that occur in deployed systems.
  • Robustness research focused solely on adversarial attacks may miss opportunities to improve performance on the more frequent natural degradations.
  • These benchmarks could be adapted to other data modalities such as video or audio to test generalization more broadly.

Load-bearing premise

The fifteen chosen corruptions and the specific perturbations in ImageNet-P are representative of the common real-world image degradations that classifiers will encounter outside the lab.

What would settle it

A new classifier that ranks highly on ImageNet-C but ranks much lower when tested on a fresh collection of common corruptions such as lens flare, unexpected lighting shifts, or sensor noise would contradict the claim that the benchmark rankings are stable and useful.

Original abstract

In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces ImageNet-C, a standardized benchmark of 15 common image corruptions (noise, blur, weather, digital) at five severity levels applied to ImageNet validation images, and ImageNet-P for evaluating robustness to small perturbations such as translations, rotations, and brightness changes. It benchmarks a range of classifiers from AlexNet through VGG, ResNet, and DenseNet families, reporting mean corruption error (mCE) and mean flip rate (mFR) metrics. The central empirical finding is that relative corruption robustness shows negligible improvement from AlexNet to modern ResNets despite large gains in clean accuracy. The authors also demonstrate that certain data augmentations and a bypassed adversarial defense can improve robustness on these benchmarks.

Significance. This work is significant for shifting robustness evaluation from worst-case adversarial perturbations to common, real-world degradations that affect deployed systems. By releasing fixed datasets, severity levels, and evaluation code, it enables reproducible comparisons across models and training methods. The observation that architectural progress has not translated into better relative robustness on ImageNet-C provides a clear, falsifiable signal that can guide future research on generalization and safety-critical applications. The dual-benchmark design (corruptions plus perturbations) offers complementary views of robustness.

minor comments (3)
  1. §3.1: The definition of mean corruption error (mCE) normalizes against AlexNet performance; explicitly state whether this baseline is fixed across all experiments or recomputed, and confirm that no post-hoc model selection affects the reported relative rankings.
  2. Table 2 and Figure 3: Include standard deviations or bootstrap confidence intervals on the mCE and mFR values to allow readers to assess whether the reported 'negligible changes' between AlexNet and ResNet-50 are statistically distinguishable.
  3. §5: The claim that a bypassed adversarial defense yields substantial perturbation robustness should specify the exact defense, the bypass method, and the quantitative improvement on ImageNet-P so that the result can be reproduced without ambiguity.
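The intervals requested in the second comment could be obtained by resampling per-image error indicators. A minimal percentile-bootstrap sketch with hypothetical data (not the paper's numbers):

```python
import random


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of
    per-image 0/1 error indicators."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(values) for _ in values) / len(values)
        for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical per-image indicators (1 = misclassified): a 30% error rate.
errors = [1] * 300 + [0] * 700
low, high = bootstrap_ci(errors)
print(low, high)  # an interval around the 0.30 point estimate
```

The same resampling applied to two models' per-image errors would show directly whether their mCE gap exceeds sampling noise.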

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the work's significance in shifting robustness evaluation toward common real-world degradations, and recommendation for minor revision. We appreciate the emphasis on reproducibility through fixed datasets and code.

Circularity Check

0 steps flagged

No significant circularity in empirical benchmarking

full rationale

The paper introduces standardized benchmark datasets (ImageNet-C with 15 corruptions at 5 severity levels and ImageNet-P for perturbations) and reports direct empirical measurements of classifier performance, including the central observation of negligible changes in relative corruption robustness between AlexNet and ResNet models via mean corruption error. All claims are computed from evaluations on these newly defined test sets, with no mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the results to inputs by construction. The work stands on its own evaluations rather than on external benchmarks and contains no self-definitional, ansatz-smuggling, or uniqueness-imported steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmarking paper. No free parameters are fitted to support the central claims, no new axioms are invoked beyond standard image processing definitions, and no invented entities are postulated.

pith-pipeline@v0.9.0 · 5421 in / 971 out tokens · 42114 ms · 2026-05-13T04:56:52.450015+00:00 · methodology

discussion (0)


Forward citations

Cited by 29 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Online Learning-to-Defer with Varying Experts

    stat.ML 2026-05 unverdicted novelty 8.0

    Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.

  2. Sensing-Assisted LoS/NLoS Identification in Dynamic UAV Positioning Systems

    eess.SP 2026-05 unverdicted novelty 7.0

    A new dual-input feature fusion network using RGB images and channel impulse responses identifies LoS/NLoS conditions for UAVs with up to 97.69% accuracy and reduces trilateration positioning error by about 70%.

  3. Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

    cs.CV 2026-05 unverdicted novelty 7.0

    A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or ...

  4. ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction

    cs.LG 2026-05 unverdicted novelty 7.0

    ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.

  5. CURE-OOD: Benchmarking Out-of-Distribution Detection for Survival Prediction

    cs.CV 2026-05 unverdicted novelty 7.0

    CURE-OOD is the first benchmark for evaluating OOD detection in survival prediction under controlled CT acquisition shifts, showing that standard detectors often fail and providing a survival-aware baseline.

  6. Adjoint Inversion Reveals Holographic Superposition and Destructive Interference in CNN Classifiers

    cs.CV 2026-04 unverdicted novelty 7.0

    CNN classifiers work by holographic superposition and destructive interference in pixel space rather than selecting cleaned features, as proven by a new adjoint inversion framework that also yields a covariance-volume...

  7. Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

    cs.RO 2026-04 unverdicted novelty 7.0

    ACO-MoE employs agent-centric mixture-of-experts to decouple task-relevant features from dynamic visual perturbations in RL, recovering 95.3% of clean performance on the new VDCS benchmark.

  8. Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

    cs.RO 2026-04 unverdicted novelty 7.0

    ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.

  9. Why Training-Free Token Reduction Collapses: The Inherent Instability of Pairwise Scoring Signals

    cs.AI 2026-04 unverdicted novelty 7.0

    Pairwise scoring signals in Vision Transformer token reduction are inherently unstable due to high perturbation counts and degrade in deep layers, causing collapse, while unary signals with triage enable CATIS to reta...

  10. Learning Robustness at Test-Time from a Non-Robust Teacher

    cs.CV 2026-04 unverdicted novelty 7.0

    A test-time adaptation framework anchors adversarial training to a non-robust teacher's predictions, yielding more stable optimization and better robustness-accuracy trade-offs than standard self-consistency methods.

  11. MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts

    cs.CL 2026-04 unverdicted novelty 7.0

    MIXAR is the first autoregressive pixel-based language model for eight languages and scripts, with empirical gains on multilingual tasks, robustness to unseen languages, and further improvements when scaled to 0.5B pa...

  12. Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

    cs.CR 2026-04 unverdicted novelty 7.0

    A fine-tuning framework reduces PGD attack success on AdvDA detectors from 100% to 3.2% and MalGuise from 13% to 5.1%, but optimal training strategies differ by threat model and robustness does not transfer across them.

  13. FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry

    cs.LG 2026-05 unverdicted novelty 6.0

    Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.

  14. Reinforcing Multimodal Reasoning Against Visual Degradation

    cs.CV 2026-05 unverdicted novelty 6.0

    ROMA improves MLLM robustness to seen and unseen visual corruptions by +2.3-2.4% over GRPO on seven reasoning benchmarks while matching clean accuracy.

  15. MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence

    cs.CV 2026-05 unverdicted novelty 6.0

    MedVIGIL introduces a clinician-supervised benchmark showing medical VLMs frequently give fluent answers on broken visual evidence, with top models 14 points below human radiologists on the composite score.

  16. Intermediate Representations are Strong AI-Generated Image Detectors

    cs.CV 2026-05 unverdicted novelty 6.0

    Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.

  17. Latent Denoising Improves Visual Alignment in Large Multimodal Models

    cs.CV 2026-04 unverdicted novelty 6.0

    A latent denoising objective with saliency-aware corruption and contrastive distillation improves visual alignment and corruption robustness in large multimodal models.

  18. Adapting in the Dark: Efficient and Stable Test-Time Adaptation for Black-Box Models

    cs.LG 2026-04 unverdicted novelty 6.0

    BETA adapts black-box models at test time using a local steering model and regularization techniques to achieve accuracy improvements without additional API queries or high latency.

  19. Inside-Out: Measuring Generalization in Vision Transformers Through Inner Workings

    cs.LG 2026-04 unverdicted novelty 6.0

    Circuit-based metrics from Vision Transformer internals provide better label-free proxies for generalization under distribution shift than existing methods like model confidence.

  20. Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification

    cs.LG 2026-04 unverdicted novelty 6.0

    GenCE is a strictly proper loss obtained by normalizing each sample's softmax against the batch predictions, outperforming cross-entropy in low-data and imbalanced regimes with better calibration and OOD detection.

  21. Dual-axis attribution of zebrafish tectal microcircuits for energy-efficient and robust neurocomputing

    cs.NE 2026-05 conditional novelty 5.0

    Zebrafish tectal subcircuits are dissociated into spike-efficient information gating and feedback-like robustness stabilization, then transferred to improve ResNet efficiency and noise tolerance.

  22. Probing Routing-Conditional Calibration in Attention-Residual Transformers

    cs.CV 2026-05 unverdicted novelty 5.0

    Routing summaries and auxiliary features do not provide stable evidence of conditional miscalibration in AR transformers once confidence-matched baselines, capacity controls, and permutation nulls are applied.

  23. ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs

    cs.CV 2026-05 unverdicted novelty 5.0

    ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.

  24. Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models

    cs.LG 2026-05 unverdicted novelty 5.0

    Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.

  25. VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning

    cs.LG 2026-04 unverdicted novelty 5.0

    VOLTA, consisting of a deep encoder with learnable prototypes plus cross-entropy and post-hoc temperature scaling, matches or exceeds ten UQ baselines in accuracy, achieves lower expected calibration error, and perfor...

  26. Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification

    cs.LG 2026-04 unverdicted novelty 5.0

    Generative Cross-Entropy loss improves both accuracy and calibration over standard cross-entropy by augmenting it with a generative p(x|y) term, especially on long-tailed data, and pairs with adaptive temperature scal...

  27. When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

    cs.CV 2026-05 unverdicted novelty 4.0

    Mild rotations and noise significantly increase relation hallucinations in VLMs across models and datasets, with prompt augmentation and preprocessing offering only partial mitigation.

  28. When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

    cs.CV 2026-05 unverdicted novelty 4.0

    Mild rotations and noise significantly increase relation hallucinations in VLMs across models and datasets, with prompt and preprocessing fixes providing only partial relief.

  29. Robust Deepfake Detection, NTIRE 2026 Challenge: Report

    cs.CV 2026-04 unverdicted novelty 2.0

    The NTIRE 2026 challenge finds that large foundation models combined with ensembles and degradation-aware training produce the most robust deepfake detectors.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · cited by 26 Pith papers

  1. [1]

    Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition

    Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, and Gerald Penn. Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. ICASSP, 2013

  2. [2]

    Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint, 2018

    Aharon Azulay and Yair Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint, 2018

  3. [3]

    Measuring neural net robustness with constraints

    Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, and Antonio Criminisi. Measuring neural net robustness with constraints. In NIPS. 2016

  4. [4]

    A non-local algorithm for image denoising

    Antoni Buades and Bartomeu Coll. A non-local algorithm for image denoising. In CVPR, 2005

  5. [5]

    Defensive distillation is not robust to adversarial examples, 2016

    Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples, 2016

  6. [6]

    Adversarial examples are not easily detected: Bypassing ten detection methods, 2017

    Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods, 2017

  7. [7]

    Ground-truth adversarial examples

    Nicholas Carlini, Guy Katz, Clark Barrett, and David L. Dill. Ground-truth adversarial examples, 2017

  8. [8]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. CVPR, 2009

  9. [9]

    Quality resilient deep neural networks, 2017 a

    Samuel Dodge and Lina Karam. Quality resilient deep neural networks, 2017 a

  10. [10]

    A study and comparison of human and deep learning recognition performance under visual distortions, 2017 b

    Samuel Dodge and Lina Karam. A study and comparison of human and deep learning recognition performance under visual distortions, 2017 b

  11. [11]

    Ideal spatial adaptation by wavelet shrinkage

    David Donoho and Iain Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 1993

  12. [12]

    Evaluating and understanding the robustness of adversarial logit pairing

    Logan Engstrom, Andrew Ilyas, and Anish Athalye. Evaluating and understanding the robustness of adversarial logit pairing. arXiv preprint, 2018

  13. [13]

    Robust physical-world attacks on deep learning models, 2017

    Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on deep learning models, 2017

  14. [14]

    Interpretable explanations of black boxes by meaningful perturbation

    Ruth Fong and Andrea Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. ICCV, 2017

  15. [15]

    Image style transfer using convolutional neural networks

    Leon Gatys, Alexander Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. CVPR, 2016

  16. [16]

    Comparing deep neural networks against humans: object recognition when the signal gets weaker

    Robert Geirhos, David H. J. Janssen, Heiko H. Schütt, Jonas Rauber, Matthias Bethge, and Felix A. Wichmann. Comparing deep neural networks against humans: object recognition when the signal gets weaker, 2017

  17. [17]

    Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness

    Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. ICLR, 2019

  18. [18]

    Motivating the rules of the game for adversarial example research

    Justin Gilmer, Ryan P. Adams, Ian Goodfellow, David Andersen, and George E. Dahl. Motivating the rules of the game for adversarial example research. arXiv preprint, 2018 a

  19. [19]

    Adversarial spheres

    Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. ICLR Workshop, 2018 b

  20. [20]

    On calibration of modern neural networks

    Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. International Conference on Machine Learning, 2017

  21. [21]

    Histogram-based subband powerwarping and spectral averaging for robust speech recognition under matched and multistyle training, 2012

    Mark Harvilla and Richard Stern. Histogram-based subband powerwarping and spectral averaging for robust speech recognition under matched and multistyle training, 2012

  22. [22]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2015

  23. [23]

    Early methods for detecting adversarial images, 2017 a

    Dan Hendrycks and Kevin Gimpel. Early methods for detecting adversarial images, 2017 a

  24. [24]

    A baseline for detecting misclassified and out-of-distribution examples in neural networks

    Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In ICLR, 2017 b

  25. [25]

    Using trusted data to train deep networks on labels corrupted by severe noise

    Dan Hendrycks, Mantas Mazeika, Duncan Wilson, and Kevin Gimpel. Using trusted data to train deep networks on labels corrupted by severe noise. NIPS, 2018

  26. [26]

    Deep anomaly detection with outlier exposure

    Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. ICLR, 2019

  27. [27]

    Aurora-5 experimental framework for the performance evaluation of speech recognition in case of a hands-free speech input in noisy environments, 2007

    Hans-Günter Hirsch. Aurora-5 experimental framework for the performance evaluation of speech recognition in case of a hands-free speech input in noisy environments, 2007

  28. [28]

    The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions

    Hans-Günter Hirsch and David Pearce. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. ISCA ITRW ASR2000, 2000

  29. [29]

    Google's cloud vision api is not robust to noise, 2017

    Hossein Hosseini, Baicen Xiao, and Radha Poovendran. Google's cloud vision api is not robust to noise, 2017

  30. [30]

    Condensenet: An efficient DenseNet using learned group convolutions

    Gao Huang, Shichen Liu, Laurens van der Maaten, and Kilian Q Weinberger. Condensenet: An efficient DenseNet using learned group convolutions. arXiv preprint, 2017 a

  31. [31]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017 b

  32. [32]

    Multi-scale dense networks for resource efficient image classification

    Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q. Weinberger. Multi-scale dense networks for resource efficient image classification. ICLR, 2018

  33. [33]

    Batch normalization: Accelerating deep network training by reducing internal covariate shift

    Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. JMLR, 2015

  34. [34]

    Adversarial logit pairing

    Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. NIPS, 2018

  35. [35]

    Multigrid neural architectures

    Tsung-Wei Ke, Michael Maire, and Stella X. Yu. Multigrid neural architectures, 2017

  36. [36]

    Power-normalized cepstral coefficients (PNCC) for robust speech recognition

    Chanwoo Kim and Richard M. Stern. Power-normalized cepstral coefficients (PNCC) for robust speech recognition. IEEE/ACM Trans. Audio, Speech and Lang. Proc., 24(7):1315–1329, July 2016. ISSN 2329-9290

  37. [37]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. NIPS, 2012

  38. [38]

    Generalized distances between rankings, 2010

    Ravi Kumar and Sergei Vassilvitskii. Generalized distances between rankings, 2010

  39. [39]

    Adversarial machine learning at scale

    Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. ICLR, 2017

  40. [40]

    An overview of noise-robust automatic speech recognition

    Jinyu Li, Li Deng, Yifan Gong, and Reinhold Haeb-Umbach. An overview of noise-robust automatic speech recognition. 2014

  41. [41]

    Efficient cepstral normalization for robust speech recognition

    Fu-Hua Liu, Richard M. Stern, Xuedong Huang, and Alex Acero. Efficient cepstral normalization for robust speech recognition. In Proc. of DARPA Speech and Natural Language Workshop, 1993

  42. [42]

    Open category detection with PAC guarantees

    Si Liu, Risheek Garrepalli, Thomas Dietterich, Alan Fern, and Dan Hendrycks. Open category detection with PAC guarantees. In ICML, 2018

  43. [43]

    Standard detectors aren't (currently) fooled by physical adversarial stop signs, 2017

    Jiajun Lu, Hussein Sibai, Evan Fabry, and David Forsyth. Standard detectors aren't (currently) fooled by physical adversarial stop signs, 2017

  44. [44]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR, 2018

  45. [45]

    On detecting adversarial perturbations, 2017

    Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations, 2017

  46. [46]

    Robust features in deep learning based speech recognition

    Vikramjit Mitra, Horacio Franco, Richard Stern, Julien Van Hout, Luciana Ferrer, Martin Graciarena, Wen Wang, Dimitra Vergyri, Abeer Alwan, and John H.L. Hansen. Robust features in deep learning based speech recognition, 2017

  47. [47]

    Feature visualization

    Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Feature visualization. Distill, 2017

  48. [48]

    Distillation as a defense to adversarial perturbations against deep neural networks, 2017

    Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks, 2017

  49. [49]

    Adaptive histogram equalization and its variations

    Stephen M. Pizer, E. Philip Amburn, John D. Austin, Robert Cromartie, Ari Geselowitz, Trey Greer, Bart Ter Haar Romeny, and John B. Zimmerman. Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing, 1987

  50. [50]

    Foolbox v0.8.0: A python toolbox to benchmark the robustness of machine learning models, 2017

    Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox v0.8.0: A python toolbox to benchmark the robustness of machine learning models, 2017

  51. [51]

    Do cifar-10 classifiers generalize to cifar-10? arXiv preprint, 2018

    Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do cifar-10 classifiers generalize to cifar-10? arXiv preprint, 2018

  52. [52]

    Adversarially robust generalization requires more data

    Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. arXiv preprint, 2018

  53. [53]

    Towards the first adversarially robust neural network model on MNIST

    Lukas Schott, Jonas Rauber, Matthias Bethge, and Wieland Brendel. Towards the first adversarially robust neural network model on MNIST. arXiv preprint, 2018

  54. [54]

    Certified defenses for data poisoning attacks

    Jacob Steinhardt, Pang Wei Koh, and Percy Liang. Certified defenses for data poisoning attacks. NIPS, 2017

  55. [55]

    Intriguing properties of neural networks, 2014

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks, 2014

  56. [56]

    Traffic signs in the wild: Highlights from the ieee video and image processing cup 2017 student competition

    Dogancan Temel and Ghassan AlRegib. Traffic signs in the wild: Highlights from the ieee video and image processing cup 2017 student competition. IEEE Signal Processing Magazine, 2018

  57. [57]

    Cure-tsr: Challenging unreal and real environments for traffic sign recognition

    Dogancan Temel, Gukyeong Kwon, Mohit Prabhushankar, and Ghassan AlRegib. Cure-tsr: Challenging unreal and real environments for traffic sign recognition. NIPS Workshop, 2017

  58. [58]

    Cure-or: Challenging unreal and real environments for object recognition

    Dogancan Temel, Jinsol Lee, and Ghassan AlRegib. Cure-or: Challenging unreal and real environments for object recognition. ICMLA, 2018

  59. [59]

    Histogram equalization of speech representation for robust speech recognition

    Ángel de la Torre, Antonio Peinado, José Segura, José Pérez-Córdoba, Ma Carmen Benítez, and Antonio Rubio. Histogram equalization of speech representation for robust speech recognition. IEEE Signal Processing Society, 2005

  60. [60]

    Examining the impact of blur on recognition by convolutional networks, 2016

    Igor Vasiljevic, Ayan Chakrabarti, and Gregory Shakhnarovich. Examining the impact of blur on recognition by convolutional networks, 2016

  61. [61]

    Aggregated residual transformations for deep neural networks

    Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. CVPR, 2016

  62. [62]

    Improving the robustness of deep neural networks via stability training, 2016

    Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training, 2016