Pith · machine review for the scientific record

arxiv: 1903.12261 · v1 · submitted 2019-03-28 · 💻 cs.LG · cs.CV · stat.ML

Recognition: 1 theorem link

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations


Pith reviewed 2026-05-13 04:56 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords neural network robustness · image corruptions · image perturbations · benchmark datasets · ImageNet-C · ImageNet-P · common degradations · safety-critical classification

The pith

Benchmarks for common corruptions show negligible relative robustness gains from AlexNet to ResNet.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ImageNet-C and ImageNet-P as standardized benchmarks to evaluate image classifier robustness to everyday degradations such as noise, blur, weather effects, and small perturbations. It reports that relative performance on these tests shows little improvement as networks advance from AlexNet to deeper ResNet architectures, even though clean-image accuracy rises substantially. This matters for safety-critical applications where classifiers encounter these common issues rather than only artificial adversarial attacks. The benchmarks also identify training approaches that can lift robustness on the new tests, including repurposed adversarial defenses. Overall the work supplies tools and evidence to guide development of networks that handle real-world variations more reliably.

Core claim

We establish ImageNet-C as a benchmark consisting of 15 corruption types applied at five severity levels to ImageNet images, and ImageNet-P as a benchmark of perturbation sequences such as rotations and translations. These measure average-case robustness to common, realistic image degradations instead of worst-case adversarial examples. We find negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. We further show that certain training methods improve performance on both benchmarks and that a bypassed adversarial defense yields substantial robustness to common perturbations.
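The benchmark's headline comparison rests on the mean corruption error (mCE): average a model's top-1 error over the five severities of each corruption, normalize by AlexNet's corresponding error, then average across corruption types. A minimal sketch, with hypothetical function and variable names and toy error rates rather than the paper's numbers:

```python
def mean_corruption_error(errors, alexnet_errors):
    """Mean corruption error (mCE), normalized against the AlexNet baseline.

    errors / alexnet_errors: dict mapping each corruption name to a list of
    top-1 error rates at severities 1..5 (values in [0, 1]).
    """
    ces = []
    for corruption, per_severity in errors.items():
        baseline = alexnet_errors[corruption]
        # Normalized corruption error: error summed over severities,
        # divided by the AlexNet sum for the same corruption.
        ces.append(sum(per_severity) / sum(baseline))
    # Average over corruption types (15 in the real ImageNet-C).
    return sum(ces) / len(ces)


# Toy example with two corruption types.
model = {"gaussian_noise": [0.4, 0.5, 0.6, 0.7, 0.8],
         "motion_blur": [0.3, 0.4, 0.5, 0.6, 0.7]}
alexnet = {"gaussian_noise": [0.5, 0.6, 0.7, 0.8, 0.9],
           "motion_blur": [0.5, 0.6, 0.7, 0.8, 0.9]}
print(mean_corruption_error(model, alexnet))  # ≈ 0.79; below 1.0 means better than AlexNet
```

Because the metric is a ratio against AlexNet, a model can lower its absolute errors substantially while its relative mCE barely moves, which is exactly the pattern the paper reports.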

What carries the argument

ImageNet-C and ImageNet-P datasets, which apply standardized sequences of common corruptions and perturbations to measure classifier accuracy under realistic degradations rather than adversarial attacks.
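For ImageNet-P, stability is scored by how often a classifier's top-1 prediction flips between consecutive frames of a perturbation sequence (the paper's flip rate, which mFR then normalizes against AlexNet). A minimal sketch, assuming per-frame predictions have already been extracted; the names are illustrative, not the paper's code:

```python
def flip_rate(prediction_sequences):
    """Fraction of consecutive-frame pairs where the top-1 prediction changes.

    prediction_sequences: one list of per-frame predicted labels per
    perturbation sequence (e.g. an image under gradually increasing rotation).
    """
    flips, pairs = 0, 0
    for seq in prediction_sequences:
        for prev, cur in zip(seq, seq[1:]):
            flips += prev != cur  # count a flip when the label changes
            pairs += 1
    return flips / pairs


# A stable classifier flips rarely; an unstable one flips often.
stable = [["dog", "dog", "dog", "dog"], ["cat", "cat", "cat", "cat"]]
unstable = [["dog", "cat", "dog", "cat"], ["cat", "dog", "cat", "cat"]]
print(flip_rate(stable))    # 0.0
print(flip_rate(unstable))  # 5 flips in 6 pairs ≈ 0.83
```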

If this is right

  • Safety-critical systems can use ImageNet-C and ImageNet-P scores to select among candidate classifiers.
  • Training procedures that improve scores on these benchmarks will produce networks better suited to real deployment conditions.
  • Some existing adversarial defenses can be adapted to increase robustness against natural perturbations without new design work.
  • Future architecture search and training should include explicit targets for corruption and perturbation robustness to achieve better generalization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Clean accuracy gains have not automatically produced better handling of the variations that occur in deployed systems.
  • Robustness research focused solely on adversarial attacks may miss opportunities to improve performance on the more frequent natural degradations.
  • These benchmarks could be adapted to other data modalities such as video or audio to test generalization more broadly.

Load-bearing premise

The fifteen chosen corruptions and the specific perturbations in ImageNet-P are representative of the common real-world image degradations that classifiers will encounter outside the lab.

What would settle it

A new classifier that ranks highly on ImageNet-C but ranks much lower when tested on a fresh collection of common corruptions such as lens flare, unexpected lighting shifts, or sensor noise would contradict the claim that the benchmark rankings are stable and useful.

Original abstract

In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces ImageNet-C, a standardized benchmark of 15 common image corruptions (noise, blur, weather, digital) at five severity levels applied to ImageNet validation images, and ImageNet-P for evaluating robustness to small perturbations such as translations, rotations, and brightness changes. It benchmarks a range of classifiers from AlexNet through VGG, ResNet, and DenseNet families, reporting mean corruption error (mCE) and mean flip rate (mFR) metrics. The central empirical finding is that relative corruption robustness shows negligible improvement from AlexNet to modern ResNets despite large gains in clean accuracy. The authors also demonstrate that certain data augmentations and a bypassed adversarial defense can improve robustness on these benchmarks.

Significance. This work is significant for shifting robustness evaluation from worst-case adversarial perturbations to common, real-world degradations that affect deployed systems. By releasing fixed datasets, severity levels, and evaluation code, it enables reproducible comparisons across models and training methods. The observation that architectural progress has not translated into better relative robustness on ImageNet-C provides a clear, falsifiable signal that can guide future research on generalization and safety-critical applications. The dual-benchmark design (corruptions plus perturbations) offers complementary views of robustness.

minor comments (3)
  1. §3.1: The definition of mean corruption error (mCE) normalizes against AlexNet performance; explicitly state whether this baseline is fixed across all experiments or recomputed, and confirm that no post-hoc model selection affects the reported relative rankings.
  2. Table 2 and Figure 3: Include standard deviations or bootstrap confidence intervals on the mCE and mFR values to allow readers to assess whether the reported 'negligible changes' between AlexNet and ResNet-50 are statistically distinguishable.
  3. §5: The claim that a bypassed adversarial defense yields substantial perturbation robustness should specify the exact defense, the bypass method, and the quantitative improvement on ImageNet-P so that the result can be reproduced without ambiguity.
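The intervals requested in the second comment could be obtained by resampling per-image error indicators. A minimal percentile-bootstrap sketch with hypothetical data (not the paper's numbers):

```python
import random


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of
    per-image 0/1 error indicators."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(values) for _ in values) / len(values)
        for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical per-image indicators (1 = misclassified): a 30% error rate.
errors = [1] * 300 + [0] * 700
low, high = bootstrap_ci(errors)
print(low, high)  # an interval around the 0.30 point estimate
```

The same resampling applied to two models' per-image errors would show directly whether their mCE gap exceeds sampling noise.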

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the work's significance in shifting robustness evaluation toward common real-world degradations, and recommendation for minor revision. We appreciate the emphasis on reproducibility through fixed datasets and code.

Circularity Check

0 steps flagged

No significant circularity in empirical benchmarking

full rationale

The paper introduces standardized benchmark datasets (ImageNet-C with 15 corruptions at 5 severity levels and ImageNet-P for perturbations) and reports direct empirical measurements of classifier performance, including the central observation of negligible changes in relative corruption robustness between AlexNet and ResNet models via mean corruption error. All claims are computed from evaluations on these newly defined test sets, with no mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the results to inputs by construction. The work stands on its own evaluations rather than on external benchmarks and contains no self-definitional, ansatz-smuggling, or uniqueness-imported steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmarking paper. No free parameters are fitted to support the central claims, no new axioms are invoked beyond standard image processing definitions, and no invented entities are postulated.

pith-pipeline@v0.9.0 · 5421 in / 971 out tokens · 42114 ms · 2026-05-13T04:56:52.450015+00:00 · methodology

discussion (0)


Forward citations

Cited by 29 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Online Learning-to-Defer with Varying Experts

    stat.ML 2026-05 unverdicted novelty 8.0

    Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.

  2. Sensing-Assisted LoS/NLoS Identification in Dynamic UAV Positioning Systems

    eess.SP 2026-05 unverdicted novelty 7.0

    A new dual-input feature fusion network using RGB images and channel impulse responses identifies LoS/NLoS conditions for UAVs with up to 97.69% accuracy and reduces trilateration positioning error by about 70%.

  3. Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

    cs.CV 2026-05 unverdicted novelty 7.0

    A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or ...

  4. ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction

    cs.LG 2026-05 unverdicted novelty 7.0

    ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.

  5. CURE-OOD: Benchmarking Out-of-Distribution Detection for Survival Prediction

    cs.CV 2026-05 unverdicted novelty 7.0

    CURE-OOD is the first benchmark for evaluating OOD detection in survival prediction under controlled CT acquisition shifts, showing that standard detectors often fail and providing a survival-aware baseline.

  6. Adjoint Inversion Reveals Holographic Superposition and Destructive Interference in CNN Classifiers

    cs.CV 2026-04 unverdicted novelty 7.0

    CNN classifiers work by holographic superposition and destructive interference in pixel space rather than selecting cleaned features, as proven by a new adjoint inversion framework that also yields a covariance-volume...

  7. Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

    cs.RO 2026-04 unverdicted novelty 7.0

    ACO-MoE employs agent-centric mixture-of-experts to decouple task-relevant features from dynamic visual perturbations in RL, recovering 95.3% of clean performance on the new VDCS benchmark.

  8. Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

    cs.RO 2026-04 unverdicted novelty 7.0

    ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.

  9. Why Training-Free Token Reduction Collapses: The Inherent Instability of Pairwise Scoring Signals

    cs.AI 2026-04 unverdicted novelty 7.0

    Pairwise scoring signals in Vision Transformer token reduction are inherently unstable due to high perturbation counts and degrade in deep layers, causing collapse, while unary signals with triage enable CATIS to reta...

  10. Learning Robustness at Test-Time from a Non-Robust Teacher

    cs.CV 2026-04 unverdicted novelty 7.0

    A test-time adaptation framework anchors adversarial training to a non-robust teacher's predictions, yielding more stable optimization and better robustness-accuracy trade-offs than standard self-consistency methods.

  11. MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts

    cs.CL 2026-04 unverdicted novelty 7.0

    MIXAR is the first autoregressive pixel-based language model for eight languages and scripts, with empirical gains on multilingual tasks, robustness to unseen languages, and further improvements when scaled to 0.5B pa...

  12. Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

    cs.CR 2026-04 unverdicted novelty 7.0

    A fine-tuning framework reduces PGD attack success on AdvDA detectors from 100% to 3.2% and MalGuise from 13% to 5.1%, but optimal training strategies differ by threat model and robustness does not transfer across them.

  13. FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry

    cs.LG 2026-05 unverdicted novelty 6.0

    Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.

  14. Reinforcing Multimodal Reasoning Against Visual Degradation

    cs.CV 2026-05 unverdicted novelty 6.0

    ROMA improves MLLM robustness to seen and unseen visual corruptions by +2.3-2.4% over GRPO on seven reasoning benchmarks while matching clean accuracy.

  15. MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence

    cs.CV 2026-05 unverdicted novelty 6.0

    MedVIGIL introduces a clinician-supervised benchmark showing medical VLMs frequently give fluent answers on broken visual evidence, with top models 14 points below human radiologists on the composite score.

  16. Intermediate Representations are Strong AI-Generated Image Detectors

    cs.CV 2026-05 unverdicted novelty 6.0

    Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.

  17. Latent Denoising Improves Visual Alignment in Large Multimodal Models

    cs.CV 2026-04 unverdicted novelty 6.0

    A latent denoising objective with saliency-aware corruption and contrastive distillation improves visual alignment and corruption robustness in large multimodal models.

  18. Adapting in the Dark: Efficient and Stable Test-Time Adaptation for Black-Box Models

    cs.LG 2026-04 unverdicted novelty 6.0

    BETA adapts black-box models at test time using a local steering model and regularization techniques to achieve accuracy improvements without additional API queries or high latency.

  19. Inside-Out: Measuring Generalization in Vision Transformers Through Inner Workings

    cs.LG 2026-04 unverdicted novelty 6.0

    Circuit-based metrics from Vision Transformer internals provide better label-free proxies for generalization under distribution shift than existing methods like model confidence.

  20. Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification

    cs.LG 2026-04 unverdicted novelty 6.0

    GenCE is a strictly proper loss obtained by normalizing each sample's softmax against the batch predictions, outperforming cross-entropy in low-data and imbalanced regimes with better calibration and OOD detection.

  21. Dual-axis attribution of zebrafish tectal microcircuits for energy-efficient and robust neurocomputing

    cs.NE 2026-05 conditional novelty 5.0

    Zebrafish tectal subcircuits are dissociated into spike-efficient information gating and feedback-like robustness stabilization, then transferred to improve ResNet efficiency and noise tolerance.

  22. Probing Routing-Conditional Calibration in Attention-Residual Transformers

    cs.CV 2026-05 unverdicted novelty 5.0

    Routing summaries and auxiliary features do not provide stable evidence of conditional miscalibration in AR transformers once confidence-matched baselines, capacity controls, and permutation nulls are applied.

  23. ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs

    cs.CV 2026-05 unverdicted novelty 5.0

    ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.

  24. Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models

    cs.LG 2026-05 unverdicted novelty 5.0

    Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.

  25. VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning

    cs.LG 2026-04 unverdicted novelty 5.0

    VOLTA, consisting of a deep encoder with learnable prototypes plus cross-entropy and post-hoc temperature scaling, matches or exceeds ten UQ baselines in accuracy, achieves lower expected calibration error, and perfor...

  26. Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification

    cs.LG 2026-04 unverdicted novelty 5.0

    Generative Cross-Entropy loss improves both accuracy and calibration over standard cross-entropy by augmenting it with a generative p(x|y) term, especially on long-tailed data, and pairs with adaptive temperature scal...

  27. When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

    cs.CV 2026-05 unverdicted novelty 4.0

    Mild rotations and noise significantly increase relation hallucinations in VLMs across models and datasets, with prompt augmentation and preprocessing offering only partial mitigation.

  28. When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

    cs.CV 2026-05 unverdicted novelty 4.0

    Mild rotations and noise significantly increase relation hallucinations in VLMs across models and datasets, with prompt and preprocessing fixes providing only partial relief.

  29. Robust Deepfake Detection, NTIRE 2026 Challenge: Report

    cs.CV 2026-04 unverdicted novelty 2.0

    The NTIRE 2026 challenge finds that large foundation models combined with ensembles and degradation-aware training produce the most robust deepfake detectors.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · cited by 26 Pith papers

  1. [1]

    Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition

    Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, and Gerald Penn. Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. ICASSP, 2013

  2. [2]

    Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint, 2018

    Aharon Azulay and Yair Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint, 2018

  3. [3]

    Measuring neural net robustness with constraints

    Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, and Antonio Criminisi. Measuring neural net robustness with constraints. In NIPS. 2016

  4. [4]

    A non-local algorithm for image denoising

    Antoni Buades and Bartomeu Coll. A non-local algorithm for image denoising. In CVPR, 2005

  5. [5]

    Defensive distillation is not robust to adversarial examples, 2016

    Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples, 2016

  6. [6]

    Adversarial examples are not easily detected: Bypassing ten detection methods, 2017

    Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods, 2017

  7. [7]

    Ground-truth adversarial examples

    Nicholas Carlini, Guy Katz, Clark Barrett, and David L. Dill. Ground-truth adversarial examples, 2017

  8. [8]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. CVPR, 2009

  9. [9]

    Quality resilient deep neural networks, 2017 a

    Samuel Dodge and Lina Karam. Quality resilient deep neural networks, 2017 a

  10. [10]

    A study and comparison of human and deep learning recognition performance under visual distortions, 2017 b

    Samuel Dodge and Lina Karam. A study and comparison of human and deep learning recognition performance under visual distortions, 2017 b

  11. [11]

    Ideal spatial adaptation by wavelet shrinkage

    David Donoho and Iain Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 1993

  12. [12]

    Evaluating and understanding the robustness of adversarial logit pairing

    Logan Engstrom, Andrew Ilyas, and Anish Athalye. Evaluating and understanding the robustness of adversarial logit pairing. arXiv preprint, 2018

  13. [13]

    Robust physical-world attacks on deep learning models, 2017

    Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on deep learning models, 2017

  14. [14]

    Interpretable explanations of black boxes by meaningful perturbation

    Ruth Fong and Andrea Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. ICCV, 2017

  15. [15]

    Image style transfer using convolutional neural networks

    Leon Gatys, Alexander Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. CVPR, 2016

  16. [16]

    Comparing deep neural networks against humans: object recognition when the signal gets weaker

    Robert Geirhos, David H. J. Janssen, Heiko H. Schütt, Jonas Rauber, Matthias Bethge, and Felix A. Wichmann. Comparing deep neural networks against humans: object recognition when the signal gets weaker, 2017

  17. [17]

    Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness

    Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. ICLR, 2019

  18. [18]

    Motivating the rules of the game for adversarial example research

    Justin Gilmer, Ryan P. Adams, Ian Goodfellow, David Andersen, and George E. Dahl. Motivating the rules of the game for adversarial example research. arXiv preprint, 2018 a

  19. [19]

    Adversarial spheres

    Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. ICLR Workshop, 2018 b

  20. [20]

    On calibration of modern neural networks

    Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. International Conference on Machine Learning, 2017

  21. [21]

    Histogram-based subband powerwarping and spectral averaging for robust speech recognition under matched and multistyle training, 2012

    Mark Harvilla and Richard Stern. Histogram-based subband powerwarping and spectral averaging for robust speech recognition under matched and multistyle training, 2012

  22. [22]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2015

  23. [23]

    Early methods for detecting adversarial images, 2017 a

    Dan Hendrycks and Kevin Gimpel. Early methods for detecting adversarial images, 2017 a

  24. [24]

    A baseline for detecting misclassified and out-of-distribution examples in neural networks

    Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In ICLR, 2017 b

  25. [25]

    Using trusted data to train deep networks on labels corrupted by severe noise

    Dan Hendrycks, Mantas Mazeika, Duncan Wilson, and Kevin Gimpel. Using trusted data to train deep networks on labels corrupted by severe noise. NIPS, 2018

  26. [26]

    Deep anomaly detection with outlier exposure

    Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. ICLR, 2019

  27. [27]

    Aurora-5 experimental framework for the performance evaluation of speech recognition in case of a hands-free speech input in noisy environments, 2007

    Hans-Günter Hirsch. Aurora-5 experimental framework for the performance evaluation of speech recognition in case of a hands-free speech input in noisy environments, 2007

  28. [28]

    The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions

    Hans-Günter Hirsch and David Pearce. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. ISCA ITRW ASR2000, 2000

  29. [29]

    Google's cloud vision api is not robust to noise, 2017

    Hossein Hosseini, Baicen Xiao, and Radha Poovendran. Google's cloud vision api is not robust to noise, 2017

  30. [30]

    Condensenet: An efficient DenseNet using learned group convolutions

    Gao Huang, Shichen Liu, Laurens van der Maaten, and Kilian Q Weinberger. Condensenet: An efficient DenseNet using learned group convolutions. arXiv preprint, 2017 a

  31. [31]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017 b

  32. [32]

    Multi-scale dense networks for resource efficient image classification

    Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q. Weinberger. Multi-scale dense networks for resource efficient image classification. ICLR, 2018

  33. [33]

    Batch normalization: Accelerating deep network training by reducing internal covariate shift

    Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. JMLR, 2015

  34. [34]

    Adversarial logit pairing

    Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. NIPS, 2018

  35. [35]

    Multigrid neural architectures

    Tsung-Wei Ke, Michael Maire, and Stella X. Yu. Multigrid neural architectures, 2017

  36. [36]

    Power-normalized cepstral coefficients (PNCC) for robust speech recognition

    Chanwoo Kim and Richard M. Stern. Power-normalized cepstral coefficients (PNCC) for robust speech recognition. IEEE/ACM Trans. Audio, Speech and Lang. Proc., 24(7):1315–1329, July 2016. ISSN 2329-9290

  37. [37]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. NIPS, 2012

  38. [38]

    Generalized distances between rankings, 2010

    Ravi Kumar and Sergei Vassilvitskii. Generalized distances between rankings, 2010

  39. [39]

    Adversarial machine learning at scale

    Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. ICLR, 2017

  40. [40]

    An overview of noise-robust automatic speech recognition

    Jinyu Li, Li Deng, Yifan Gong, and Reinhold Haeb-Umbach. An overview of noise-robust automatic speech recognition. 2014

  41. [41]

    Efficient cepstral normalization for robust speech recognition

    Fu-Hua Liu, Richard M. Stern, Xuedong Huang, and Alex Acero. Efficient cepstral normalization for robust speech recognition. In Proc. of DARPA Speech and Natural Language Workshop, 1993

  42. [42]

    Open category detection with PAC guarantees

    Si Liu, Risheek Garrepalli, Thomas Dietterich, Alan Fern, and Dan Hendrycks. Open category detection with PAC guarantees. In ICML, 2018

  43. [43]

    Standard detectors aren't (currently) fooled by physical adversarial stop signs, 2017

    Jiajun Lu, Hussein Sibai, Evan Fabry, and David Forsyth. Standard detectors aren't (currently) fooled by physical adversarial stop signs, 2017

  44. [44]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR, 2018

  45. [45]

    On detecting adversarial perturbations, 2017

    Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations, 2017

  46. [46]

    Robust features in deep learning based speech recognition

    Vikramjit Mitra, Horacio Franco, Richard Stern, Julien Van Hout, Luciana Ferrer, Martin Graciarena, Wen Wang, Dimitra Vergyri, Abeer Alwan, and John H.L. Hansen. Robust features in deep learning based speech recognition, 2017

  47. [47]

    Feature visualization

    Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Feature visualization. Distill, 2017

  48. [48]

    Distillation as a defense to adversarial perturbations against deep neural networks, 2017

    Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks, 2017

  49. [49]

    Adaptive histogram equalization and its variations

    Stephen M. Pizer, E. Philip Amburn, John D. Austin, Robert Cromartie, Ari Geselowitz, Trey Greer, Bart Ter Haar Romeny, and John B. Zimmerman. Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing, 1987

  50. [50]

    Foolbox v0.8.0: A python toolbox to benchmark the robustness of machine learning models, 2017

    Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox v0.8.0: A python toolbox to benchmark the robustness of machine learning models, 2017

  51. [51]

    Do cifar-10 classifiers generalize to cifar-10? arXiv preprint, 2018

    Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do cifar-10 classifiers generalize to cifar-10? arXiv preprint, 2018

  52. [52]

    Adversarially robust generalization requires more data

    Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. arXiv preprint, 2018

  53. [53]

    Towards the first adversarially robust neural network model on MNIST

    Lukas Schott, Jonas Rauber, Matthias Bethge, and Wieland Brendel. Towards the first adversarially robust neural network model on MNIST. arXiv preprint, 2018

  54. [54]

    Certified defenses for data poisoning attacks

    Jacob Steinhardt, Pang Wei Koh, and Percy Liang. Certified defenses for data poisoning attacks. NIPS, 2017

  55. [55]

    Intriguing properties of neural networks, 2014

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks, 2014

  56. [56]

    Traffic signs in the wild: Highlights from the ieee video and image processing cup 2017 student competition

    Dogancan Temel and Ghassan AlRegib. Traffic signs in the wild: Highlights from the ieee video and image processing cup 2017 student competition. IEEE Signal Processing Magazine, 2018

  57. [57]

    Cure-tsr: Challenging unreal and real environments for traffic sign recognition

    Dogancan Temel, Gukyeong Kwon, Mohit Prabhushankar, and Ghassan AlRegib. Cure-tsr: Challenging unreal and real environments for traffic sign recognition. NIPS Workshop, 2017

  58. [58]

    Cure-or: Challenging unreal and real environments for object recognition

    Dogancan Temel, Jinsol Lee, and Ghassan AlRegib. Cure-or: Challenging unreal and real environments for object recognition. ICMLA, 2018

  59. [59]

    Histogram equalization of speech representation for robust speech recognition

    Ángel de la Torre, Antonio Peinado, José Segura, José Pérez-Córdoba, Ma Carmen Benítez, and Antonio Rubio. Histogram equalization of speech representation for robust speech recognition. IEEE Signal Processing Society, 2005

  60. [60]

    Examining the impact of blur on recognition by convolutional networks, 2016

    Igor Vasiljevic, Ayan Chakrabarti, and Gregory Shakhnarovich. Examining the impact of blur on recognition by convolutional networks, 2016

  61. [61]

    Aggregated residual transformations for deep neural networks

    Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. CVPR, 2016

  62. [62]

    Improving the robustness of deep neural networks via stability training, 2016

    Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training, 2016