super hub Canonical reference

Explaining and Harnessing Adversarial Examples

Canonical reference. 80% of citing Pith papers cite this work as background.

226 Pith papers citing it
abstract

Several machine learning models, including neural networks, consistently misclassify adversarial examples: inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.
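
The linearity argument in the abstract can be illustrated on a toy logistic-regression model. The following is a minimal sketch, not the authors' code: the weights `w`, input `x`, label `y`, and step size `eps` are hypothetical values chosen for illustration, and the helper names are made up here. Each input coordinate is moved by `eps` in the direction of the sign of the loss gradient, which shifts the logit of a linear model by exactly `eps` times the L1 norm of `w`, so many small per-coordinate changes add up to a large change in the output.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def logit(w, b, x):
    """Pre-activation w . x + b of a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the logistic loss with respect to x, for a label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
    return [(p - y) * wi for wi in w]


def sign_gradient_perturb(w, b, x, y, eps):
    """Move every coordinate of x by eps in the direction that increases the loss."""
    g = loss_grad_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


# Hypothetical toy model and input (assumptions, not taken from the paper).
w = [0.5, -0.25, 1.0, -0.75]
b = 0.0
x = [1.0, 0.0, 1.0, 0.0]
y = 1
eps = 0.25

x_adv = sign_gradient_perturb(w, b, x, y, eps)
shift = logit(w, b, x_adv) - logit(w, b, x)
l1_norm = sum(abs(wi) for wi in w)

print("perturbed input:", x_adv)
print("logit shift:", shift, "vs -eps * ||w||_1 =", -eps * l1_norm)
```

No coordinate moves by more than `eps`, yet the logit shift grows with the number of input dimensions through the L1 norm of `w`. For a deep network the gradient would come from backpropagation rather than the closed form used in this sketch.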

citation-role summary

background 36 · method 4

representative citing papers

Adversarially Robust Approximate Furthest Neighbor

cs.DS · 2026-05-15 · unverdicted · novelty 8.0

First adversarially robust data structure for c-approximate furthest neighbor search with query time matching the best known oblivious results for many parameter regimes.

Dataset Distillation

cs.LG · 2018-11-27 · unverdicted · novelty 8.0

Dataset distillation creates a tiny synthetic training set that, when used with a fixed network initialization, produces models whose performance approximates that of models trained on the full original dataset.

RogueMerge: Robust and Unified Attacks against LLM Model Merging

cs.CR · 2026-06-02 · unverdicted · novelty 7.0

RogueMerge is a unified attack method that jointly optimizes task vectors to succeed after merging, using stochastic min-max simulation for unknown merging settings and a Taylor-approximated DRO for prompt generalization on generative LLMs.

Quantitative Linear Logic for Neuro-Symbolic Learning and Verification

cs.LO · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

QLL is a novel logic for neuro-symbolic learning that uses ML-native operations (sum, log-sum-exp) on logits to embed constraints, satisfying most linear logic properties and showing stronger correlation between empirical robustness and formal verification than prior approaches.

Online Learning-to-Defer with Varying Experts

stat.ML · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Presents the first online learning-to-defer (L2D) algorithm for multiclass classification with bandit feedback and varying experts, achieving O((n+n_e)T^{2/3}) regret in general and O((n+n_e)√T) regret under low noise.
