arxiv: 2001.03994 · v1 · pith:OAEHB3YQnew · submitted 2020-01-12 · 💻 cs.LG · stat.ML

Fast is better than free: Revisiting adversarial training

classification 💻 cs.LG stat.ML
keywords training, adversarial, robust, method, fast, fgsm, accuracy, classifier

Adversarial training, a method for learning robust deep networks, is typically assumed to be more expensive than traditional training due to the necessity of constructing adversarial examples via a first-order method like projected gradient descent (PGD). In this paper, we make the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Specifically, we show that adversarial training with the fast gradient sign method (FGSM), when combined with random initialization, is as effective as PGD-based training but has significantly lower cost. Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $\epsilon=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $\epsilon=2/255$ in 12 hours, in comparison to past work based on "free" adversarial training which took 10 and 50 hours to reach the same respective thresholds. Finally, we identify a failure mode referred to as "catastrophic overfitting" which may have caused previous attempts to use FGSM adversarial training to fail. All code for reproducing the experiments in this paper as well as pretrained model weights are at https://github.com/locuslab/fast_adversarial.
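The core idea in the abstract, a single FGSM step taken from a random starting point inside the $\epsilon$-ball, can be sketched on a toy model. The following is a minimal illustration only, not the authors' implementation (which is in the linked repository): it assumes a binary logistic-regression classifier in place of a deep network so that the input gradient has a closed form, and the function names (`fgsm_random_init`, `train_step`) and the step size `alpha` are hypothetical choices made for this sketch.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    # Probability of class 1 under a linear (logistic-regression) model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    # Binary cross-entropy for a label y in {0, 1}.
    p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_random_init(w, b, x, y, eps, alpha, rng):
    # 1. Random start: sample the perturbation uniformly in [-eps, eps].
    delta = [rng.uniform(-eps, eps) for _ in x]
    x_start = [xi + di for xi, di in zip(x, delta)]
    # 2. Input gradient of the loss; for this toy model it is (p - y) * w.
    p = predict(w, b, x_start)
    grad_x = [(p - y) * wi for wi in w]
    # 3. One signed step of size alpha, then project back onto the eps-ball.
    return [max(-eps, min(eps, di + alpha * sign(gi)))
            for di, gi in zip(delta, grad_x)]


def train_step(w, b, x, y, eps, alpha, lr, rng):
    # Build the adversarial example, then take one SGD step on it.
    delta = fgsm_random_init(w, b, x, y, eps, alpha, rng)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    err = predict(w, b, x_adv) - y
    new_w = [wi - lr * err * xi for wi, xi in zip(w, x_adv)]
    return new_w, b - lr * err
```

Each training step here needs only one extra gradient computation to build the adversarial example, whereas a PGD adversary repeats the signed step and projection several times per example. This is the source of the cost difference the abstract describes.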

This paper has not been read by Pith yet.

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks

    cs.CR 2026-01 unverdicted novelty 8.0

    FPR manipulation attack perturbs benign MQTT packets to flip labels to attacks in NIDS with 80-100% success, increasing SOC delays without gradient-based methods.

  2. Learning Robustness at Test-Time from a Non-Robust Teacher

    cs.CV 2026-04 unverdicted novelty 7.0

    A test-time adaptation framework anchors adversarial training to a non-robust teacher's predictions, yielding more stable optimization and better robustness-accuracy trade-offs than standard self-consistency methods.

  3. FastAT Benchmark: A Comprehensive Framework for Fair Evaluation of Fast Adversarial Training Methods

    cs.CV 2026-04 conditional novelty 6.0

    The FastAT Benchmark standardizes evaluation of over twenty fast adversarial training methods under unified conditions, showing that well-designed single-step approaches can match or exceed PGD-AT robustness at lower ...

  4. Representation-Guided Parameter-Efficient LLM Unlearning

    cs.CL 2026-04 unverdicted novelty 6.0

    REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.

  5. GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees

    cs.LG 2026-04 unverdicted novelty 6.0

    GF-Score decomposes certified robustness into per-class profiles, adds fairness metrics like disparity index and Gini coefficient, and uses self-calibration on clean accuracy to avoid adversarial attacks.

  6. Quantum Patches: Enhancing Robustness of Quantum Machine Learning Models

    quant-ph 2026-04 unverdicted novelty 6.0

    Random quantum circuits used as adversarial training data reduce successful attack rates on QML models for CIFAR-10 from 89.8% to 68.45% and for CINIC-10 from 94.23% to 78.68%.

  7. Compression as an Adversarial Amplifier Through Decision Space Reduction

    cs.CV 2026-04 unverdicted novelty 6.0

    Compression acts as an adversarial amplifier by reducing the decision space of image classifiers, making attacks in compressed representations substantially more effective than pixel-space attacks under the same pertu...

  8. SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

    cs.LG 2023-10 accept novelty 6.0

    SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.

  9. A combination of noise and bilateral filters achieve supralinear and scalable adversarial robustness in CNNs

    cs.LG 2026-06 unverdicted novelty 5.0

    A preprocessor of Gaussian noise plus bilateral filtering yields supralinear adversarial robustness in CNNs and, when paired with adversarial training, ranks near the top of RobustBench while using far less compute, p...

  10. Catastrophic Overfitting, Entropy Gap and Participation Ratio: A Noiseless $l^p$ Norm Solution for Fast Adversarial Training

    cs.LG 2025-05 unverdicted novelty 5.0

    An adaptive $l^p$ norm control in FGSM adversarial training, guided by participation ratio and entropy of gradients, mitigates catastrophic overfitting without noise or regularization.

  11. Robust Auto-associative Memory via Convolutional Restricted Hopfield Networks

    cs.NE 2026-06 unverdicted novelty 4.0

    CRHNs integrate convolutional extraction with subspace attractor retrieval trained via Subspace Rotation Algorithm and report order-of-magnitude lower reconstruction error than MHNs and PCNs on STL data under adversar...