Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models

Pouya Samangouei, Maya Kabkab, Rama Chellappa · 2018 · cs.CV · arXiv 1805.06605

10 Pith papers cite this work. Polarity classification is still indexing.


abstract

In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against such attacks. Defense-GAN is trained to model the distribution of unperturbed images. At inference time, it finds a close output to a given image which does not contain the adversarial changes. This output is then fed to the classifier. Our proposed method can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. We empirically show that Defense-GAN is consistently effective against different attack methods and improves on existing defense strategies. Our code has been made publicly available at https://github.com/kabkabm/defensegan
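The inference step described in the abstract (find a generator output close to the input, then classify that output) can be illustrated with a small sketch. It is not the authors' implementation; see the linked repository for that. It assumes the search is gradient descent with random restarts on the squared reconstruction error, and it uses a toy linear map as a stand-in for a trained GAN generator. The names `project_onto_generator`, `generator`, `grad_z`, and all parameter values are hypothetical.

```python
import numpy as np

def project_onto_generator(x, generator, grad_z, latent_dim,
                           n_restarts=4, n_steps=500, lr=0.01, seed=0):
    """Search for a latent z whose generator output is close to x.

    Minimizes ||generator(z) - x||^2 by plain gradient descent from several
    random starting points and keeps the best reconstruction found.
    """
    rng = np.random.default_rng(seed)
    best_z, best_err = None, np.inf
    for _ in range(n_restarts):
        z = rng.standard_normal(latent_dim)      # random restart
        for _ in range(n_steps):
            z = z - lr * grad_z(z, x)            # one gradient-descent step
        err = float(np.sum((generator(z) - x) ** 2))
        if err < best_err:
            best_z, best_err = z, err
    return generator(best_z), best_err

# Toy stand-in (assumption): a fixed linear "generator" G(z) = W z.
# A real setup would use a trained GAN generator and autodiff gradients.
rng = np.random.default_rng(1)
W = rng.standard_normal((16, 4))
generator = lambda z: W @ z
grad_z = lambda z, x: 2.0 * W.T @ (W @ z - x)    # gradient of ||W z - x||^2

clean = generator(rng.standard_normal(4))         # a point the generator can produce
perturbed = clean + 0.1 * rng.standard_normal(16) # small additive perturbation
purified, err = project_onto_generator(perturbed, generator, grad_z, latent_dim=4)

print("distance to clean before:", np.linalg.norm(perturbed - clean))
print("distance to clean after: ", np.linalg.norm(purified - clean))
```

In this toy setting the purified vector, not the perturbed one, is what would be passed to the classifier. The classifier itself is untouched, which matches the abstract's claim that the method does not modify the classifier structure or training procedure.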

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Low Rank Adaptation for Adversarial Perturbation

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.

Stain-Aware Wavelet Regularization for Instant Adversarial Purification in Histopathology

cs.CV · 2026-06-07 · unverdicted · novelty 6.0

SAWR applies stain-aware multi-level wavelet regularization to purify adversarial perturbations in histopathology images, claiming up to 10.69% robustness gain while preserving texture and spectral properties.

Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders

quant-ph · 2026-04-30 · unverdicted · novelty 6.0

A quantum autoencoder purifies adversarial perturbations for quantum classifiers and supplies a confidence score for unrecoverable inputs, claiming up to 68% accuracy gains over prior defenses without adversarial training.

Quantum Patches: Enhancing Robustness of Quantum Machine Learning Models

quant-ph · 2026-04-09 · unverdicted · novelty 6.0

Random quantum circuits used as adversarial training data reduce successful attack rates on QML models for CIFAR-10 from 89.8% to 68.45% and for CINIC-10 from 94.23% to 78.68%.

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

cs.LG · 2023-09-01 · conditional · novelty 6.0

Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.

Latent Adversarial Defence with Boundary-guided Generation

cs.LG · 2019-07-16 · unverdicted · novelty 5.0

LAD generates diverse adversarial examples in latent space by perturbing along normals to an SVM-defined decision boundary and uses them for adversarial training to improve DNN robustness.

Affine Disentangled GAN for Interpretable and Robust AV Perception

cs.CV · 2019-07-06 · unverdicted · novelty 5.0

ADIS-GAN disentangles affine transformations in a GAN to achieve over 98% classification accuracy on MNIST within 30 degrees rotation and over 90% under FGSM and PGD attacks while generating rotation and scaling factors.

Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approximate-gradient methods.

Using Intuition from Empirical Properties to Simplify Adversarial Training Defense

cs.LG · 2019-06-27 · unverdicted · novelty 4.0

Modifications to single-step adversarial training based on empirical properties of iterative methods improve accuracy by up to 16.93% against iterative attacks while reducing training cost by 28.75%.

Enabling Adversarial Robustness in AI Models through Kubeflow MLOps

cs.CR · 2026-05-14 · unverdicted · novelty 3.0

A Kubeflow-based MLOps architecture detects FGSM adversarial attacks on deployed AI models and automatically applies PGD-based adversarial training to recover accuracy.

citing papers explorer

Showing 10 of 10 citing papers.

Low Rank Adaptation for Adversarial Perturbation · cs.LG · 2026-04-30 · unverdicted · none · ref 80
Stain-Aware Wavelet Regularization for Instant Adversarial Purification in Histopathology · cs.CV · 2026-06-07 · unverdicted · none · ref 22 · internal anchor
Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders · quant-ph · 2026-04-30 · unverdicted · none · ref 20
Quantum Patches: Enhancing Robustness of Quantum Machine Learning Models · quant-ph · 2026-04-09 · unverdicted · none · ref 21
Baseline Defenses for Adversarial Attacks Against Aligned Language Models · cs.LG · 2023-09-01 · conditional · none · ref 48
Latent Adversarial Defence with Boundary-guided Generation · cs.LG · 2019-07-16 · unverdicted · none · ref 22 · internal anchor
Affine Disentangled GAN for Interpretable and Robust AV Perception · cs.CV · 2019-07-06 · unverdicted · none · ref 24 · internal anchor
Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations · cs.LG · 2026-05-07 · unverdicted · none · ref 41
Using Intuition from Empirical Properties to Simplify Adversarial Training Defense · cs.LG · 2019-06-27 · unverdicted · none · ref 12 · internal anchor
Enabling Adversarial Robustness in AI Models through Kubeflow MLOps · cs.CR · 2026-05-14 · unverdicted · none · ref 24 · internal anchor
