Delving into Transferable Adversarial Examples and Black-box Attacks

Chang Liu; Dawn Song; Xinyun Chen; Yanpei Liu

arXiv: 1611.02770 · v3 · pith:Y55GUQUYnew · submitted 2016-11-08 · cs.LG

Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song

keywords: adversarial, examples, transferable, approaches, targeted, first, labels, large

Abstract

An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study the transferability using small-scale datasets. In this work, we are the first to conduct an extensive study of the transferability over large models and a large-scale dataset, and we are also the first to study the transferability of targeted adversarial examples with their target labels. We study both non-targeted and targeted adversarial examples, and show that while transferable non-targeted adversarial examples are easy to find, targeted adversarial examples generated using existing approaches almost never transfer with their target labels. Therefore, we propose novel ensemble-based approaches to generating transferable adversarial examples. Using such approaches, we observe a large proportion of targeted adversarial examples that are able to transfer with their target labels for the first time. We also present some geometric studies to help understand the transferable adversarial examples. Finally, we show that the adversarial examples generated using ensemble-based approaches can successfully attack Clarifai.com, which is a black-box image classification system.
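
The abstract does not spell out the objective behind the ensemble-based approach, so the following is only a minimal sketch of the general idea under stated assumptions: the perturbation is optimized against several white-box models at once, in the hope that it also fools a model that was not part of the ensemble. Everything here is hypothetical and chosen for illustration: the toy linear softmax classifiers stand in for deep networks, the names `mixture_target_prob` and `ensemble_targeted_attack` are invented, and the signed-gradient update with an L-infinity bound `eps` is an assumed optimizer, not necessarily the one used in the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()

def mixture_target_prob(x, target, models, alphas):
    """Weighted average over models of the probability assigned to `target`."""
    return sum(a * softmax(W @ x + b)[target] for a, (W, b) in zip(alphas, models))

def ensemble_targeted_attack(x, target, models, alphas, eps=0.5, step=0.01, iters=100):
    """Signed-gradient ascent on the ensemble's target probability (assumed setup)."""
    best_x = x.copy()
    best_p = mixture_target_prob(x, target, models, alphas)
    x_adv = x.copy()
    for _ in range(iters):
        grad = np.zeros_like(x_adv)
        for a, (W, b) in zip(alphas, models):
            p = softmax(W @ x_adv + b)
            # Gradient of p[target] w.r.t. x for a linear softmax model.
            grad += a * p[target] * (W[target] - p @ W)
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay within the L-inf ball
        p_now = mixture_target_prob(x_adv, target, models, alphas)
        if p_now > best_p:                         # keep the best iterate seen
            best_x, best_p = x_adv.copy(), p_now
    return best_x

# Toy demo: three random linear classifiers over 5 classes and 20 features.
rng = np.random.default_rng(0)
models = [(rng.normal(scale=0.5, size=(5, 20)), rng.normal(size=5)) for _ in range(3)]
alphas = [1 / 3, 1 / 3, 1 / 3]
x = rng.normal(size=20)
eps = 0.5

# Pick the class the ensemble currently finds least likely as the target.
avg_probs = sum(a * softmax(W @ x + b) for a, (W, b) in zip(alphas, models))
target = int(np.argmin(avg_probs))

before = mixture_target_prob(x, target, models, alphas)
x_adv = ensemble_targeted_attack(x, target, models, alphas, eps=eps)
after = mixture_target_prob(x_adv, target, models, alphas)
print(f"ensemble target probability: {before:.4f} -> {after:.4f}")
```

Whether such a perturbation transfers to a held-out model is an empirical question that this toy setup does not answer; it only shows the mechanics of optimizing one input against a weighted ensemble.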

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks
cs.CR 2026-01 unverdicted novelty 8.0

FPR manipulation attack perturbs benign MQTT packets to flip labels to attacks in NIDS with 80-100% success, increasing SOC delays without gradient-based methods.

Toy Models of Superposition
cs.LG 2022-09 accept novelty 8.0

Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulne...

Local Hessian Spectral Filtering for Robust Intrinsic Dimension Estimation
cs.LG 2026-05 unverdicted novelty 7.0

LHSD uses spectral filtering on the log-density Hessian to isolate tangent directions from noise and estimate local intrinsic dimension scalably via Stochastic Lanczos Quadrature.

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
cs.CV 2024-06 unverdicted novelty 7.0

MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.

Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples
cs.NE 2022-09 unverdicted novelty 7.0

MDSE attack uses dynamic multi-surrogate gradient estimation to create adversarial examples that simultaneously fool SNNs, ViTs, and CNNs, with reported gains up to 91.4% on ensembles and 3x on adversarially trained S...

Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
cs.CV 2025-12 conditional novelty 6.0

SAAD adaptively weights adversarial training samples by their transferability to the teacher, yielding higher AutoAttack robustness than prior distillation methods on CIFAR and Tiny-ImageNet without extra compute.

Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition
cs.LG 2024-04 unverdicted novelty 6.0

Introduces Generative Privacy Funnel (GenPF) and deep variational PF (DVPF) models that extend the privacy funnel to generative settings and provide a controllable privacy-utility trade-off with reduced sensitive attr...

Jailbreaking Black Box Large Language Models in Twenty Queries
cs.LG 2023-10 conditional novelty 6.0

PAIR uses an attacker LLM to iteratively craft effective jailbreak prompts for black-box target LLMs in fewer than 20 queries.

Fooling a Real Car with Adversarial Traffic Signs
cs.CR 2019-06 unverdicted novelty 6.0

A reproducible pipeline produces physical adversarial traffic signs that successfully attack production-grade traffic sign recognition systems in a real car under black-box conditions.

Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations
cs.CV 2019-06 unverdicted novelty 6.0

Adversarial perturbations disrupt DNN-based face detectors under white-box, gray-box, and black-box settings to sabotage training data for AI face synthesis.

Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework
cs.CV 2026-05 unverdicted novelty 5.0

JMOF is a new optimization framework for physical adversarial attacks that improves cross-model transferability and enables simultaneous attacks on multiple vision tasks such as object detection and semantic segmentation.

Laundering AI Authority with Adversarial Examples
cs.CR 2026-05 unverdicted novelty 5.0

Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.

Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning
cs.LG 2019-07 unverdicted novelty 5.0

Graph Laplacian interpolating activation replaces softmax in DNNs and improves natural accuracy, robust accuracy, and data efficiency.

Cellular State Transformations using Generative Adversarial Networks
q-bio.QM 2019-06 unverdicted novelty 5.0

TSPG applies conditional GANs to generate realistic transcriptome perturbations that mimic source-to-target gene expression state transitions and highlight biologically enriched genes.

Beyond Attack Success Rate: A Multi-Metric Evaluation of Adversarial Transferability in Medical Imaging Models
cs.CV 2026-04 unverdicted novelty 4.0

Perceptual quality metrics correlate strongly with each other but show minimal correlation with attack success rate across medical imaging models and datasets, making ASR alone inadequate for assessing adversarial robustness.

SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions
cs.LG 2026-05 accept novelty 3.0

NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.