pith. sign in

arxiv: 1907.05587 · v1 · pith:ODUDRPJTnew · submitted 2019-07-12 · 💻 cs.CR · cs.LG

Stateful Detection of Black-Box Adversarial Attacks

Pith reviewed 2026-05-24 22:46 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords black-box adversarial attacksstateful defensequery historyadversarial example generationquery blindingremote classifierevasion attacksmachine learning security
0
0 comments X

The pith

Stateful defenses detect black-box adversarial attacks by monitoring sequences of queries to remote services.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that when a machine learning classifier is hosted remotely, defenders can maintain a history of past queries to identify sequences that appear intended to generate adversarial examples. This approach gives defenders more options than the stateless defenses assumed in prior work, which only look at individual queries. The authors develop such a stateful detection method and then introduce query blinding as a way for attackers to try to evade it. If correct, this makes it possible for remote services to spot attack preparation before an adversarial example is submitted.

Core claim

In black-box settings where the classifier is hosted as a remote service, defenders have a much larger space of actions by using stateful defenses that keep a history of past queries to identify sequences for the purpose of generating an adversarial example; the authors develop one such defense and introduce query blinding as a new class of attacks designed to bypass it.

What carries the argument

Stateful query history tracking that identifies sequences of queries aimed at adversarial example generation by distinguishing them from normal usage.

If this is right

  • Remote machine learning services can implement detection of attack preparation across multiple queries instead of checking examples in isolation.
  • Adversaries must develop new techniques such as query blinding to evade stateful detection.
  • The study of adversarial examples moves from stateless classifiers to stateful systems that retain query history.
  • Defenders gain additional responses to black-box adversaries beyond per-query analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Stateful defenses could be paired with rate limiting or other query controls to limit an attacker's ability to test many variations.
  • The same history-tracking idea might apply to detecting other forms of misuse against query-based machine learning APIs.
  • Deployment would require deciding how long to retain query histories and how to handle cases where legitimate users submit similar sequences.

Load-bearing premise

Sequences of queries used to generate adversarial examples will exhibit detectable patterns distinguishable from normal usage even after the introduction of query blinding countermeasures.

What would settle it

An experiment in which every sequence of queries crafted to produce an adversarial example is made indistinguishable from sequences of normal user queries.

Figures

Figures reproduced from arXiv: 1907.05587 by David Wagner, Nicholas Carlini, Steven Chen.

Figure 1
Figure 1. Figure 1: Query Detection Defense: The high-level process for detecting a query-based adversarial attack. [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The mean k-neighbor distance (in encoded space) of the 0.1% percentile of the CIFAR-10 training set as a func￾tion of k. We select the threshold k so that it is large enough to support a wide margin, but not so large as to be computa￾tionally prohibitive. for three reasons. First, CIFAR-10 is the most popular dataset for studying adversarial examples [3]. Second, defenses on ImageNet have thus far proven t… view at source ↗
Figure 4
Figure 4. Figure 4: For each image in the test set (left) we show four [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Attack success and detection rate between [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
read the original abstract

The problem of adversarial examples, evasion attacks on machine learning classifiers, has proven extremely difficult to solve. This is true even when, as is the case in many practical settings, the classifier is hosted as a remote service and so the adversary does not have direct access to the model parameters. This paper argues that in such settings, defenders have a much larger space of actions than have been previously explored. Specifically, we deviate from the implicit assumption made by prior work that a defense must be a stateless function that operates on individual examples, and explore the possibility for stateful defenses. To begin, we develop a defense designed to detect the process of adversarial example generation. By keeping a history of the past queries, a defender can try to identify when a sequence of queries appears to be for the purpose of generating an adversarial example. We then introduce query blinding, a new class of attacks designed to bypass defenses that rely on such a defense approach. We believe that expanding the study of adversarial examples from stateless classifiers to stateful systems is not only more realistic for many black-box settings, but also gives the defender a much-needed advantage in responding to the adversary.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript argues that in black-box settings where a classifier is hosted as a remote service, defenders can exploit a larger action space by adopting stateful defenses that maintain a history of past queries to identify sequences indicative of adversarial example generation. It contrasts this approach with prior stateless defenses and introduces query blinding as a new attack class explicitly designed to evade history-based detection. The authors conclude that expanding the study to stateful systems is both more realistic and advantageous for defenders.

Significance. If the premise that detectable patterns in query sequences survive blinding countermeasures holds, the work could meaningfully expand the design space for practical defenses in remote ML deployments by moving beyond per-query stateless methods.

major comments (1)
  1. Abstract: the claim that stateful detection yields a defender advantage rests on the unshown premise that query sequences for adversarial example generation remain distinguishable from benign usage even after query blinding; no detection algorithm, similarity metric, threshold, or experimental protocol is described to support this.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their comments on the manuscript. We address the single major comment below.

read point-by-point responses
  1. Referee: Abstract: the claim that stateful detection yields a defender advantage rests on the unshown premise that query sequences for adversarial example generation remain distinguishable from benign usage even after query blinding; no detection algorithm, similarity metric, threshold, or experimental protocol is described to support this.

    Authors: The abstract is intentionally concise and summarizes the core argument and contributions. The full manuscript (Sections 3–5) defines the stateful detection algorithm, the similarity metric over query histories, the decision threshold, and the experimental protocol (including benign vs. adversarial query sequences). Experiments evaluate distinguishability both before and after query blinding is applied, showing that blinding reduces but does not eliminate detectable patterns, thereby imposing higher query costs on the attacker. We acknowledge that the abstract could more explicitly reference these elements and the nuanced post-blinding results; we will revise it accordingly. revision: yes

Circularity Check

0 steps flagged

No circularity: purely conceptual proposal without derivations or self-citations

full rationale

The paper advances a conceptual argument that stateful defenses expand defender options in black-box settings and introduces query blinding as a new attack class. No equations, fitted parameters, predictions, or load-bearing self-citations appear in the provided text. The central claim is presented as a belief rather than a derived result from prior inputs, so no reduction to self-definition or fitted inputs exists. This matches the default case of a self-contained conceptual paper with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5729 in / 843 out tokens · 16842 ms · 2026-05-24T22:46:17.512967+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Benchmarking Misuse Mitigation Against Covert Adversaries

    cs.CR 2025-06 unverdicted novelty 6.0

    Develops the BSD data generation pipeline and two new datasets to evaluate decomposition attacks as effective misuse enablers and stateful defenses as a countermeasure in language model safety.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · cited by 1 Pith paper · 17 internal anchors

  1. [1]

    Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, San- jay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Mur...

  2. [2]

    On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

    Anish Athalye and Nicholas Carlini. On the robustness of the CVPR 2018 white- box adversarial example defenses. CoRR, abs/1804.03286, 2018

  3. [3]

    Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

    Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, July 2018

  4. [4]

    Learning visual similarity for product design with convolutional neural networks

    Sean Bell and Kavita Bala. Learning visual similarity for product design with convolutional neural networks. ACM Trans. on Graphics (SIGGRAPH), 34(4), 2015

  5. [5]

    Evasion Attacks against Machine Learning at Test Time

    Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. CoRR, abs/1708.06131, 2017

  6. [6]

    Decision-based adversarial attacks: Reliable attacks against black-box machine learning models

    Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018

  7. [7]

    Guess- ing smart: Biased sampling for efficient black-box adversarial attacks

    Thomas Brunner, Frederik Diehl, Michael Truong-Le, and Alois Knoll. Guess- ing smart: Biased sampling for efficient black-box adversarial attacks. CoRR, abs/1812.09803, 2018

  8. [8]

    On Evaluating Adversarial Robustness

    Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, and Aleksander Madry. On evalu- ating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019

  9. [9]

    Adversarial examples are not easily detected: Bypassing ten detection methods

    Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017

  10. [10]

    Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016

  11. [11]

    Blind signatures for untraceable payments

    David Chaum. Blind signatures for untraceable payments. In CRYPTO. Springer, 1983

  12. [12]

    François Chollet et al. Keras. https://keras.io, 2015. Stateful Detection of Black-Box Adversarial Attacks

  13. [13]

    Transforming enterprises with computer vision ai

    Clarifai. Transforming enterprises with computer vision ai. https://clarifai.com/

  14. [14]

    Evaluating and understand- ing the robustness of adversarial logit pairing

    Logan Engstrom, Andrew Ilyas, and Anish Athalye. Evaluating and understand- ing the robustness of adversarial logit pairing. In NIPS 2018 Machine Learning and Computer Security Workshop, November 2018

  15. [15]

    Detecting Adversarial Samples from Artifacts

    Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, and Andrew B. Gardner. Detecting adversarial samples from artifacts. CoRR, abs/1703.00410, 2017

  16. [16]

    Adversarial Examples Are a Natural Consequence of Test Error in Noise

    Nic Ford, Justin Gilmer, Nicolas Carlini, and Dogus Cubuk. Adversarial examples are a natural consequence of test error in noise. arXiv:1901.10513, 2019

  17. [17]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS. Curran Associates, Inc., 2014

  18. [18]

    Explaining and har- nessing adversarial examples

    Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and har- nessing adversarial examples. In ICLR, 2015

  19. [19]

    Google Cloud Vision AI

    Google. Google Cloud Vision AI. https://cloud.google.com/vision/

  20. [20]

    On the (Statistical) Detection of Adversarial Examples

    Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick D. McDaniel. On the (statistical) detection of adversarial examples. CoRR, abs/1702.06280, 2017

  21. [21]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015

  22. [22]

    Black-box adver- sarial attacks with limited queries and information

    Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adver- sarial attacks with limited queries and information. In ICML, July 2018

  23. [23]

    Jordan Jianbo Chen

    Michael I. Jordan Jianbo Chen. Boundary attack++: Query-efficient decision- based adversarial attack. https://arxiv.org/abs/1904.02144, 2019

  24. [24]

    Mika Juuti, Sebastian Szyller, Alexey Dmitrenko, Samuel Marchal, and N. Asokan. PRADA: protecting against DNN model stealing attacks. CoRR, abs/1805.02628, 2018

  25. [25]

    E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles

    Markus Kettunen, Erik Härkönen, and Jaakko Lehtinen. E-LPIPS: Ro- bust perceptual image similarity via random transformation ensembles, 2019. arxiv:1906.03973

  26. [26]

    RED-Attack: Resource Efficient Decision based Attack for Machine Learning

    Faiq Khalid, Hassan Ali, Muhammad Abdullah Hanif, Semeen Rehman, Rehan Ahmed, and Muhammad Shafique. RED-attack: Resource efficient decision based attack for machine learning. CoRR, abs/1901.10258, 2019

  27. [27]

    Kingma and Jimmy Ba

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015

  28. [28]

    CIFAR-10

    Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10

  29. [29]

    A geometry-inspired decision-based attack

    Yujia Liu, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. A geometry- inspired decision-based attack. CoRR, abs/1903.10826, 2019

  30. [30]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017

  31. [31]

    On detecting adversarial perturbations

    Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In ICLR, 2017

  32. [32]

    Practical Black-Box Attacks against Machine Learning

    Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. CoRR, abs/1602.02697, 2016

  33. [33]

    YOLO9000: better, faster, stronger

    Joseph Redmon and Ali Farhadi. YOLO9000: better, faster, stronger. In CVPR, 2017

  34. [34]

    Sitatapatra: Blocking the transfer of adversarial samples.CoRR, abs/1901.08121, 2019

    Ilia Shumailov, Xitong Gao, Yiren Zhao, Robert Mullins, Ross Anderson, and Cheng-Zhong Xu. Sitatapatra: Blocking the transfer of adversarial samples.CoRR, abs/1901.08121, 2019

  35. [35]

    David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Pan- neershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Masteri...

  36. [36]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014

  37. [37]

    Ensemble adversarial training: Attacks and defenses

    Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018

  38. [38]

    Reiter, and Thomas Ristenpart

    Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction apis. In USENIX Security, 2016

  39. [39]

    Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

    Jonathan Uesato, Brendan O’Donoghue, Aäron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks.CoRR, abs/1802.05666, 2018

  40. [40]

    Gomez, Lukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS 2017, 2017

  41. [41]

    Mimicry attacks on host-based intrusion detection systems

    David Wagner and Paolo Soto. Mimicry attacks on host-based intrusion detection systems. In CCS. ACM, 2002

  42. [42]

    Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

    Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. CoRR, abs/1704.01155, 2017

  43. [43]

    Wide Residual Networks

    Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. CoRR, abs/1605.07146, 2016. A CNN ARCHITECTURE Layer Filters Size Details Conv 32 3 × 3 ReLU Conv 32 3 × 3 ReLU Max Pool 2 × 2 stride = 2 Dropout p = 0.25 Conv 64 3 × 3 ReLU Conv 64 3 × 3 ReLU Max Pool 2 × 2 stride = 2 Dropout p = 0.25 Dense 512 ReLU Dropout p = 0.5 Dense 256 Table 6: Architect...

  44. [44]

    hard-label

    withϵ= 0.05. The network was trained for 100 epochs, where the adversarial examples for each epoch were generated from a ran- domly selected model from the ensemble and the defended model, and one adversarial example was generated per image in the CIFAR- 10 training set. D A DIFFICULT SOFT-LABEL CASE Our paper focuses on the “hard-label” threat model wher...