
arxiv: 1907.05274 · v1 · pith:DJBUMAFXnew · submitted 2019-07-06 · 💻 cs.CV · cs.LG · eess.IV

Affine Disentangled GAN for Interpretable and Robust AV Perception

Pith reviewed 2026-05-25 01:53 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · eess.IV
keywords affine disentangled GAN · ADIS-GAN · adversarial robustness · affine transformations · autonomous vehicle perception · MNIST classification · GAN interpretability · robust perception

The pith

ADIS-GAN disentangles affine factors to gain simultaneous robustness against geometric transformations and adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Affine Disentangled GAN (ADIS-GAN) as a method to make image classification robust for autonomous vehicle perception. It separates parameters for affine changes such as rotation and scaling from the image content inside the generative model. This separation allows the system to maintain accuracy when inputs undergo rotations or receive adversarial perturbations, even though standard data augmentation techniques for the two problems do not reinforce each other. The model also produces the actual rotation angle and scaling factor as explicit outputs. Results on MNIST show accuracy above 98 percent for rotations up to 30 degrees and above 90 percent under FGSM and PGD attacks.

Core claim

By disentangling affine transformation parameters within the GAN generator, ADIS-GAN produces a perception model that resists both affine transformations and adversarial attacks at the same time, while conventional affine augmentation and adversarial training remain orthogonal; the architecture additionally yields interpretable outputs for rotation angle and scaling factor, demonstrated on MNIST with the stated accuracy levels.

What carries the argument

Affine Disentangled GAN (ADIS-GAN), a generative architecture that isolates affine factors from image content to confer joint robustness.

If this is right

  • The classifier maintains over 98 percent accuracy on images rotated within 30 degrees without dedicated rotation augmentation.
  • The same model reaches over 90 percent accuracy against both FGSM and PGD adversarial attacks.
  • Standard affine augmentation and adversarial training are orthogonal: each addresses only its own perturbation type and does not reinforce the other.
  • Rotation angle and scaling factor become directly available as model outputs for downstream use.
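To make the last point concrete, the sketch below shows what explicit rotation and scaling outputs mean geometrically. It is not the ADIS-GAN network, whose architecture is not described on this page; it only composes a 2x2 affine matrix from an angle and two zoom factors and then recovers them. The function names, the rotation-after-zoom convention, and the parameter values are assumptions made for illustration.

```python
import math

def affine_matrix(angle_deg, zoom_x, zoom_y):
    """Return R(angle) @ diag(zoom_x, zoom_y) as nested lists.

    Assumed convention: zoom is applied first, then rotation.
    """
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c * zoom_x, -s * zoom_y],
            [s * zoom_x,  c * zoom_y]]

def recover_params(m):
    """Recover (angle_deg, zoom_x, zoom_y) from a rotation-after-zoom matrix.

    Valid only for positive zoom factors and no skew.
    """
    zoom_x = math.hypot(m[0][0], m[1][0])   # length of the first column
    zoom_y = math.hypot(m[0][1], m[1][1])   # length of the second column
    angle_deg = math.degrees(math.atan2(m[1][0], m[0][0]))
    return angle_deg, zoom_x, zoom_y

# Hypothetical values: 25 degrees of rotation, horizontal zoom 1.2, vertical zoom 0.9.
m = affine_matrix(25.0, 1.2, 0.9)
angle, zx, zy = recover_params(m)
print(angle, zx, zy)
```

In a disentangled model these quantities would be read from dedicated latent variables rather than recovered from a matrix; the sketch only fixes what the numbers refer to.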

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same disentanglement principle could be tested on translation or shear parameters in addition to rotation and scaling.
  • If the orthogonality result holds, perception pipelines would need fewer separate defense modules rather than stacking augmentation techniques.
  • Extension beyond MNIST to datasets with real-world scenes would show whether the dual robustness persists when background content varies more.

Load-bearing premise

Disentangling affine factors inside the GAN will automatically confer robustness to adversarial attacks without trade-offs or separate adversarial training steps.

What would settle it

Training the ADIS-GAN on affine-disentangled data only and then measuring accuracy drop on FGSM or PGD adversarial examples; a sharp fall below 90 percent would falsify the joint-robustness claim.
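The measurement described above can be sketched on a toy model. The code below uses a two-feature logistic-regression classifier as a stand-in, since the ADIS-GAN classifier itself is not specified on this page; the weights, data points, `eps`, `step`, and `iters` are arbitrary assumptions. It implements FGSM (one signed-gradient step) and PGD (repeated signed-gradient steps projected back into the L-infinity ball) and compares clean and attacked accuracy.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input: (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Single step of size eps along the sign of the input gradient."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, step, iters):
    """Iterated signed-gradient steps, clipped to the L-infinity ball of radius eps."""
    x_adv = list(x)
    for _ in range(iters):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

def accuracy(w, b, xs, ys):
    hits = sum((predict(w, b, x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)

# Hypothetical toy model and data (not taken from the paper).
w, b = [2.0, -1.0], 0.0
xs = [[0.3, 0.1], [-0.2, 0.3], [1.0, 0.5], [-1.0, -0.2]]
ys = [1, 0, 1, 0]
eps = 0.3

clean_acc = accuracy(w, b, xs, ys)
fgsm_xs = [fgsm(w, b, x, y, eps) for x, y in zip(xs, ys)]
pgd_xs = [pgd(w, b, x, y, eps, step=0.1, iters=5) for x, y in zip(xs, ys)]
fgsm_acc = accuracy(w, b, fgsm_xs, ys)
pgd_acc = accuracy(w, b, pgd_xs, ys)
print(clean_acc, fgsm_acc, pgd_acc)
```

For a real evaluation the same loop would run over the MNIST test set with the trained classifier, and the attacked accuracy would be compared against the 90 percent threshold stated above.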

Figures

Figures reproduced from arXiv: 1907.05274 by Justin Dauwels, Letao Liu, Martin Saerbeck.

Figure 1: An adversarial attack on a car image. Left: without adversarial attack,
Figure 2: When the vehicle hits a water puddle, the images captured by
Figure 3: Rotation of vehicle images. The vehicle tends to be detected incorrectly
Figure 4: Model architecture. Diamond boxes are affine regularizers derived in section A. Rectangle boxes are variables. Ellipse boxes are neural networks.
Figure 5: We purposely rotate the images from MNIST test dataset from -30
Figure 7: Images change with rotation latent vectors.
Figure 8: Images change with horizontal zoom latent vectors.
Original abstract

Autonomous vehicles (AV) have progressed rapidly with the advancements in computer vision algorithms. The deep convolutional neural network as the main contributor to this advancement has boosted the classification accuracy dramatically. However, the discovery of adversarial examples reveals the generalization gap between dataset and the real world. Furthermore, affine transformations may also confuse computer vision based object detectors. The degradation of the perception system is undesirable for safety critical systems such as autonomous vehicles. In this paper, a deep learning system is proposed: Affine Disentangled GAN (ADIS-GAN), which is robust against affine transformations and adversarial attacks. It is demonstrated that conventional data augmentation for affine transformation and adversarial attacks are orthogonal, while ADIS-GAN can handle both attacks at the same time. Useful information such as image rotation angle and scaling factor are also generated in ADIS-GAN. On MNIST dataset, ADIS-GAN can achieve over 98 percent classification accuracy within 30 degrees rotation, and over 90 percent classification accuracy against FGSM and PGD adversarial attack.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes Affine Disentangled GAN (ADIS-GAN) for robust and interpretable AV perception. It claims that conventional data augmentation for affine transformations and adversarial attacks are orthogonal, while ADIS-GAN simultaneously handles both via affine disentanglement in the GAN; the model also outputs rotation angle and scaling factor. On MNIST, it reports over 98% classification accuracy for rotations within 30 degrees and over 90% accuracy against FGSM and PGD attacks.

Significance. If the empirical claims hold with proper validation, the result would be significant for safety-critical AV systems, as it suggests a single architecture can address two distinct robustness challenges (affine and adversarial) without trade-offs while adding interpretability through generated affine parameters.

major comments (1)
  1. [Abstract] Abstract: the abstract states specific accuracy numbers (over 98% within 30 degrees rotation, over 90% against FGSM and PGD) and a claim about orthogonality of conventional augmentations but supplies no information on model architecture details, training procedure, baselines, statistical tests, error bars, or dataset splits. The central empirical claims cannot be assessed for support from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the abstract states specific accuracy numbers (over 98% within 30 degrees rotation, over 90% against FGSM and PGD) and a claim about orthogonality of conventional augmentations but supplies no information on model architecture details, training procedure, baselines, statistical tests, error bars, or dataset splits. The central empirical claims cannot be assessed for support from the provided text.

    Authors: We agree the abstract is concise and omits methodological specifics due to length limits. The full manuscript details the ADIS-GAN architecture (Section 3), training procedure and hyperparameters (Section 4), baseline comparisons (Section 5), and standard MNIST splits. The orthogonality claim is supported by experiments contrasting standard augmentation against adversarial robustness. We will revise the abstract to briefly note the disentanglement mechanism and add error bars plus statistical tests to the results in the revised manuscript.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper presents an empirical GAN architecture (ADIS-GAN) and reports experimental classification accuracies on MNIST under affine transformations and adversarial attacks. No derivation chain, equations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the provided abstract or claims. The reported results (e.g., >98% accuracy within 30° rotation) are independent performance measurements rather than quantities defined in terms of the model's own inputs or prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on specific free parameters, axioms, or invented entities. The ADIS-GAN architecture itself is presented as the novel contribution but without technical details that would allow identification of fitted values or background assumptions.

pith-pipeline@v0.9.0 · 5714 in / 1264 out tokens · 26181 ms · 2026-05-25T01:53:40.773088+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 20 internal anchors

  1. [1]

    Synthesizing Robust Adversarial Examples

    Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. CoRR, abs/1707.07397, 2017

  2. [2]

    Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016

  3. [3]

    InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

    Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. CoRR, abs/1606.03657, 2016

  4. [4]

    Adversarial Feature Learning

    Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. CoRR, abs/1605.09782, 2016

  5. [5]

    A rotation and a translation suffice: Fooling cnns with simple transformations

    Logan Engstrom, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. A rotation and a translation suffice: Fooling cnns with simple transformations. CoRR, abs/1712.02779, 2017

  6. [6]

    Deep symmetry networks

    Robert Gens and Pedro M Domingos. Deep symmetry networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2537–2545. Curran Associates, Inc., 2014

  7. [7]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014

  8. [8]

    Explaining and Harnessing Adversarial Examples

    Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014

  9. [9]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015

  10. [10]

    Towards a Definition of Disentangled Representations

    Irina Higgins, David Amos, David Pfau, Sébastien Racanière, Loïc Matthey, Danilo J. Rezende, and Alexander Lerchner. Towards a definition of disentangled representations. CoRR, abs/1812.02230, 2018

  11. [11]

    Transforming auto-encoders

    Geoffrey E. Hinton, Alex Krizhevsky, and Sida D. Wang. Transforming auto-encoders. In Proceedings of the 21th International Conference on Artificial Neural Networks - Volume Part I, ICANN’11, pages 44–51, Berlin, Heidelberg, 2011. Springer-Verlag

  12. [12]

    Inferencing Based on Unsupervised Learning of Disentangled Representations

    Tobias Hinz and Stefan Wermter. Inferencing based on unsupervised learning of disentangled representations. CoRR, abs/1803.02627, 2018

  13. [13]

    Spatial Transformer Networks

    Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks. CoRR, abs/1506.02025, 2015

  14. [14]

    Angjoo Kanazawa, Abhishek Sharma, and David W. Jacobs. Locally scale-invariant convolutional neural networks. CoRR, abs/1412.5104, 2014

  15. [15]

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, pages 1097–1105, USA, 2012. Curran Associates Inc

  16. [16]

    Adversarial examples in the physical world

    Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016

  17. [17]

    MNIST handwritten digit database

    Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010

  18. [18]

    Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

    Francesco Locatello, Stefan Bauer, Mario Lucic, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. CoRR, abs/1811.12359, 2018

  19. [19]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. CoRR, abs/1706.06083, 2017

  20. [20]

    DeepFool: a simple and accurate method to fool deep neural networks

    Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015

  21. [21]

    Practical Black-Box Attacks against Machine Learning

    Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. CoRR, abs/1602.02697, 2016

  22. [22]

    Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

    Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015

  23. [23]

    Foolbox: A Python toolbox to benchmark the robustness of machine learning models

    Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox v0.8.0: A python toolbox to benchmark the robustness of machine learning models. CoRR, abs/1707.04131, 2017

  24. [24]

    Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models

    Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-gan: Protecting classifiers against adversarial attacks using generative models. CoRR, abs/1805.06605, 2018

  25. [25]

    Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In ACM Conference on Computer and Communications Security, pages 1528–1540. ACM, 2016

  26. [26]

    Learning invariant representations with local transformations

    Kihyuk Sohn and Honglak Lee. Learning invariant representations with local transformations. In Proceedings of the 29th International Conference on Machine Learning, ICML’12, pages 1339–1346, USA, 2012. Omnipress

  27. [27]

    Rethinking the Inception Architecture for Computer Vision

    Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015

  28. [28]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013

  29. [29]

    Ensemble Adversarial Training: Attacks and Defenses

    Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. CoRR, abs/1705.07204, 2017.

Appendix A: Two affine transformation orders

In principle, there are 6 sequences of affine transformations (R for rotation, K for skew, Z for zoom):

  • RKZ - 1,
  • RZK - 1,
  • KRZ - 2,
  • KZR - ...
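The six sequences of R, K, and Z mentioned in Appendix A can be enumerated with a short sketch. The matrix forms below (a horizontal shear for skew, a diagonal matrix for zoom), the parameter values, and the reading of "RKZ" as the product R @ K @ Z are all assumptions for illustration. The sketch does not attempt to reproduce the group labels (1, 2) attached to each sequence, whose meaning is cut off in this excerpt.

```python
import itertools
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation(angle_deg):
    t = math.radians(angle_deg)
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def skew(k):
    # Assumed form: horizontal shear with coefficient k.
    return [[1.0, k], [0.0, 1.0]]

def zoom(zx, zy):
    return [[zx, 0.0], [0.0, zy]]

# Hypothetical parameter values.
factors = {"R": rotation(20.0), "K": skew(0.3), "Z": zoom(1.2, 0.8)}

def compose(order):
    """Multiply the named factors left to right, e.g. "RKZ" -> R @ K @ Z."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for name in order:
        m = matmul(m, factors[name])
    return m

orders = ["".join(p) for p in itertools.permutations("RKZ")]
products = {order: compose(order) for order in orders}

for order in orders:
    print(order, products[order])
```

Because 2x2 matrix multiplication does not commute in general, different sequences can yield different overall transformations for the same parameter values, which is why the order has to be fixed before the parameters can be interpreted.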