arxiv: 2506.08514 · v4 · submitted 2025-06-10 · 💻 cs.LG

DiffGradCAM: A Class Activation Map Using the Full Model Decision to Solve Unaddressed Adversarial Attacks

Pith reviewed 2026-05-19 11:04 UTC · model grok-4.3

classification 💻 cs.LG
keywords: class activation mapping, GradCAM, adversarial robustness, passive fooling, saliency maps, explainable AI, convolutional neural networks, logit differences

The pith

DiffGradCAM generates activation maps from logit differences to resist passive fooling while matching standard GradCAM outputs on normal data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard class activation maps like GradCAM rely on individual class logits, which can be manipulated by adversarial training that does not change the model's actual predictions. This paper shows that using differences between logits captures the decision boundary more robustly because softmax probabilities depend on those differences. DiffGradCAM and DiffGradCAM++ implement this idea and remain accurate even against advanced passive fooling techniques called SHAMs. The approach works across tasks with varying numbers of classes and provides a new way to test explanation robustness.
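The dependence on differences can be checked numerically. The following is a minimal sketch, not code from the paper; the `softmax` helper and the example logits are illustrative assumptions. Adding the same constant to every logit changes each individual logit but leaves the probabilities untouched.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a three-class problem.
logits = [2.0, 0.5, -1.0]
# Shift every logit by the same constant: individual values change,
# pairwise differences do not.
shifted = [v + 7.0 for v in logits]

p_original = softmax(logits)
p_shifted = softmax(shifted)

# Largest absolute change in any class probability after the shift.
max_gap = max(abs(a - b) for a, b in zip(p_original, p_shifted))
print(p_original)
print(p_shifted)
print(max_gap)
```

An explanation computed from a single logit can therefore change while the quantity the classifier actually uses stays fixed.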

Core claim

The paper claims that computing class activation maps from gradients of the difference between the target class logit and other logits yields explanations that are not susceptible to passive fooling attacks, which train the model to output misleading activation maps without degrading classification accuracy. In the absence of such attacks, the new maps are claimed to match those from GradCAM and GradCAM++. The claims are tested by introducing SHAMs, an entropy-aware passive fooling technique, and by evaluating on multi-class problems with few and many classes.

What carries the argument

DiffGradCAM, which derives activation maps from the gradient of the logit difference between the predicted class and alternatives rather than from the single target logit.
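A toy illustration of why the logit difference is the safer quantity, under stated assumptions: the head is global average pooling followed by a linear layer, the contrast is taken against a single competing class, and the manipulation adds the same per-channel offset to every class's weights. None of these choices is taken from the paper, and the offset is only a stand-in for a decision-preserving manipulation, not the SHAM attack. All function names are hypothetical.

```python
def relu_map(weights, feature_maps):
    """ReLU of the channel-weighted sum of feature maps (CAM combination step)."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [[max(0.0, sum(w * fm[i][j] for w, fm in zip(weights, feature_maps)))
             for j in range(cols)] for i in range(rows)]


def logits(W, feature_maps):
    """Assumed toy head: global average pooling, then a bias-free linear layer."""
    pooled = [sum(sum(r) for r in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]
    return [sum(w * p for w, p in zip(row, pooled)) for row in W]


def single_logit_cam(W, feature_maps, c):
    """For this head, d y_c / d A_k equals W[c][k] / Z at every pixel."""
    z = len(feature_maps[0]) * len(feature_maps[0][0])
    return relu_map([w / z for w in W[c]], feature_maps)


def difference_cam(W, feature_maps, c, k):
    """Same construction, but on the gradient of y_c - y_k (assumed contrast)."""
    z = len(feature_maps[0]) * len(feature_maps[0][0])
    return relu_map([(wc - wk) / z for wc, wk in zip(W[c], W[k])], feature_maps)


def max_abs_diff(m1, m2):
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


# Two 2x2 feature maps and a two-class head (all values are made up).
A = [[[1.0, 0.0], [0.0, 0.0]],
     [[0.0, 0.0], [0.0, 2.0]]]
W = [[1.0, 0.5],
     [0.2, 0.3]]
# Decision-preserving manipulation: the same offset is added to every class.
offset = [-2.0, 4.0]
W_fooled = [[w + d for w, d in zip(row, offset)] for row in W]

cam_clean = single_logit_cam(W, A, 0)
cam_fooled = single_logit_cam(W_fooled, A, 0)
diff_clean = difference_cam(W, A, 0, 1)
diff_fooled = difference_cam(W_fooled, A, 0, 1)

print(max_abs_diff(cam_clean, cam_fooled))    # single-logit map changes
print(max_abs_diff(diff_clean, diff_fooled))  # difference map stays put
```

Because the offset shifts every logit by the same amount, the logit gap and the difference-based map are unchanged up to floating-point error, while the single-logit map moves to a different region.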

If this is right

  • The maps remain unchanged under passive fooling that affects standard methods.
  • The higher-order version, DiffGradCAM++, inherits the same robustness.
  • SHAMs provide a stricter test for saliency map robustness than prior fooling methods.
  • The technique applies to both few-class and many-class classification tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This contrastive view might generalize to other gradient-based explainers in machine learning.
  • Models trained with this in mind could produce inherently more trustworthy explanations.
  • Future work could explore whether similar difference-based adjustments help against active adversarial attacks on explanations.

Load-bearing premise

That basing the map on logit differences captures the full decision without missing key information or creating new vulnerabilities when the input distribution changes.

What would settle it

An experiment showing that DiffGradCAM produces misleading maps under a passive fooling attack specifically designed to target logit differences, or that it deviates from GradCAM outputs on clean data.

Figures

Figures reproduced from arXiv: 2506.08514 by Adam Czajka, Chris Sweet, Jacob Piland.

Figure 1: Adversarial SHAM saliency used as saliency in training and fine-tuning for producing ...
Figure 2: Qualitative analysis of the CAM types examined in this paper on datasets with small and ...

Original abstract

Class Activation Mapping (CAM) and its gradient-based variants (e.g., GradCAM) have become standard tools for explaining Convolutional Neural Network (CNN) predictions. However, these approaches typically focus on individual logits, while for neural networks using softmax, the class membership probability estimates depend only on the differences between logits, not on their absolute values. This disconnect leaves standard CAMs vulnerable to adversarial manipulation, such as passive fooling, where a model is trained to produce misleading CAMs without affecting decision performance. To address this vulnerability, we propose DiffGradCAM and its higher-order derivative version DiffGradCAM++, as novel, lightweight, contrastive approaches to class activation mapping that are not susceptible to passive fooling and match the output of standard methods such as GradCAM and GradCAM++ in the non-adversarial case. To test our claims, we introduce Salience-Hoax Activation Maps (SHAMs), a more advanced, entropy-aware form of passive fooling that serves as a benchmark for CAM robustness under adversarial conditions. Together, SHAM and DiffGradCAM establish a new framework for probing and improving the robustness of saliency-based explanations. We validate both contributions across multi-class tasks with few and many classes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes DiffGradCAM and DiffGradCAM++ as contrastive class activation mapping methods that compute gradients from logit differences (rather than single-class logits) to eliminate vulnerability to passive fooling attacks while reproducing the outputs of GradCAM/GradCAM++ on clean data. It introduces the SHAM benchmark (an entropy-aware passive-fooling procedure) to evaluate robustness and reports validation on multi-class tasks with varying numbers of classes.

Significance. If the central claim holds, the work would supply a lightweight, parameter-free modification that closes a documented gap between CAM explanations and the softmax decision rule, together with a new benchmark for testing explanation robustness. The absence of any quantitative tables, error bars, or dataset statistics in the provided text, however, prevents assessment of whether the method actually achieves the claimed invariance or merely shifts the attack surface.

major comments (2)
  1. [Abstract, §3] Abstract and §3: the assertion that DiffGradCAM is 'not susceptible to passive fooling' by construction rests on the observation that decisions depend only on logit differences. No formal argument or counter-example analysis is supplied showing that an adversary cannot optimize a loss directly against the difference-based gradient field while keeping classification loss near zero; the SHAM benchmark is described only for standard GradCAM.
  2. [Abstract] Abstract: the claim that DiffGradCAM 'matches the output of standard methods such as GradCAM and GradCAM++ in the non-adversarial case' is stated without any quantitative metric, dataset size, or statistical test. This leaves the equivalence claim unverifiable from the text.
minor comments (2)
  1. [§3] Notation for the difference operator (e.g., y_c − y_k) should be defined explicitly before its first use in the method section.
  2. [§4] The description of SHAM should include the precise entropy term and the optimization objective so that the benchmark can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

  1. Referee: [Abstract, §3] Abstract and §3: the assertion that DiffGradCAM is 'not susceptible to passive fooling' by construction rests on the observation that decisions depend only on logit differences. No formal argument or counter-example analysis is supplied showing that an adversary cannot optimize a loss directly against the difference-based gradient field while keeping classification loss near zero; the SHAM benchmark is described only for standard GradCAM.

    Authors: We appreciate the referee's point that a purely construction-based claim requires additional support. The central design choice in DiffGradCAM is to replace the single-logit gradient with the gradient of a logit difference (target logit minus the strongest competing logit), which directly mirrors the quantity that determines the softmax decision. Because any passive-fooling objective that alters this difference gradient must, by definition, change the relative logit values that govern classification, we believe the attack surface is closed; however, we concede that an explicit proof or attempted counter-example was not supplied. In the revised manuscript we will add a short formal argument in §3 showing that the classification loss and the difference-based explanation loss share the same critical points with respect to logit perturbations. We will also extend the SHAM benchmark description and evaluation protocol to DiffGradCAM/DiffGradCAM++ and report the resulting robustness metrics. revision: yes

  2. Referee: [Abstract] Abstract: the claim that DiffGradCAM 'matches the output of standard methods such as GradCAM and GradCAM++ in the non-adversarial case' is stated without any quantitative metric, dataset size, or statistical test. This leaves the equivalence claim unverifiable from the text.

    Authors: We agree that the equivalence statement would be more convincing with quantitative backing. Although the manuscript already contains qualitative side-by-side visualizations, we will revise the abstract and add a concise quantitative comparison subsection. This will report average Pearson correlation and SSIM values between DiffGradCAM and GradCAM maps computed on clean validation images, together with the exact dataset sizes, number of classes, and basic statistical tests. Error bars from repeated runs will also be included to address the broader concern about missing quantitative tables and statistics. revision: yes
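    The Pearson comparison mentioned in this response could look roughly like the sketch below. It is a hedged illustration only: the `pearson` helper and the example maps are assumptions, no values come from the paper, and SSIM is omitted because it needs a windowed implementation.

```python
import math


def pearson(map_a, map_b):
    """Pearson correlation between two saliency maps given as nested lists."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    if len(a) != len(b):
        raise ValueError("maps must have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)


# Made-up maps: the second is an affine rescaling of the first,
# the third is its negation.
map_x = [[0.1, 0.4], [0.8, 0.2]]
map_y = [[2.0 * v + 0.5 for v in row] for row in map_x]
map_z = [[-v for v in row] for row in map_x]

r_same = pearson(map_x, map_y)
r_opposite = pearson(map_x, map_z)
print(r_same, r_opposite)
```

    Pearson correlation ignores affine rescaling of a map, so it measures agreement in spatial pattern rather than in absolute intensity.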

Circularity Check

0 steps flagged

Core DiffGradCAM construction follows directly from softmax logit-difference property; no load-bearing self-citation or fitted prediction

The paper's central move—replacing single-logit gradients with differences y_c - y_k—rests on the standard mathematical fact that softmax probabilities are invariant to additive constants and depend only on logit differences. This is an external property of the softmax function, not a result derived or fitted inside the paper. No equations reduce the claimed immunity to a self-referential definition, no parameters are fitted on a subset and then relabeled as predictions, and the SHAM benchmark is introduced as an independent test rather than a tautological re-use of the same data or prior self-citation. The derivation chain therefore remains self-contained against external benchmarks.
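The softmax property invoked here can be written out explicitly. This is the standard identity, stated in the y_c, y_k notation used above, with a denoting an arbitrary constant; it is not an equation reproduced from the paper.

```latex
p_c \;=\; \frac{e^{y_c}}{\sum_{k} e^{y_k}}
    \;=\; \frac{1}{1 + \sum_{k \neq c} e^{-(y_c - y_k)}},
\qquad
\frac{e^{y_c + a}}{\sum_{k} e^{y_k + a}} \;=\; \frac{e^{y_c}}{\sum_{k} e^{y_k}}
\quad \text{for any constant } a .
```

The first form shows that the probability of class c is a function of the differences y_c − y_k alone; the second shows that a common shift of all logits cancels.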

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the new method name and benchmark; full paper would be needed to audit these.

pith-pipeline@v0.9.0 · 5756 in / 1094 out tokens · 25458 ms · 2026-05-19T11:04:11.610393+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks

    eess.IV 2026-05 unverdicted novelty 5.0

    CAMAL adds an auxiliary regularizer during training that aligns model attention with segmentation masks to improve both spatial accuracy and causal faithfulness of attention in deep learning and deep reinforcement lea...
