DiffGradCAM: A Class Activation Map Using the Full Model Decision to Solve Unaddressed Adversarial Attacks
Pith reviewed 2026-05-19 11:04 UTC · model grok-4.3
The pith
DiffGradCAM generates activation maps from logit differences to resist passive fooling while matching standard GradCAM outputs on normal data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that computing class activation maps using gradients with respect to the difference between the target class logit and other logits produces explanations that are immune to passive fooling attacks. These attacks train the model to output misleading activation maps without degrading classification accuracy. In the absence of such attacks, the new maps coincide exactly with those from GradCAM and GradCAM++. The method is validated by introducing SHAMs, an entropy-aware passive fooling technique, and testing on multi-class problems.
What carries the argument
DiffGradCAM, which derives activation maps from the gradient of the logit difference between the predicted class and alternatives rather than from the single target logit.
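The contrast between the two targets can be sketched in a few lines. This is a minimal numpy sketch under stated assumptions, not the authors' implementation. It assumes the per-class gradients dz_k/dA of the last convolutional activations A have already been obtained from some autograd framework. It also assumes the competing logit is aggregated by taking the strongest competitor (max). The function names are hypothetical.

```python
import numpy as np

def cam_from_gradient(activations, gradient):
    """GradCAM-style map: the spatially averaged gradient of each channel is its
    weight, followed by ReLU of the weighted channel sum. (C, H, W) -> (H, W)."""
    channel_weights = gradient.mean(axis=(1, 2))
    return np.maximum(np.tensordot(channel_weights, activations, axes=1), 0.0)

def gradcam(activations, logit_grads, target):
    """Single-logit target: logit_grads[k] holds d z_k / d A, shape (K, C, H, W)."""
    return cam_from_gradient(activations, logit_grads[target])

def diffgradcam(activations, logit_grads, logits, target):
    """Contrastive target: gradient of Delta = z_target - z_rival, where the rival
    is the strongest competing class (max aggregation is an assumption here)."""
    logits = np.asarray(logits, dtype=float)
    competitors = np.delete(np.arange(len(logits)), target)
    rival = competitors[np.argmax(logits[competitors])]
    # Differentiation is linear: d(z_t - z_r)/dA = dz_t/dA - dz_r/dA.
    return cam_from_gradient(activations, logit_grads[target] - logit_grads[rival])
```

In PyTorch the two needed gradients could come from `torch.autograd.grad`, with one call per logit. Only the target and rival gradients are used, which is why the contrastive variant stays lightweight.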
If this is right
- The maps remain unchanged under passive fooling that affects standard methods.
- The higher-order version, DiffGradCAM++, inherits the same robustness.
- SHAMs provide a stricter test for saliency map robustness than prior fooling methods.
- The technique applies to both few-class and many-class classification tasks.
Where Pith is reading between the lines
- This contrastive view might generalize to other gradient-based explainers in machine learning.
- Models trained with this in mind could produce inherently more trustworthy explanations.
- Future work could explore whether similar difference-based adjustments help against active adversarial attacks on explanations.
Load-bearing premise
That basing the map on logit differences captures the full decision without missing key information or creating new vulnerabilities when the input distribution changes.
What would settle it
An experiment showing that DiffGradCAM produces misleading maps under a passive fooling attack specifically designed to target logit differences, or that it deviates from GradCAM outputs on clean data.
Original abstract
Class Activation Mapping (CAM) and its gradient-based variants (e.g., GradCAM) have become standard tools for explaining Convolutional Neural Network (CNN) predictions. However, these approaches typically focus on individual logits, while for neural networks using softmax, the class membership probability estimates depend only on the differences between logits, not on their absolute values. This disconnect leaves standard CAMs vulnerable to adversarial manipulation, such as passive fooling, where a model is trained to produce misleading CAMs without affecting decision performance. To address this vulnerability, we propose DiffGradCAM and its higher-order derivative version DiffGradCAM++, as novel, lightweight, contrastive approaches to class activation mapping that are not susceptible to passive fooling and match the output of standard methods such as GradCAM and GradCAM++ in the non-adversarial case. To test our claims, we introduce Salience-Hoax Activation Maps (SHAMs), a more advanced, entropy-aware form of passive fooling that serves as a benchmark for CAM robustness under adversarial conditions. Together, SHAM and DiffGradCAM establish a new framework for probing and improving the robustness of saliency-based explanations. We validate both contributions across multi-class tasks with few and many classes.
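The disconnect the abstract describes can be reproduced in a toy setting. This is a hypothetical illustration, not the paper's SHAM procedure. The setup is a linear head on globally pooled activations, plus one extra term u·pooled that is added equally to every logit. That term leaves the softmax untouched but shifts every single-logit gradient. It cancels in a logit difference. The head weights, the direction u and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, K = 4, 5, 5, 3                 # channels, height, width, classes
A = rng.random((C, H, W))               # hypothetical last-conv activations
W_head = rng.normal(size=(K, C))        # hypothetical linear head after global average pooling
u = np.array([5.0, -5.0, 0.0, 0.0])     # hypothetical shared "fooling" direction
no_shift = np.zeros(C)

def logits(shared):
    pooled = A.mean(axis=(1, 2))
    # The scalar shared @ pooled is added to every logit alike.
    return W_head @ pooled + shared @ pooled

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cam(channel_weights):
    # ReLU of the channel-weighted sum of activations, as in GradCAM.
    return np.maximum(np.tensordot(channel_weights, A, axes=1), 0.0)

def single_logit_weights(shared, k):
    # d z_k / d A[c, h, w] = (W_head[k, c] + shared[c]) / (H * W) at every location,
    # so its spatial mean (the GradCAM channel weight) is that same value.
    return (W_head[k] + shared) / (H * W)

def difference_weights(shared, k, j):
    # Gradient of z_k - z_j: the shared term cancels exactly.
    return single_logit_weights(shared, k) - single_logit_weights(shared, j)

p_clean, p_fooled = softmax(logits(no_shift)), softmax(logits(u))
target = int(np.argmax(p_clean))
rival = int(np.argsort(p_clean)[-2])    # strongest competing class

gradcam_clean = cam(single_logit_weights(no_shift, target))
gradcam_fooled = cam(single_logit_weights(u, target))
diff_clean = cam(difference_weights(no_shift, target, rival))
diff_fooled = cam(difference_weights(u, target, rival))
```

Here the class probabilities and the difference-based map are identical with and without u, while the single-logit map moves.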
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DiffGradCAM and DiffGradCAM++ as contrastive class activation mapping methods that compute gradients from logit differences (rather than single-class logits) to eliminate vulnerability to passive fooling attacks while reproducing the outputs of GradCAM/GradCAM++ on clean data. It introduces the SHAM benchmark (an entropy-aware passive-fooling procedure) to evaluate robustness and reports validation on multi-class tasks with varying numbers of classes.
Significance. If the central claim holds, the work would supply a lightweight, parameter-free modification that closes a documented gap between CAM explanations and the softmax decision rule, together with a new benchmark for testing explanation robustness. The absence of any quantitative tables, error bars, or dataset statistics in the provided text, however, prevents assessment of whether the method actually achieves the claimed invariance or merely shifts the attack surface.
major comments (2)
- [Abstract, §3] Abstract and §3: the assertion that DiffGradCAM is 'not susceptible to passive fooling' by construction rests on the observation that decisions depend only on logit differences. No formal argument or counter-example analysis is supplied showing that an adversary cannot optimize a loss directly against the difference-based gradient field while keeping classification loss near zero; the SHAM benchmark is described only for standard GradCAM.
- [Abstract] Abstract: the claim that DiffGradCAM 'matches the output of standard methods such as GradCAM and GradCAM++ in the non-adversarial case' is stated without any quantitative metric, dataset size, or statistical test. This leaves the equivalence claim unverifiable from the text.
minor comments (2)
- [§3] Notation for the difference operator (e.g., y_c − y_k) should be defined explicitly before its first use in the method section.
- [§4] The description of SHAM should include the precise entropy term and the optimization objective so that the benchmark can be reproduced.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract, §3] Abstract and §3: the assertion that DiffGradCAM is 'not susceptible to passive fooling' by construction rests on the observation that decisions depend only on logit differences. No formal argument or counter-example analysis is supplied showing that an adversary cannot optimize a loss directly against the difference-based gradient field while keeping classification loss near zero; the SHAM benchmark is described only for standard GradCAM.
Authors: We appreciate the referee's point that a purely construction-based claim requires additional support. The central design choice in DiffGradCAM is to replace the single-logit gradient with the gradient of a logit difference (target logit minus the strongest competing logit), which directly mirrors the quantity that determines the softmax decision. Because any passive-fooling objective that alters this difference gradient must, by definition, change the relative logit values that govern classification, we believe the attack surface is closed; however, we concede that an explicit proof or attempted counter-example was not supplied. In the revised manuscript we will add a short formal argument in §3 showing that the classification loss and the difference-based explanation loss share the same critical points with respect to logit perturbations. We will also extend the SHAM benchmark description and evaluation protocol to DiffGradCAM/DiffGradCAM++ and report the resulting robustness metrics. revision: yes
- Referee: [Abstract] Abstract: the claim that DiffGradCAM 'matches the output of standard methods such as GradCAM and GradCAM++ in the non-adversarial case' is stated without any quantitative metric, dataset size, or statistical test. This leaves the equivalence claim unverifiable from the text.
Authors: We agree that the equivalence statement would be more convincing with quantitative backing. Although the manuscript already contains qualitative side-by-side visualizations, we will revise the abstract and add a concise quantitative comparison subsection. This will report average Pearson correlation and SSIM values between DiffGradCAM and GradCAM maps computed on clean validation images, together with the exact dataset sizes, number of classes, and basic statistical tests. Error bars from repeated runs will also be included to address the broader concern about missing quantitative tables and statistics. revision: yes
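The correlation half of the promised comparison could be computed as below. This is a minimal sketch under assumptions, not the authors' protocol. It assumes each image yields two equal-shape 2-D maps, and it skips constant maps, whose correlation is undefined. `summarize` returns the mean, its standard error and the number of usable pairs. SSIM is not implemented here; scikit-image provides `skimage.metrics.structural_similarity` for that.

```python
import numpy as np

def pearson_map_agreement(map_a, map_b, eps=1e-12):
    """Pearson correlation between two saliency maps of the same shape."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom < eps:
        return float("nan")  # a constant map has no defined correlation
    return float((a * b).sum() / denom)

def summarize(pairs):
    """Mean correlation, standard error of the mean, and count over (map_a, map_b) pairs."""
    scores = np.array([pearson_map_agreement(a, b) for a, b in pairs])
    scores = scores[~np.isnan(scores)]
    n = len(scores)
    sem = scores.std(ddof=1) / np.sqrt(n) if n > 1 else float("nan")
    return float(scores.mean()), float(sem), n
```

Pearson correlation ignores affine rescaling of a map. That suits CAMs, which are usually min-max normalized before display.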
Circularity Check
Core DiffGradCAM construction follows directly from the softmax logit-difference property; no load-bearing self-citation or fitted prediction
Full rationale
The paper's central move, replacing single-logit gradients with differences y_c − y_k, rests on the standard mathematical fact that softmax probabilities are unchanged when the same constant is added to every logit, and therefore depend only on logit differences. This is an external property of the softmax function, not a result derived or fitted inside the paper. No equations reduce the claimed immunity to a self-referential definition, no parameters are fitted on a subset and then relabeled as predictions, and the SHAM benchmark is introduced as an independent test rather than a tautological re-use of the same data or prior self-citation. The derivation chain therefore remains self-contained against external benchmarks.
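Written out, the property the rationale relies on has a direct consequence for gradients. In the equations, A denotes the convolutional feature maps. The symbol s(A) is a hypothetical input-dependent term added equally to every logit. It stands in for the kind of shift a passive-fooling objective could exploit.

```latex
\[
\operatorname{softmax}(y + s\mathbf{1})_c
  = \frac{e^{y_c + s}}{\sum_j e^{y_j + s}}
  = \frac{e^{s}\, e^{y_c}}{e^{s} \sum_j e^{y_j}}
  = \operatorname{softmax}(y)_c ,
\]
\[
y_k = \tilde y_k + s(A)\ \ \forall k
\;\Longrightarrow\;
\frac{\partial y_c}{\partial A} = \frac{\partial \tilde y_c}{\partial A} + \frac{\partial s}{\partial A},
\qquad
\frac{\partial (y_c - y_k)}{\partial A} = \frac{\partial (\tilde y_c - \tilde y_k)}{\partial A}.
\]
```

A single-logit map can therefore be moved by ∂s/∂A while the class probabilities stay fixed. The difference gradient cannot be moved this way. This covers only the shared-shift case and is not the general formal argument the referee asks for.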
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
Theorem 1. In the binary classification setting with logits z1 and z2, softmax reduces to the sigmoid function applied to their difference.
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
DiffGradCAM replaces the single-logit target with a contrastive logit differential: Δ = z_true − β(z_false)
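Theorem 1 and the differential Δ can be checked numerically. This is a sketch under assumptions. The exact form of β is not given here; the rebuttal mentions the strongest competing logit (max), and "mean" and "logsumexp" are illustrative alternatives. With logsumexp, sigmoid(Δ) recovers the softmax probability of the true class for any number of classes. In the binary case all three choices reduce to z_1 − z_2.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def contrastive_delta(logits, true_idx, beta="max"):
    """Delta = z_true - beta(z_false); the beta options are hypothetical choices."""
    z = np.asarray(logits, dtype=float)
    z_false = np.delete(z, true_idx)
    if beta == "max":
        agg = z_false.max()
    elif beta == "mean":
        agg = z_false.mean()
    elif beta == "logsumexp":
        m = z_false.max()
        agg = m + np.log(np.exp(z_false - m).sum())  # numerically stable log-sum-exp
    else:
        raise ValueError(f"unknown beta: {beta}")
    return z[true_idx] - agg
```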
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks
CAMAL adds an auxiliary regularizer during training that aligns model attention with segmentation masks to improve both spatial accuracy and causal faithfulness of attention in deep learning and deep reinforcement lea...