pith. machine review for the scientific record.

arxiv: 2604.25315 · v1 · submitted 2026-04-28 · 💻 cs.CV

Recognition: unknown

SaliencyDecor: Enhancing Neural Network Interpretability through Feature Decorrelation

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 16:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords saliency maps · feature decorrelation · neural network interpretability · gradient-based explanations · attribution methods · representation geometry · training regularization · computer vision

The pith

Enforcing feature decorrelation during training sharpens gradient-based saliency maps and improves model accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that correlated feature dimensions in neural network representations cause attribution gradients to spread diffusely, producing noisy and semantically misaligned saliency maps. It proposes SaliencyDecor, a training procedure that adds a decorrelation regularizer and a masking-consistency term to the usual classification loss. This joint optimization reshapes the internal feature space toward orthogonality without any change to the network architecture or to existing saliency algorithms. The resulting models yield sharper, object-focused explanations while also posting accuracy gains on standard vision benchmarks. Readers should care because the work directly challenges the assumption that better interpretability must come at the expense of predictive performance.

Core claim

By jointly optimizing classification accuracy, prediction consistency under feature masking, and a decorrelation regularizer that pushes learned features toward orthogonality, the method concentrates gradient flow so that standard saliency techniques produce substantially sharper and more object-focused maps, all while delivering measurable accuracy improvements across multiple datasets and architectures and without introducing inference-time overhead or architectural modifications.

What carries the argument

The decorrelation regularizer, added to a joint training objective alongside the classification and masking-consistency losses: by reshaping the feature space toward orthogonality, it concentrates attribution gradients.
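
Neither the summary nor the abstract pins down the exact form of these losses. As a minimal sketch, assuming a DeCov-style penalty on the off-diagonal batch covariance and a KL consistency term under saliency masking (the mask_fn hook and the weights lambda_con and lambda_dec are illustrative assumptions, not the authors' code):

```python
import torch
import torch.nn.functional as F

def decorrelation_loss(features: torch.Tensor) -> torch.Tensor:
    """DeCov-style penalty on off-diagonal entries of the batch covariance.

    features: (batch, dim) activations from an intermediate layer.
    The paper's exact regularizer may differ; this is one common choice.
    """
    z = features - features.mean(dim=0, keepdim=True)   # center each dimension
    cov = (z.T @ z) / max(z.shape[0] - 1, 1)            # (dim, dim) covariance
    off_diag = cov - torch.diag(torch.diag(cov))        # zero out the diagonal
    return (off_diag ** 2).sum() / features.shape[1]

def saliencydecor_step(model, x, y, mask_fn, lambda_con=1.0, lambda_dec=0.1):
    """One hypothetical training step combining the three described terms.

    mask_fn is assumed to occlude low-saliency input regions; lambda_dec
    is the free decorrelation weight flagged in the ledger below.
    """
    logits, feats = model(x)                 # assume the model exposes features
    loss_cls = F.cross_entropy(logits, y)

    logits_masked, _ = model(mask_fn(x, model))
    # consistency: the masked prediction should track the unmasked one
    loss_con = F.kl_div(
        F.log_softmax(logits_masked, dim=1),
        F.softmax(logits, dim=1).detach(),
        reduction="batchmean",
    )
    loss_dec = decorrelation_loss(feats.flatten(1))
    return loss_cls + lambda_con * loss_con + lambda_dec * loss_dec
```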

If this is right

  • Gradient-based saliency methods become more faithful without any modification to the saliency algorithm itself.
  • Interpretability gains occur together with, rather than in opposition to, predictive performance gains.
  • The same trained model can be used for both higher-accuracy prediction and higher-quality explanations at no extra cost.
  • The improvement holds across multiple standard vision datasets and common network architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decorrelation principle could be tested on non-gradient attribution methods to check whether representation geometry affects explanation quality more broadly.
  • Applying the regularizer only during fine-tuning rather than from scratch might preserve pre-trained features while still sharpening saliency on downstream tasks.
  • If decorrelated features reduce redundancy, the approach may also improve robustness to adversarial perturbations that exploit correlated directions.
  • One could measure whether the degree of achieved orthogonality correlates directly with saliency sharpness on held-out data as a simple diagnostic (a minimal sketch follows this list).
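
A hedged sketch of that last diagnostic: the two scores below, mean absolute off-diagonal feature correlation and entropy of a normalized saliency map, are one plausible operationalization; the paper specifies neither, so every name here is an assumption.

```python
import torch

def off_diagonal_correlation(features: torch.Tensor) -> float:
    """Mean absolute off-diagonal entry of the feature correlation matrix.

    features: (n_samples, dim). Values near 0 indicate near-orthogonal
    (decorrelated) feature dimensions.
    """
    z = (features - features.mean(0)) / (features.std(0) + 1e-8)
    corr = (z.T @ z) / max(z.shape[0] - 1, 1)
    dim = corr.shape[0]
    off = corr - torch.diag(torch.diag(corr))
    return off.abs().sum().item() / (dim * (dim - 1))

def saliency_entropy(saliency: torch.Tensor) -> float:
    """Shannon entropy of a saliency map normalized to a distribution.

    Lower entropy means attribution mass concentrated on fewer pixels,
    i.e. a 'sharper' map.
    """
    p = saliency.abs().flatten()
    p = p / (p.sum() + 1e-12)
    return -(p * (p + 1e-12).log()).sum().item()
```

If the paper's premise holds, checkpoints with lower off-diagonal correlation should also show lower mean saliency entropy on held-out examples.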

Load-bearing premise

That correlated feature dimensions are the dominant cause of diffuse gradients and that adding the decorrelation term will reliably focus saliency without creating new biases or failure modes.

What would settle it

Training a model with the full SaliencyDecor objective on a standard benchmark and then finding that its saliency maps remain as noisy and background-focused as the baseline while accuracy stays flat or declines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.25315 by Ali Karkehabadi, Avesta Sasan, Houman Homayoun, Jamshid Hassanpour.

Figure 1: Overview of the proposed SaliencyDecor framework. Given an input X, gradient-based importance scores are used to identify non-informative regions and generate a saliency mask M. Intermediate encoder features are decorrelated via group-wise ZCA whitening to reduce redundancy and stabilize gradient attribution. The network is trained with a multi-objective loss combining classification, consistency, and deco…

Figure 2: Gradient visualization and distribution analysis for MNIST. Left: Original digit images. Middle columns: Top 10% gradients and full gradient maps for…

Figure 3: Accuracy degradation under progressive feature masking on MNIST: …

Figure 4: Gradient visualization analysis for complex datasets. Shows original…
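
Figure 1 attributes the decorrelation to group-wise ZCA whitening of intermediate encoder features. A minimal sketch of what that operation could look like, under assumed details (group count, epsilon, per-batch statistics) that this page does not reproduce from the paper:

```python
import torch

def zca_whiten(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """ZCA-whiten a (batch, dim) block so its covariance is ~identity."""
    xc = x - x.mean(dim=0, keepdim=True)
    cov = (xc.T @ xc) / max(x.shape[0] - 1, 1)
    cov = cov + eps * torch.eye(x.shape[1], device=x.device)
    evals, evecs = torch.linalg.eigh(cov)            # symmetric eigendecomposition
    w = evecs @ torch.diag(evals.rsqrt()) @ evecs.T  # ZCA transform: U diag(λ^-1/2) Uᵀ
    return xc @ w

def groupwise_zca(features: torch.Tensor, groups: int = 8) -> torch.Tensor:
    """Whiten channel groups independently, per Figure 1's description.

    features: (batch, channels); the group count is an assumed hyperparameter.
    Whitening small groups keeps the eigendecomposition cheap and numerically
    stable compared with whitening all channels jointly.
    """
    chunks = features.chunk(groups, dim=1)
    return torch.cat([zca_whiten(c) for c in chunks], dim=1)
```
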
Original abstract

Gradient-based saliency methods are widely used to interpret deep neural networks, yet they often produce noisy and unstable explanations that poorly align with semantically meaningful input features. We argue that a fundamental cause of this behavior lies in the geometry of learned representations: correlated feature dimensions diffuse attribution gradients across redundant directions, resulting in blurred and unreliable saliency maps. To address this issue, we identify feature correlation as a structural limitation of gradient-based interpretability and propose SaliencyDecor, a training framework that enforces feature decorrelation to improve attribution fidelity without modifying saliency methods or model architectures. By reshaping the feature space toward orthogonality, our approach promotes more concentrated gradient flow and improves the fidelity of saliency-based explanations. SaliencyDecor jointly optimizes classification, prediction consistency under feature masking, and a decorrelation regularizer, requiring no architectural changes or inference-time overhead. Extensive experiments across multiple benchmarks and architectures demonstrate that our method produces substantially sharper and more object-focused saliency maps while simultaneously improving predictive performance, achieving accuracy gains across the datasets. These results establish our method as a principled mechanism for enhancing both interpretability and accuracy, challenging the conventional trade-off between explanation quality and model performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript argues that correlated feature dimensions in neural networks diffuse attribution gradients, producing noisy saliency maps. It proposes SaliencyDecor, a training framework that augments the classification objective with a prediction-consistency term under feature masking and a decorrelation regularizer to enforce orthogonal representations. The method requires no architectural changes or inference overhead and is claimed to yield substantially sharper, more object-focused gradient-based saliency maps while also improving predictive accuracy across multiple benchmarks and architectures.

Significance. If the central claims are substantiated, the work would be significant for interpretability research: it offers a training-time intervention that simultaneously targets explanation fidelity and task performance by reshaping the geometry of the learned feature space, without the usual cost of post-hoc methods or architectural redesign. The absence of inference-time overhead and the joint optimization of accuracy and consistency are practical strengths.

major comments (2)
  1. [Experiments] The experimental section provides no ablation that removes only the decorrelation regularizer while retaining the masking-consistency term. Because the reported gains in saliency sharpness and accuracy are obtained under the joint loss, it is impossible to attribute the improvements specifically to feature decorrelation rather than to the additional regularization or optimization dynamics introduced by the masking objective.
  2. [Abstract and Experiments] The abstract and results claim 'substantially sharper and more object-focused saliency maps' together with 'accuracy gains across the datasets,' yet supply no quantitative metrics, baseline comparisons, ablation tables, or error bars. Without these details the central empirical assertion cannot be evaluated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. The comments highlight important aspects of experimental rigor that we will address in the revision. Below we respond point by point to the major comments.

Point-by-point responses
  1. Referee: [Experiments] The experimental section provides no ablation that removes only the decorrelation regularizer while retaining the masking-consistency term. Because the reported gains in saliency sharpness and accuracy are obtained under the joint loss, it is impossible to attribute the improvements specifically to feature decorrelation rather than to the additional regularization or optimization dynamics introduced by the masking objective.

    Authors: We agree that the current experiments do not isolate the contribution of the decorrelation regularizer. To directly address this, we will add a new ablation in the revised manuscript: models trained using only the prediction-consistency term under feature masking (without the decorrelation loss) will be compared against both the standard baseline and the full SaliencyDecor objective. This will allow us to quantify how much of the observed improvement in saliency sharpness and accuracy is attributable to enforcing orthogonality in the feature space versus the masking-based consistency term alone. Revision: yes.

  2. Referee: [Abstract and Experiments] The abstract and results claim 'substantially sharper and more object-focused saliency maps' together with 'accuracy gains across the datasets,' yet supply no quantitative metrics, baseline comparisons, ablation tables, or error bars. Without these details the central empirical assertion cannot be evaluated.

    Authors: We acknowledge that the abstract and experimental presentation would be strengthened by explicit quantitative support. While accuracy improvements are reported as averages over multiple random seeds in the full manuscript, we will revise the abstract to avoid overstatement and add a dedicated results table that includes: (i) quantitative saliency sharpness metrics (such as average entropy of the saliency maps and, where object annotations are available, overlap with ground-truth regions), (ii) direct comparisons against standard baselines, (iii) the requested ablation table, and (iv) error bars or standard deviations for all metrics. These additions will make the empirical claims fully evaluable. Revision: yes.
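
One plausible instantiation of the promised "overlap with ground-truth regions" metric, assuming binary object masks and a top-10% saliency threshold that mirrors the top-10% gradients shown in Figure 2 (the function and threshold are illustrative, not the authors' metric):

```python
import torch

def topk_mask_overlap(saliency: torch.Tensor, gt_mask: torch.Tensor,
                      top_frac: float = 0.1) -> float:
    """Fraction of the top `top_frac` most-salient pixels that fall inside
    the ground-truth object mask (1.0 = perfectly object-focused).

    saliency: (H, W) attribution magnitudes; gt_mask: (H, W) binary mask.
    """
    flat = saliency.abs().flatten()
    k = max(1, int(top_frac * flat.numel()))
    top_idx = flat.topk(k).indices
    return gt_mask.flatten()[top_idx].float().mean().item()
```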

Circularity Check

0 steps flagged

No circularity: explicit regularizers and joint loss are independent of claimed outputs

Full rationale

The paper proposes SaliencyDecor as a new training objective that jointly optimizes classification loss, a masking consistency term, and an explicit decorrelation regularizer. This construction is presented directly as the method rather than as a derivation that reduces any prediction or saliency improvement to a quantity already fitted inside the same equations. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. The claimed sharper saliency maps and accuracy gains are positioned as empirical outcomes of the added terms, which remain externally testable via ablation or reproduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the unproven premise that feature correlation is the root cause of saliency noise and that the added regularizer will concentrate gradients without side effects; the weight of the decorrelation term is a free hyperparameter whose value is not derived.

free parameters (1)
  • decorrelation regularizer weight
    Hyperparameter that balances the decorrelation loss against classification and consistency losses; its value must be chosen or tuned.
axioms (1)
  • domain assumption: Correlated feature dimensions are the fundamental cause of diffused and unreliable gradient attributions in saliency maps
    Invoked in the opening argument as the structural limitation being addressed.

pith-pipeline@v0.9.0 · 5514 in / 1151 out tokens · 45691 ms · 2026-05-07T16:46:38.974746+00:00 · methodology

