SaliencyDecor: Enhancing Neural Network Interpretability through Feature Decorrelation
Pith reviewed 2026-05-07 16:46 UTC · model grok-4.3
The pith
Enforcing feature decorrelation during training sharpens gradient-based saliency maps and improves model accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By jointly optimizing classification accuracy, prediction consistency under feature masking, and a decorrelation regularizer that pushes learned features toward orthogonality, the method concentrates gradient flow so that standard saliency techniques produce substantially sharper and more object-focused maps. It does so while delivering measurable accuracy improvements across multiple datasets and architectures, and without introducing inference-time overhead or architectural modifications.
What carries the argument
The decorrelation regularizer, added to a joint training objective alongside classification and masking-consistency losses, reshapes the feature space toward orthogonality and thereby concentrates attribution gradients.
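The mechanism can be sketched concretely. The review does not quote the paper's exact regularizer, so the snippet below assumes a standard off-diagonal correlation penalty; the function name and the `1e-8` stabilizer are illustrative, not from the paper.

```python
import numpy as np

def decorrelation_loss(features):
    """Sum of squared off-diagonal entries of the feature correlation
    matrix for a (batch, dim) array. Driving this toward zero pushes the
    learned feature dimensions toward orthogonality."""
    z = features - features.mean(axis=0, keepdims=True)
    z = z / (z.std(axis=0, keepdims=True) + 1e-8)   # standardize each dimension
    corr = (z.T @ z) / z.shape[0]                   # (dim, dim) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    return float((off_diag ** 2).sum())

rng = np.random.default_rng(0)
# Independent dimensions incur almost no penalty ...
ortho = rng.standard_normal((10000, 4))
# ... while duplicated (fully correlated) dimensions are penalized heavily.
redundant = np.repeat(rng.standard_normal((10000, 2)), 2, axis=1)
print(decorrelation_loss(ortho), decorrelation_loss(redundant))
```

In the joint objective such a term would be scaled by the regularizer weight listed in the free-parameter ledger and added to the classification and masking-consistency losses.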
If this is right
- Gradient-based saliency methods become more faithful without any modification to the saliency algorithm itself.
- Interpretability gains occur together with, rather than in opposition to, predictive performance gains.
- The same trained model can be used for both higher-accuracy prediction and higher-quality explanations at no extra cost.
- The improvement holds across multiple standard vision datasets and common network architectures.
Where Pith is reading between the lines
- The same decorrelation principle could be tested on non-gradient attribution methods to check whether representation geometry affects explanation quality more broadly.
- Applying the regularizer only during fine-tuning rather than from scratch might preserve pre-trained features while still sharpening saliency on downstream tasks.
- If decorrelated features reduce redundancy, the approach may also improve robustness to adversarial perturbations that exploit correlated directions.
- One could measure whether the degree of achieved orthogonality correlates directly with saliency sharpness on held-out data as a simple diagnostic.
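The last bullet's diagnostic can be reduced to a single scalar tracked on held-out features. A minimal sketch, assuming mean absolute off-diagonal correlation as the orthogonality score (this definition is ours, not the paper's):

```python
import numpy as np

def orthogonality_score(features):
    """Mean absolute off-diagonal correlation of a (batch, dim) feature
    matrix: near 0.0 for fully decorrelated dimensions, near 1.0 when
    dimensions are redundant copies of each other."""
    z = features - features.mean(axis=0, keepdims=True)
    z = z / (z.std(axis=0, keepdims=True) + 1e-8)
    corr = (z.T @ z) / z.shape[0]
    mask = ~np.eye(corr.shape[0], dtype=bool)       # select off-diagonal entries
    return float(np.abs(corr[mask]).mean())

rng = np.random.default_rng(1)
decorrelated = rng.standard_normal((5000, 8))
collapsed = np.tile(rng.standard_normal((5000, 1)), (1, 8))  # 8 identical dims
print(orthogonality_score(decorrelated), orthogonality_score(collapsed))
```

Plotting this score against a saliency-sharpness metric across training checkpoints would directly test the correlation the bullet proposes.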
Load-bearing premise
That correlated feature dimensions are the dominant cause of diffuse gradients and that adding the decorrelation term will reliably focus saliency without creating new biases or failure modes.
What would settle it
Training a model with the full SaliencyDecor objective on a standard benchmark and then finding that its saliency maps remain as noisy and background-focused as the baseline while accuracy stays flat or declines would falsify the central claim.
Original abstract
Gradient-based saliency methods are widely used to interpret deep neural networks, yet they often produce noisy and unstable explanations that poorly align with semantically meaningful input features. We argue that a fundamental cause of this behavior lies in the geometry of learned representations: correlated feature dimensions diffuse attribution gradients across redundant directions, resulting in blurred and unreliable saliency maps. To address this issue, we identify feature correlation as a structural limitation of gradient-based interpretability and propose SaliencyDecor, a training framework that enforces feature decorrelation to improve attribution fidelity without modifying saliency methods or model architectures. By reshaping the feature space toward orthogonality, our approach promotes more concentrated gradient flow and improves the fidelity of saliency-based explanations. SaliencyDecor jointly optimizes classification, prediction consistency under feature masking, and a decorrelation regularizer, requiring no architectural changes or inference-time overhead. Extensive experiments across multiple benchmarks and architectures demonstrate that our method produces substantially sharper and more object-focused saliency maps while simultaneously improving predictive performance, achieving accuracy gains across the datasets. These results establish our method as a principled mechanism for enhancing both interpretability and accuracy, challenging the conventional trade-off between explanation quality and model performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that correlated feature dimensions in neural networks diffuse attribution gradients, producing noisy saliency maps. It proposes SaliencyDecor, a training framework that augments the classification objective with a prediction-consistency term under feature masking and a decorrelation regularizer to enforce orthogonal representations. The method requires no architectural changes or inference overhead and is claimed to yield substantially sharper, more object-focused gradient-based saliency maps while also improving predictive accuracy across multiple benchmarks and architectures.
Significance. If the central claims are substantiated, the work would be significant for interpretability research: it offers a training-time intervention that simultaneously targets explanation fidelity and task performance by reshaping the geometry of the learned feature space, without the usual cost of post-hoc methods or architectural redesign. The absence of inference-time overhead and the joint optimization of accuracy and consistency are practical strengths.
Major comments (2)
- [Experiments] The experimental section provides no ablation that removes only the decorrelation regularizer while retaining the masking-consistency term. Because the reported gains in saliency sharpness and accuracy are obtained under the joint loss, it is impossible to attribute the improvements specifically to feature decorrelation rather than to the additional regularization or optimization dynamics introduced by the masking objective.
- [Abstract and Experiments] The abstract and results claim 'substantially sharper and more object-focused saliency maps' together with 'accuracy gains across the datasets,' yet supply no quantitative metrics, baseline comparisons, ablation tables, or error bars. Without these details the central empirical assertion cannot be evaluated.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. The comments highlight important aspects of experimental rigor that we will address in the revision. Below we respond point by point to the major comments.
Point-by-point responses
Referee: [Experiments] The experimental section provides no ablation that removes only the decorrelation regularizer while retaining the masking-consistency term. Because the reported gains in saliency sharpness and accuracy are obtained under the joint loss, it is impossible to attribute the improvements specifically to feature decorrelation rather than to the additional regularization or optimization dynamics introduced by the masking objective.
Authors: We agree that the current experiments do not isolate the contribution of the decorrelation regularizer. To directly address this, we will add a new ablation in the revised manuscript: models trained using only the prediction-consistency term under feature masking (without the decorrelation loss) will be compared against both the standard baseline and the full SaliencyDecor objective. This will allow us to quantify how much of the observed improvement in saliency sharpness and accuracy is attributable to enforcing orthogonality in the feature space versus the masking-based consistency term alone. revision: yes
Referee: [Abstract and Experiments] The abstract and results claim 'substantially sharper and more object-focused saliency maps' together with 'accuracy gains across the datasets,' yet supply no quantitative metrics, baseline comparisons, ablation tables, or error bars. Without these details the central empirical assertion cannot be evaluated.
Authors: We acknowledge that the abstract and experimental presentation would be strengthened by explicit quantitative support. While accuracy improvements are reported as averages over multiple random seeds in the full manuscript, we will revise the abstract to avoid overstatement and add a dedicated results table that includes: (i) quantitative saliency sharpness metrics (such as average entropy of the saliency maps and, where object annotations are available, overlap with ground-truth regions), (ii) direct comparisons against standard baselines, (iii) the requested ablation table, and (iv) error bars or standard deviations for all metrics. These additions will make the empirical claims fully evaluable. revision: yes
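The first proposed metric, average saliency-map entropy, can be sketched as follows; normalizing absolute saliency values into a probability distribution is an assumption on our part, since the manuscript's exact definition is not quoted.

```python
import numpy as np

def saliency_entropy(saliency):
    """Shannon entropy (in nats) of a saliency map treated as a
    probability distribution; lower entropy means a more concentrated,
    sharper map."""
    p = np.abs(saliency).ravel()
    p = p / p.sum()
    p = p[p > 0]                      # 0 * log(0) contributes nothing
    return float(-(p * np.log(p)).sum())

# A map spread uniformly over the image scores higher (more diffuse)
# than one concentrated on a small object region.
diffuse = np.ones((8, 8))
focused = np.zeros((8, 8))
focused[3:5, 3:5] = 1.0
print(saliency_entropy(diffuse), saliency_entropy(focused))
```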
Circularity Check
No circularity: explicit regularizers and joint loss are independent of claimed outputs
Full rationale
The paper proposes SaliencyDecor as a new training objective that jointly optimizes classification loss, a masking consistency term, and an explicit decorrelation regularizer. This construction is presented directly as the method rather than as a derivation that reduces any prediction or saliency improvement to a quantity already fitted inside the same equations. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. The claimed sharper saliency maps and accuracy gains are positioned as empirical outcomes of the added terms, which remain externally testable via ablation or reproduction.
Axiom & Free-Parameter Ledger
Free parameters (1)
- decorrelation regularizer weight
Axioms (1)
- Domain assumption: correlated feature dimensions are the fundamental cause of diffused and unreliable gradient attributions in saliency maps.