SaliencyDecor: Enhancing Neural Network Interpretability through Feature Decorrelation
Pith reviewed 2026-05-07 16:46 UTC · model grok-4.3
The pith
Enforcing feature decorrelation during training sharpens gradient-based saliency maps and improves model accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By jointly optimizing classification accuracy, prediction consistency under feature masking, and a decorrelation regularizer that pushes learned features toward orthogonality, the method concentrates gradient flow so that standard saliency techniques produce substantially sharper and more object-focused maps. It does so while delivering measurable accuracy improvements across multiple datasets and architectures, and without introducing inference-time overhead or architectural modifications.
What carries the argument
The decorrelation regularizer, added to a joint training objective alongside classification and masking-consistency losses, reshapes the feature space toward orthogonality and thereby concentrates attribution gradients.
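The mechanism can be sketched concretely. The review does not quote the paper's exact regularizer, so the snippet below assumes a standard off-diagonal correlation penalty; the function name and the `1e-8` stabilizer are illustrative, not from the paper.

```python
import numpy as np

def decorrelation_loss(features):
    """Sum of squared off-diagonal entries of the feature correlation
    matrix for a (batch, dim) array. Driving this toward zero pushes the
    learned feature dimensions toward orthogonality."""
    z = features - features.mean(axis=0, keepdims=True)
    z = z / (z.std(axis=0, keepdims=True) + 1e-8)   # standardize each dimension
    corr = (z.T @ z) / z.shape[0]                   # (dim, dim) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    return float((off_diag ** 2).sum())

rng = np.random.default_rng(0)
# Independent dimensions incur almost no penalty ...
ortho = rng.standard_normal((10000, 4))
# ... while duplicated (fully correlated) dimensions are penalized heavily.
redundant = np.repeat(rng.standard_normal((10000, 2)), 2, axis=1)
print(decorrelation_loss(ortho), decorrelation_loss(redundant))
```

In the joint objective such a term would be scaled by the regularizer weight listed in the free-parameter ledger and added to the classification and masking-consistency losses.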
If this is right
- Gradient-based saliency methods become more faithful without any modification to the saliency algorithm itself.
- Interpretability gains occur together with, rather than in opposition to, predictive performance gains.
- The same trained model can be used for both higher-accuracy prediction and higher-quality explanations at no extra cost.
- The improvement holds across multiple standard vision datasets and common network architectures.
Where Pith is reading between the lines
- The same decorrelation principle could be tested on non-gradient attribution methods to check whether representation geometry affects explanation quality more broadly.
- Applying the regularizer only during fine-tuning rather than from scratch might preserve pre-trained features while still sharpening saliency on downstream tasks.
- If decorrelated features reduce redundancy, the approach may also improve robustness to adversarial perturbations that exploit correlated directions.
- One could measure whether the degree of achieved orthogonality correlates directly with saliency sharpness on held-out data as a simple diagnostic.
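The last bullet's diagnostic can be reduced to a single scalar tracked on held-out features. A minimal sketch, assuming mean absolute off-diagonal correlation as the orthogonality score (this definition is ours, not the paper's):

```python
import numpy as np

def orthogonality_score(features):
    """Mean absolute off-diagonal correlation of a (batch, dim) feature
    matrix: near 0.0 for fully decorrelated dimensions, near 1.0 when
    dimensions are redundant copies of each other."""
    z = features - features.mean(axis=0, keepdims=True)
    z = z / (z.std(axis=0, keepdims=True) + 1e-8)
    corr = (z.T @ z) / z.shape[0]
    mask = ~np.eye(corr.shape[0], dtype=bool)       # select off-diagonal entries
    return float(np.abs(corr[mask]).mean())

rng = np.random.default_rng(1)
decorrelated = rng.standard_normal((5000, 8))
collapsed = np.tile(rng.standard_normal((5000, 1)), (1, 8))  # 8 identical dims
print(orthogonality_score(decorrelated), orthogonality_score(collapsed))
```

Plotting this score against a saliency-sharpness metric across training checkpoints would directly test the correlation the bullet proposes.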
Load-bearing premise
That correlated feature dimensions are the dominant cause of diffuse gradients and that adding the decorrelation term will reliably focus saliency without creating new biases or failure modes.
What would settle it
Training a model with the full SaliencyDecor objective on a standard benchmark and then finding that its saliency maps remain as noisy and background-focused as the baseline while accuracy stays flat or declines would falsify the central claim.
Original abstract
Gradient-based saliency methods are widely used to interpret deep neural networks, yet they often produce noisy and unstable explanations that poorly align with semantically meaningful input features. We argue that a fundamental cause of this behavior lies in the geometry of learned representations: correlated feature dimensions diffuse attribution gradients across redundant directions, resulting in blurred and unreliable saliency maps. To address this issue, we identify feature correlation as a structural limitation of gradient-based interpretability and propose SaliencyDecor, a training framework that enforces feature decorrelation to improve attribution fidelity without modifying saliency methods or model architectures. By reshaping the feature space toward orthogonality, our approach promotes more concentrated gradient flow and improves the fidelity of saliency-based explanations. SaliencyDecor jointly optimizes classification, prediction consistency under feature masking, and a decorrelation regularizer, requiring no architectural changes or inference-time overhead. Extensive experiments across multiple benchmarks and architectures demonstrate that our method produces substantially sharper and more object-focused saliency maps while simultaneously improving predictive performance, achieving accuracy gains across the datasets. These results establish our method as a principled mechanism for enhancing both interpretability and accuracy, challenging the conventional trade-off between explanation quality and model performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that correlated feature dimensions in neural networks diffuse attribution gradients, producing noisy saliency maps. It proposes SaliencyDecor, a training framework that augments the classification objective with a prediction-consistency term under feature masking and a decorrelation regularizer to enforce orthogonal representations. The method requires no architectural changes or inference overhead and is claimed to yield substantially sharper, more object-focused gradient-based saliency maps while also improving predictive accuracy across multiple benchmarks and architectures.
Significance. If the central claims are substantiated, the work would be significant for interpretability research: it offers a training-time intervention that simultaneously targets explanation fidelity and task performance by reshaping the geometry of the learned feature space, without the usual cost of post-hoc methods or architectural redesign. The absence of inference-time overhead and the joint optimization of accuracy and consistency are practical strengths.
Major comments (2)
- [Experiments] The experimental section provides no ablation that removes only the decorrelation regularizer while retaining the masking-consistency term. Because the reported gains in saliency sharpness and accuracy are obtained under the joint loss, it is impossible to attribute the improvements specifically to feature decorrelation rather than to the additional regularization or optimization dynamics introduced by the masking objective.
- [Abstract and Experiments] The abstract and results claim 'substantially sharper and more object-focused saliency maps' together with 'accuracy gains across the datasets,' yet supply no quantitative metrics, baseline comparisons, ablation tables, or error bars. Without these details the central empirical assertion cannot be evaluated.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. The comments highlight important aspects of experimental rigor that we will address in the revision. Below we respond point by point to the major comments.
Point-by-point responses
Referee: [Experiments] The experimental section provides no ablation that removes only the decorrelation regularizer while retaining the masking-consistency term. Because the reported gains in saliency sharpness and accuracy are obtained under the joint loss, it is impossible to attribute the improvements specifically to feature decorrelation rather than to the additional regularization or optimization dynamics introduced by the masking objective.
Authors: We agree that the current experiments do not isolate the contribution of the decorrelation regularizer. To directly address this, we will add a new ablation in the revised manuscript: models trained using only the prediction-consistency term under feature masking (without the decorrelation loss) will be compared against both the standard baseline and the full SaliencyDecor objective. This will allow us to quantify how much of the observed improvement in saliency sharpness and accuracy is attributable to enforcing orthogonality in the feature space versus the masking-based consistency term alone. revision: yes
Referee: [Abstract and Experiments] The abstract and results claim 'substantially sharper and more object-focused saliency maps' together with 'accuracy gains across the datasets,' yet supply no quantitative metrics, baseline comparisons, ablation tables, or error bars. Without these details the central empirical assertion cannot be evaluated.
Authors: We acknowledge that the abstract and experimental presentation would be strengthened by explicit quantitative support. While accuracy improvements are reported as averages over multiple random seeds in the full manuscript, we will revise the abstract to avoid overstatement and add a dedicated results table that includes: (i) quantitative saliency sharpness metrics (such as average entropy of the saliency maps and, where object annotations are available, overlap with ground-truth regions), (ii) direct comparisons against standard baselines, (iii) the requested ablation table, and (iv) error bars or standard deviations for all metrics. These additions will make the empirical claims fully evaluable. revision: yes
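The first proposed metric, average saliency-map entropy, can be sketched as follows; normalizing absolute saliency values into a probability distribution is an assumption on our part, since the manuscript's exact definition is not quoted.

```python
import numpy as np

def saliency_entropy(saliency):
    """Shannon entropy (in nats) of a saliency map treated as a
    probability distribution; lower entropy means a more concentrated,
    sharper map."""
    p = np.abs(saliency).ravel()
    p = p / p.sum()
    p = p[p > 0]                      # 0 * log(0) contributes nothing
    return float(-(p * np.log(p)).sum())

# A map spread uniformly over the image scores higher (more diffuse)
# than one concentrated on a small object region.
diffuse = np.ones((8, 8))
focused = np.zeros((8, 8))
focused[3:5, 3:5] = 1.0
print(saliency_entropy(diffuse), saliency_entropy(focused))
```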
Circularity Check
No circularity: explicit regularizers and joint loss are independent of claimed outputs
Full rationale
The paper proposes SaliencyDecor as a new training objective that jointly optimizes classification loss, a masking consistency term, and an explicit decorrelation regularizer. This construction is presented directly as the method rather than as a derivation that reduces any prediction or saliency improvement to a quantity already fitted inside the same equations. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. The claimed sharper saliency maps and accuracy gains are positioned as empirical outcomes of the added terms, which remain externally testable via ablation or reproduction.
Axiom & Free-Parameter Ledger
Free parameters (1)
- decorrelation regularizer weight
Axioms (1)
- Domain assumption: correlated feature dimensions are the fundamental cause of diffused and unreliable gradient attributions in saliency maps.