pith. machine review for the scientific record.

arxiv: 2605.06368 · v1 · submitted 2026-05-07 · 💻 cs.CV · cs.AI · cs.LG

Recognition: unknown

eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 13:16 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords distribution shifts · spurious correlations · Grad-CAM · robustness · visual explanations · regularization · domain invariance · confounders

The pith

eX2L decorrelates confounding features by penalizing Grad-CAM map similarity between primary and confounder classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an explanation-based regularization method called eX2L that trains a primary label classifier alongside a confounder classifier and adds a loss penalizing similarity between their Grad-CAM activation maps. This penalty is intended to push confounding features out of the latent representations used for the main task. A sympathetic reader would care because current approaches to distribution shifts often underperform simple baselines or lack direct interpretability. If the mechanism holds, it would allow models to reach higher worst-group accuracy on benchmarks with spurious correlations by explicitly decoupling label and nuisance attributes at the group level. The result is presented as a path to functional domain invariance without indirect or overly complex interventions.

Core claim

eX2L decorrelates confounding features from a classifier's latent representations during training by penalizing the similarity between Grad-CAM activation maps generated by a primary label classifier and those from a concurrently trained confounder classifier. On the Spawrious Many-to-Many Hard Challenge benchmark, this yields an average accuracy of 82.24% and a worst-group accuracy of 66.31%, exceeding the prior state of the art by 5.49% and 10.90%, respectively. The work argues that functional domain invariance can be achieved by explicitly decoupling label and nuisance attributes at the group level.

What carries the argument

The central mechanism is contrastive penalization of Grad-CAM activation maps from the primary label classifier and the parallel confounder classifier to enforce dissimilarity in how each attends to image regions.
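
To make the carrying mechanism concrete, below is a minimal PyTorch sketch of a paired-classifier objective of this kind. It is a sketch under assumptions, not the authors' implementation: the shared backbone with two linear heads, cosine similarity between flattened maps, and the hypothetical weight lambda_expl all fill gaps that the text above leaves open.

```python
# Minimal sketch of the paired-classifier idea, NOT the authors' exact code.
# Assumed: shared CNN backbone, two linear heads, Grad-CAM at the last conv
# block, cosine similarity as the map-similarity measure.
import torch
import torch.nn.functional as F

def grad_cam_map(feats, logits, target_class):
    """Grad-CAM: channel weights are spatially pooled gradients of the class score."""
    score = logits[torch.arange(len(logits)), target_class].sum()
    grads = torch.autograd.grad(score, feats, create_graph=True)[0]  # (B, C, H, W)
    weights = grads.mean(dim=(2, 3), keepdim=True)                   # (B, C, 1, 1)
    cam = F.relu((weights * feats).sum(dim=1))                       # (B, H, W)
    return cam / (cam.flatten(1).amax(dim=1).view(-1, 1, 1) + 1e-8)  # per-image max-norm

def ex2l_style_loss(backbone, head_y, head_c, x, y, c, lambda_expl=1.0):
    feats = backbone(x)                # final conv feature maps, kept in the graph
    pooled = feats.mean(dim=(2, 3))    # global average pooling
    logits_y, logits_c = head_y(pooled), head_c(pooled)
    cam_y = grad_cam_map(feats, logits_y, y)   # where the label head looks
    cam_c = grad_cam_map(feats, logits_c, c)   # where the confounder head looks
    sim = F.cosine_similarity(cam_y.flatten(1), cam_c.flatten(1), dim=1).mean()
    # Each head learns its own task; the similarity term pushes their
    # attention maps apart, which is the decorrelation pressure at issue.
    return F.cross_entropy(logits_y, y) + F.cross_entropy(logits_c, c) + lambda_expl * sim
```

Note that create_graph=True is what allows the similarity penalty to backpropagate through the Grad-CAM computation itself; without it the penalty would exert no pressure on the encoder.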

If this is right

  • Classifiers achieve higher average and worst-group accuracy on distribution-shift benchmarks by reducing dependence on spurious features.
  • Functional domain invariance follows from explicit group-level decoupling of label and nuisance attributes.
  • The framework supplies built-in interpretability because training directly manipulates visual explanation maps.
  • The approach outperforms empirical risk minimization and prior methods on the tested many-to-many hard shifts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same paired-classifier structure could be tested with other explanation techniques to check whether the gains depend on Grad-CAM specifically.
  • The explicit separation of attributes may simplify post-training bias audits in deployed vision systems.
  • Adding the confounder branch increases training cost, which would need to be weighed against robustness gains in resource-limited settings.
  • The method invites experiments on whether the same decorrelation principle transfers to non-image modalities that have their own explanation tools.

Load-bearing premise

Penalizing similarity between Grad-CAM activation maps of the primary label classifier and the confounder classifier will reliably decorrelate confounding features from the latent representations without harming the primary task or introducing new unintended correlations.

What would settle it

If ablating the contrastive penalization term on the Spawrious Many-to-Many Hard Challenge produces no improvement in worst-group accuracy over standard training, or if the confounder classifier does not capture the relevant nuisance attributes, the proposed decorrelation mechanism would be falsified.
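
For concreteness about the metric such a test would move: worst-group accuracy is the minimum per-group accuracy, with groups conventionally taken as (label, confounder) pairs on benchmarks like Spawrious and Waterbirds. A minimal sketch, assuming integer-coded predictions, labels, and confounders (the grouping convention is ours, not quoted from the paper):

```python
import torch

def worst_group_accuracy(preds, labels, confounders):
    """WGA: accuracy on the worst-performing (label, confounder) group."""
    correct = (preds == labels).float()
    group_id = labels * (int(confounders.max()) + 1) + confounders  # unique id per (y, c)
    accs = [correct[group_id == g].mean().item() for g in group_id.unique()]
    return min(accs)
```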

Figures

Figures reproduced from arXiv: 2605.06368 by Jose Marie Antonio Miñoza, Paulo Mario P. Medina, Sebastian C. Ibañez.

Figure 1: Grad-CAM plots of eX2L and others with their corresponding true labels y, predicted labels ŷ, and true confounders c. eX2L's 11.91% improvement in worst-group accuracy against GroupDRO can be directly attributed to the mechanical shift visualized in the last row, where GroupDRO's and ERM's attention is diffused across the background: eX2L restricts focus exclusively to the dog's ear, effectively ignoring t…
Figure 2: UMAP plots of different algorithms' latent representations on the Waterbirds dataset. (a) maps the color of each point by the label while (b) maps the color of each point by the confounder. The compactness, clear label separation, and lack of confounder reliance of the representations support the observed targeted visual focus by eX2L…
Figure 3: UMAP plots of different algorithms' latent representations on the Hard Many-to-Many Spawrious dataset. Colors map the three defined training and test environments. While not explicitly using environmental annotations, eX2L demonstrates better domain invariance than DANN, as evidenced by its lower MMDEnv and more highly interspersed environments.
Original abstract

Despite extensive research into mitigating distribution shifts, many existing algorithms yield inconsistent performance, often failing to outperform baseline Empirical Risk Minimization (ERM) across diverse scenarios. Furthermore, high algorithmic complexity frequently limits interpretability and offers only an indirect means of addressing spurious correlations. We propose eXplaining to Learn (eX2L): an interpretable, explanation-based framework that decorrelates confounding features from a classifier's latent representations during training. eX2L achieves this by penalizing the similarity between Grad-CAM activation maps generated by a primary label classifier and those from a concurrently trained confounder classifier. On the rigorous Spawrious Many-to-Many Hard Challenge benchmark, eX2L achieves an average accuracy (AA) of 82.24% +/- 3.87% and a worst-group accuracy (WGA) of 66.31% +/- 8.73%, outperforming the current state-of-the-art (SOTA) by 5.49% and 10.90%, respectively. Beyond its competitive performance, eX2L demonstrates that functional domain invariance can be achieved by explicitly decoupling label and nuisance attributes at the group level.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom & free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes eX2L, an interpretable regularization method for distribution shifts that trains a primary label classifier alongside a confounder classifier and adds a penalty on the similarity of their Grad-CAM activation maps. This is claimed to decorrelate confounding features from the primary classifier's latent representations. On the Spawrious Many-to-Many Hard Challenge, eX2L reports average accuracy of 82.24% ± 3.87% and worst-group accuracy of 66.31% ± 8.73%, outperforming prior SOTA by 5.49% and 10.90% respectively.

Significance. If the regularization mechanism reliably decorrelates confounders from latent representations without introducing new correlations, the approach would offer a low-complexity, explanation-driven alternative to existing robustness methods. The reported benchmark gains on a rigorous many-to-many shift task would be practically relevant for computer vision applications where spurious correlations are common.

major comments (2)
  1. [Abstract] The central claim that penalizing Grad-CAM map similarity 'decorrelates confounding features from a classifier's latent representations' is not supported by the described construction; see the Grad-CAM definition sketched after this list. Grad-CAM produces input-space saliency maps from gradients w.r.t. the final convolutional feature maps, so the penalty only encourages the two heads to attend to different spatial regions of the input. Nothing in the formulation constrains the post-convolutional latent embeddings themselves, leaving open the possibility that the primary encoder still encodes confounder information in a form invisible to the Grad-CAM head.
  2. [Abstract, Results] The reported gains (AA 82.24% ± 3.87%, WGA 66.31% ± 8.73%) are presented without any ablation of the explanation-similarity penalty weight, without verification that the penalty term actually reduces the correlation between latent features and confounders, and without implementation details sufficient to reproduce the numbers. These omissions make it impossible to confirm that the claimed decorrelation mechanism is responsible for the observed improvements.
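
For reference on point 1, the standard Grad-CAM construction from Selvaraju et al. (ref. 26) is:

```latex
% Standard Grad-CAM (Selvaraju et al., ref. 26). The channel weight
% \alpha_k^c is the spatially averaged gradient of the class score y^c
% w.r.t. the final conv activation map A^k; the output is a ReLU of the
% weighted sum, so it lives in the spatial map, not the pooled embedding.
\alpha_k^c = \frac{1}{Z} \sum_{i}\sum_{j} \frac{\partial y^c}{\partial A_{ij}^{k}},
\qquad
L^{c}_{\mathrm{Grad\text{-}CAM}} = \mathrm{ReLU}\!\Bigl(\sum_{k} \alpha_k^c A^{k}\Bigr)
```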
minor comments (2)
  1. The manuscript should include a clear statement of the exact loss formulation, including how the confounder classifier is trained and how the penalty is weighted relative to the primary task loss.
  2. Error bars are reported but the number of runs and random seeds are not stated; this information is needed to assess the statistical significance of the 5.49% and 10.90% improvements over SOTA.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below, providing clarifications and committing to revisions that strengthen the presentation of the method and its empirical validation.

Point-by-point responses
  1. Referee: [Abstract] The central claim that penalizing Grad-CAM map similarity 'decorrelates confounding features from a classifier's latent representations' is not supported by the described construction. Grad-CAM produces input-space saliency maps from gradients w.r.t. the final convolutional feature maps, so the penalty only encourages the two heads to attend to different spatial regions of the input. Nothing in the formulation constrains the post-convolutional latent embeddings themselves, leaving open the possibility that the primary encoder still encodes confounder information in a form invisible to the Grad-CAM head.

    Authors: We appreciate the referee's precise analysis of the mechanism. The penalty operates on Grad-CAM saliency maps derived from gradients with respect to the final convolutional feature maps, which encourages the primary classifier to attend to different spatial regions than the confounder classifier. While this does not impose an explicit constraint on every possible encoding within the latent feature maps, the resulting difference in attention patterns is intended to reduce the primary classifier's reliance on confounding features for its predictions. We acknowledge that the original wording in the abstract overstates the direct effect on latent representations. In the revision we will rephrase the central claim to emphasize that the approach 'encourages the primary classifier to rely on label-relevant spatial features by penalizing overlap in explanation maps with a confounder classifier' and will add a short discussion of the indirect influence on feature usage. revision: partial

  2. Referee: [Abstract, Results] The reported gains (AA 82.24% ± 3.87%, WGA 66.31% ± 8.73%) are presented without any ablation of the explanation-similarity penalty weight, without verification that the penalty term actually reduces the correlation between latent features and confounders, and without implementation details sufficient to reproduce the numbers. These omissions make it impossible to confirm that the claimed decorrelation mechanism is responsible for the observed improvements.

    Authors: We agree that these supporting analyses are necessary to substantiate the role of the explanation-similarity penalty. In the revised manuscript we will add (i) an ablation study sweeping the penalty weight and reporting its effect on both average and worst-group accuracy, (ii) quantitative verification that the penalty reduces correlation between the primary classifier's latent features and confounder labels, e.g., via linear probing accuracy or estimated mutual information (a minimal probe sketch follows below), and (iii) expanded implementation details, including exact hyper-parameters, network architectures, and training schedules, placed in the main text or a dedicated reproducibility section. These additions will allow readers to confirm that the observed gains are attributable to the proposed regularization. revision: yes
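
As an illustration of the probe promised in (ii), here is a minimal linear-probing sketch. It assumes a frozen trained backbone and integer confounder labels; the function names are ours, not the authors':

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pooled_features(backbone, x):
    """Frozen pooled last-conv features from the trained encoder."""
    return backbone(x).mean(dim=(2, 3))

def confounder_probe_accuracy(backbone, x, c, num_confounders, steps=500, lr=1e-2):
    feats = pooled_features(backbone, x)
    probe = torch.nn.Linear(feats.shape[1], num_confounders)
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    for _ in range(steps):                       # fit the probe on frozen features
        opt.zero_grad()
        F.cross_entropy(probe(feats), c).backward()
        opt.step()
    # High accuracy means the encoder still linearly encodes the confounder.
    return (probe(feats).argmax(dim=1) == c).float().mean().item()
```

A careful audit would fit the probe on a training split and report accuracy on held-out data; probe accuracy near chance would support the decorrelation claim, while high accuracy would indicate residual confounder information.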

Circularity Check

0 steps flagged

No circularity: method explicitly defined via penalty on Grad-CAM similarity; performance claims are empirical measurements, not derived quantities.

Full rationale

The paper defines eX2L directly as a regularization term that penalizes similarity between Grad-CAM activation maps of the primary classifier and a concurrent confounder classifier. This construction is stated as the mechanism for decorrelating confounders from latent representations, but the link is presented as a modeling choice rather than a mathematical reduction. No equations show a fitted parameter or self-referential definition that forces the target metric by construction. Reported accuracies on Spawrious are measured outcomes on held-out data, not predictions derived from the same inputs used to fit the model. No self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text as load-bearing steps. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the assumption that Grad-CAM maps faithfully represent the features driving each classifier's decisions and that reducing map similarity equates to reduced feature correlation in the latent space.

free parameters (1)
  • explanation similarity penalty weight
    The strength of the contrastive penalty term must be chosen or tuned; its value is not reported in the abstract.
axioms (1)
  • domain assumption: Grad-CAM activation maps accurately reflect the decision-relevant features of each classifier
    The entire regularization strategy depends on this property of Grad-CAM; a sanity-check sketch follows below.
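
As flagged in the axiom entry, one standard way to stress this assumption is the model-randomization sanity check of Adebayo et al. (ref. 1): re-randomize the classifier head and measure how much the Grad-CAM maps change. A hedged sketch, reusing the hypothetical grad_cam_map, backbone, and head names from the earlier sketch:

```python
import copy
import torch
import torch.nn.functional as F

def head_randomization_similarity(backbone, head, x, y):
    """Compare Grad-CAM maps before and after re-randomizing the head weights."""
    feats = backbone(x)
    pooled = feats.mean(dim=(2, 3))
    cam_trained = grad_cam_map(feats, head(pooled), y)
    head_rand = copy.deepcopy(head)
    for p in head_rand.parameters():             # destroy the learned decision rule
        torch.nn.init.normal_(p, std=0.02)
    cam_random = grad_cam_map(feats, head_rand(pooled), y)
    # Similarity near 1 would mean the maps ignore the classifier, i.e. the
    # faithfulness axiom (and hence the penalty's leverage) would be suspect.
    return F.cosine_similarity(cam_trained.flatten(1), cam_random.flatten(1), dim=1).mean()
```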

pith-pipeline@v0.9.0 · 5534 in / 1398 out tokens · 35990 ms · 2026-05-08T13:16:09.066032+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 13 canonical work pages · 3 internal anchors

  1. [1] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B. Sanity checks for saliency maps. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pp. 9525–9536, Red Hook, NY, USA, 2018. Curran Associates Inc.
  2. [2] Angarano, S., Martini, M., Salvetti, F., Mazzia, V., and Chiaberge, M. Back-to-bones: Rediscovering the role of backbones in domain generalization. Pattern Recognition, 156:110762, 2024. doi:10.1016/j.patcog.2024.110762.
  3. [3] Arjovsky, M., Bottou, L., Gulrajani, I., and Lopez-Paz, D. Invariant risk minimization, 2019. URL https://arxiv.org/abs/1907.02893.
  4. [4] Dammu, P. P. S. and Shah, C. Detecting spurious correlations via robust visual concepts in real and AI-generated image classification. In XAI in Action: Past, Present, and Future Applications, 2023. URL https://openreview.net/forum?id=ewagDhIy8Y.
  5. [5] Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., March, M., and Lempitsky, V. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.
  6. [6] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012. URL http://jmlr.org/papers/v13/gretton12a.html.
  7. [7] Gulrajani, I. and Lopez-Paz, D. In search of lost domain generalization. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=lQdXeXDoWtI.
  8. [8] Hagos, M. T., Curran, K. M., and Namee, B. M. Identifying spurious correlations and correcting them with an explanation-based learning, 2022. URL https://arxiv.org/abs/2211.08285.
  9. [9] Han, X. and Tsvetkov, Y. Influence tuning: Demoting spurious correlations via instance attribution and instance-driven updates. In Moens, M.-F., Huang, X., Specia, L., and Yih, S. W.-t. (eds.), Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4398–4409, Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics.
  10. [10] Kirichenko, P., Izmailov, P., and Wilson, A. G. Last layer re-training is sufficient for robustness to spurious correlations. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=ylpMUNYWpX.
  11. [11] Koh, P. W., Sagawa, S., Marklund, H., Xie, S. M., Zhang, M., Balsubramani, A., Hu, W., Yasunaga, M., Phillips, R. L., Gao, I., Lee, T., David, E., Stavness, I., Guo, W., Earnshaw, B. A., Haque, I. S., Beery, S. M., Leskovec, J., Kundaje, A. B., Pierson, E., Levine, S., Finn, C., and Liang, P. WILDS: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, 2021.
  12. [12] Li, H., Pan, S. J., Wang, S., and Kot, A. C. Domain generalization with adversarial feature learning. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5400–5409, 2018. doi:10.1109/CVPR.2018.00566.
  13. [13] Liu, E. Z., Haghgoo, B., Chen, A. S., Raghunathan, A., Koh, P. W., Sagawa, S., Liang, P., and Finn, C. Just train twice: Improving group robustness without training group information. In Proceedings of the 38th International Conference on Machine Learning, pp. 6781–6792. PMLR, 2021a.
  14. [14] Liu, J., Shen, Z., He, Y., Zhang, X., Xu, R., Yu, H., and Cui, P. Towards out-of-distribution generalization: A survey, 2021b. URL https://arxiv.org/abs/2108.13624.
  15. [15] Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B., and Bachem, O. Challenging common assumptions in the unsupervised learning of disentangled representations. In International Conference on Machine Learning, pp. 4114–4124, 2019.
  16. [16] Long, M., Cao, Z., Wang, J., and Jordan, M. I. Conditional adversarial domain adaptation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pp. 1647–1657, Red Hook, NY, USA, 2018. Curran Associates Inc.
  17. [17] Lynch, A., Dovonon, G. J.-S., Kaddour, J., and Silva, R. Spawrious: A benchmark for fine control of spurious correlation biases. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=0S0oITNTCz.
  18. [18] McInnes, L., Healy, J., and Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction, 2018. URL https://arxiv.org/abs/1802.03426.
  19. [19] Milletari, F., Navab, N., and Ahmadi, S.-A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571, 2016. doi:10.1109/3DV.2016.79.
  20. [20] Ming, Y., Yin, H., and Li, Y. On the impact of spurious correlation for out-of-distribution detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 10051–10059, 2022. doi:10.1609/aaai.v36i9.21244.
  21. [21] Monga, A., Somou, R., Zhang, S., and Ortega, A. Mitigating spurious correlations in image recognition models using performance-based feature sampling. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=DRv8wcssgs.
  22. [22] Nam, J., Cha, H., Ahn, S., Lee, J., and Shin, J. Learning from failure: De-biasing classifier from biased classifier. In Advances in Neural Information Processing Systems, volume 33, pp. 20673–20684, 2020.
  23. [23] Rahman, M. A. and Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In Bebis, G., Boyle, R., Parvin, B., Koracin, D., Porikli, F., Skaff, S., Entezari, A., Min, J., Iwai, D., Sadagic, A., Scheidegger, C., and Isenberg, T. (eds.), Advances in Visual Computing, pp. 234–244, Cham, 2016. Springer International Publishing.
  24. [24] Sagawa, S., Koh, P. W., Hashimoto, T. B., and Liang, P. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization, 2019. URL https://arxiv.org/abs/1911.08731.
  25. [25] Sagawa, S., Koh, P. W., Hashimoto, T. B., and Liang, P. Distributionally robust neural networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=ryxGuJrFvS.
  26. [26] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2):336–359, October 2019. ISSN 1573-1405. doi:10.1007/s11263-019-01228-7.
  27. [27] Shen, H. and Zhao, Z. Boosting test performance with importance sampling: a subpopulation perspective. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial Intelligence, AAAI'25/IAAI'25/EAAI'25, 2025.
  28. [28] Suhail, P., Goel, V., and Sethi, A. Shortcut learning susceptibility in vision classifiers. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=dvafjL2zXP.
  29. [29] Sun, B. and Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Hua, G. and Jégou, H. (eds.), Computer Vision – ECCV 2016 Workshops, pp. 443–450, Cham, 2016. Springer International Publishing. ISBN 978-3-319-49409-8.
  30. [30] Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle. In IEEE Information Theory Workshop, pp. 1–5, 2015.
  31. [31] Wang, Z., Bovik, A., Sheikh, H., and Simoncelli, E. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. doi:10.1109/TIP.2003.819861.
  32. [32] Wiles, O., Gowal, S., Stimberg, F., Rebuffi, S.-A., Ktena, I., Dvijotham, K. D., and Cemgil, A. T. A fine-grained analysis on distribution shift. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=Dl4LetuLdyK.
  33. [33] Yang, Y., Zhang, H., Katabi, D., and Ghassemi, M. Change is hard: A closer look at subpopulation shift. In International Conference on Machine Learning, 2023.
  34. [34] Ye, H., Xie, C., Cai, T., Li, R., Li, Z., and Wang, L. Towards a theoretical framework of out-of-distribution generalization. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 23519–23531. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/...