Attribution via Distributional Paths for Information Revelation

Kieran A. Murphy; Shameen Shrestha

arxiv: 2606.03885 · v1 · pith:LNODPGLEnew · submitted 2026-06-02 · 💻 cs.LG


Pith reviewed 2026-06-28 11:26 UTC · model grok-4.3


keywords feature attribution · integrated gradients · explainable AI · distributional paths · probe distributions · completeness · image classification · tabular regression


The pith

Reveal-IG attributes model predictions by integrating expected output changes along paths through structured probe distributions instead of raw inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard path methods such as Integrated Gradients traverse trajectories directly in input space and integrate the model's raw response at each point along the chosen path. Reveal-IG lifts the path into a space of structured probe distributions that progressively reveal information about the input. Attributions are then computed from changes in the model's expected output along this distributional trajectory. The construction preserves completeness with respect to the expected model response and directly supports multiscale image probes as well as feature-wise uncertainty in tabular data.
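For reference, the input-space construction described in the first sentence can be sketched on a toy problem. This is a minimal illustration of standard Integrated Gradients, not code from the paper: the two-feature model `f`, its hand-written gradient, the zero baseline, and the midpoint-rule step count are all assumptions chosen for the example.

```python
import numpy as np

def f(x):
    """Toy model (hypothetical): smooth and nonlinear in two features."""
    return np.tanh(x[0]) + 0.5 * x[0] * x[1]

def grad_f(x):
    """Analytic gradient of the toy model."""
    return np.array([1.0 - np.tanh(x[0]) ** 2 + 0.5 * x[1], 0.5 * x[0]])

def integrated_gradients(x, baseline, steps=200):
    """Midpoint-rule approximation of IG along the straight input-space path."""
    delta = x - baseline
    total = np.zeros_like(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps                  # midpoint of the k-th sub-interval
        total += grad_f(baseline + alpha * delta)  # raw model gradient at a path point
    return delta * total / steps

x = np.array([1.5, -0.7])   # point being explained (illustrative values)
baseline = np.zeros(2)      # assumed all-zeros baseline

ig_attr = integrated_gradients(x, baseline)
ig_gap = abs(ig_attr.sum() - (f(x) - f(baseline)))  # completeness gap
print(ig_attr, ig_gap)
```

Every path point, including those next to the baseline, enters the sum with the same weight `1 / steps`, which is the behavior the distributional construction is meant to change.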

Core claim

Reveal-IG defines attribution as the integral of changes in expected model output along a path in the space of probe distributions. By traversing distributions that gradually reveal information rather than raw input values, the method retains the completeness axiom, accommodates multiscale and uncertain probes without equal weighting of all scales, and eliminates the path artifacts that arise when early baseline-adjacent points contribute on equal footing with the input itself.

What carries the argument

The distributional path through a family of structured probe distributions, along which the integral of the gradient of expected model output is taken.

If this is right

Attributions sum exactly to the difference between the expected model output at the baseline distribution and at the input distribution.
Multiscale image probes receive resolution-appropriate weighting without manual adjustment of the path.
Feature-wise uncertainty in tabular data is incorporated directly into the probe distributions rather than treated as post-processing.
Synthetic tests show the method avoids the path artifacts that affect input-space trajectory methods.
Signed attributions remain stable across runs and outperform other methods on sign-aware evaluation metrics.
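The first consequence in this list, completeness with respect to the expected output, can be checked numerically on a toy problem. The following is a minimal sketch and not the authors' implementation: it assumes Gaussian probe distributions centered on the input, per-feature standard deviations that shrink linearly to zero, a hand-written two-feature model, and a fixed set of noise samples so that the Monte Carlo expectation is a deterministic function of the path parameter. All function and variable names are hypothetical.

```python
import numpy as np

def f(x):
    """Toy model (hypothetical) acting on the last axis of x."""
    return np.tanh(x[..., 0]) + 0.5 * x[..., 0] * x[..., 1]

def grad_f(x):
    """Analytic gradient of f with respect to its two inputs."""
    g0 = 1.0 - np.tanh(x[..., 0]) ** 2 + 0.5 * x[..., 1]
    g1 = 0.5 * x[..., 0]
    return np.stack([g0, g1], axis=-1)

def expected_output(x_star, sigma, eps):
    """Sample estimate of E[f(x)] under the assumed probe N(x_star, diag(sigma^2))."""
    return float(f(x_star + sigma * eps).mean())

def distributional_path_attribution(x_star, sigma_start, sigma_end, eps, steps=200):
    """Integrate d E[f] / d sigma_i along a straight path in sigma (midpoint rule)."""
    d_sigma = sigma_end - sigma_start
    attributions = np.zeros_like(x_star)
    for k in range(steps):
        t = (k + 0.5) / steps
        sigma = sigma_start + t * d_sigma
        # Reparameterization gradient: d/d sigma_i E[f(x* + sigma*eps)] = E[df/dx_i * eps_i]
        grad_sigma = (grad_f(x_star + sigma * eps) * eps).mean(axis=0)
        attributions += grad_sigma * d_sigma / steps
    return attributions

rng = np.random.default_rng(0)
eps = rng.standard_normal((2000, 2))   # fixed noise samples, reused at every step
x_star = np.array([1.5, -0.7])         # point being explained (illustrative values)
sigma_start = np.array([2.0, 2.0])     # broad probe: little information revealed
sigma_end = np.array([0.0, 0.0])       # point mass at x_star: fully revealed

attributions = distributional_path_attribution(x_star, sigma_start, sigma_end, eps)
gap = abs(attributions.sum()
          - (expected_output(x_star, sigma_end, eps)
             - expected_output(x_star, sigma_start, eps)))
print(attributions, gap)
```

In this sketch the gap shrinks as `steps` grows because the same noise samples are used for the path integral and for both endpoint expectations. With independent samples at each evaluation, Monte Carlo noise would add to the discretization error.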

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same distributional-path construction could be applied to other path-based explanation techniques to obtain completeness with respect to expectations.
Domains with high input uncertainty, such as sensor data or noisy measurements, could adopt the framework to produce attributions that reflect that uncertainty by design.
Automatic or learned selection of probe families might remove the remaining manual choice of distribution family while preserving the artifact-free property.
Information-theoretic measures of revelation along the path could quantify how much new information each scale or feature contributes to the prediction.

Load-bearing premise

A family of structured probe distributions can be chosen so that the integral along the distributional path produces attributions free of new artifacts and without requiring post-hoc tuning of the probe family for each model or dataset.

What would settle it

A synthetic diagnostic in which Reveal-IG attributions on a controlled example display path artifacts of comparable magnitude to those seen in input-space Integrated Gradients or fail to sum exactly to the change in expected model output.

Figures

Figures reproduced from arXiv: 2606.03885 by Kieran A. Murphy, Shameen Shrestha.

**Figure 1.** (a) SHAP reveals feature information in discrete steps, averaging contributions along all such paths. We introduce Reveal-IG, a method that gradually reveals feature information, traversing a continuous path analogous to the path of Integrated Gradients (IG) in the space of feature values. (b) IG evaluates along a single path from a baseline to the point being explained, x⋆. Reveal-IG integrates over a s…

**Figure 2.** Attribution fields in two dimensions. (a) For a selection of functions (left column), we evaluate attributions according to gradients, SHAP, IG, and Reveal-IG. The difference of attribution components, a1 − a2, is displayed as a heatmap. (b) By systematically varying the function, we obtain response curves for the attribution methods. A bump in the function’s output is rotated around the origin and the attr…

**Figure 3.** For a random selection of ImageNet validation images, we show signed saliency maps for …

**Figure 4.** Attribution over the path. (a) We show the fractional rate of change of the attributions, $|\Delta a(t)| / \sum_t |\Delta a(t)|$, averaged over 100 images. (b) Contribution to final attribution maps across segments of the trajectory for a randomly selected image.

**Figure 5.** Reveal-IG completeness convergence. The completeness gap, measured as $\left|\sum_i a_i - \left(\mathbb{E}_{q_\text{end}}[f(x)] - \mathbb{E}_{q_\text{start}}[f(x)]\right)\right|$, with $a_i$ the attribution for component $i$, for (a) ResNet-50 on eight ImageNet samples and (b) an MLP trained on the California Housing dataset, evaluated for 20 samples. Shaded regions display standard error.

**Figure 6.** Comparison of Reveal-IG with different σ endpoints, including the adaptive (per-image) endpoint, whose values are displayed inside the corresponding attribution maps in the right column.

**Figure 7.** (a) The manner of feature information revelation compared schematically for SHAP, the subdivided variant discussed in Sec. 3.1, and Reveal-IG. For SHAP, the feature contributions are averaged over all paths (here, two possibilities). For k = 2, one path is shown out of the six possibilities. The insets’ shaded regions visualize the distribution over feature values used for the expectation in that step’s ca…

**Figure 8.** For a random selection of images from the localization set with accompanying ImageNet-S …

**Figure 9.** For the same random selection of images from the ImageNet validation set shown in Fig. 3, …

**Figure 10.** For the same random selection of images from the ImageNet validation set shown in …

**Figure 11.** For the same random selection of images from the ImageNet validation set shown in …

Original abstract

Feature attribution methods explain predictions by assigning importance scores to input features. Path-based methods such as Integrated Gradients are especially appealing because they satisfy completeness: attributions sum to the change in model output between a reference state and the input. Yet most path methods define this trajectory in input space, explaining a model through pointwise perturbed inputs along a chosen path. An input-space path integrates the model's raw response at each point it passes through, with no control over the resolution at which a feature is queried; the early, baseline-adjacent part of the trajectory contributes to the explanation on equal footing with the input itself. Here, we lift path attribution from input space to a space of structured probe distributions around the example of interest, and call our method Reveal-IG. Rather than traversing raw input values, Reveal-IG progressively reveals information about the input and attributes changes in the model's expected output along this distributional path. The result is a path-attribution framework that retains completeness with respect to the expected model response, and naturally accommodates multiscale image probes and feature-wise uncertainty in tabular data. Synthetic diagnostics show that Reveal-IG avoids path artifacts that affect input-space methods, and across ImageNet classification and tabular regression it produces stable, signed attributions, leading on metrics that use attribution sign while remaining competitive on the rest.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Reveal-IG moves Integrated Gradients paths into distributional space, which is a clean reformulation that keeps completeness but hands the method a new free parameter in the probe family.


Reveal-IG takes the standard Integrated Gradients construction and replaces the input-space trajectory with one through a space of structured probe distributions. The path now progressively reveals information about the input, and attributions come from changes in the model's expected output along that path. Completeness is retained by design, now with respect to the expectation rather than a single point.

The main novelty is the distributional lift itself. It gives a natural way to handle multiscale probes on images and feature-wise uncertainty on tabular data, and the synthetic checks are meant to confirm it reduces the path artifacts that appear when you integrate directly over raw inputs. The real-data runs on ImageNet classification and tabular regression are reported to yield stable signed attributions that do well on sign-aware metrics while staying competitive elsewhere.

The paper is straightforward about the limitation of input-space paths, where early baseline-adjacent points contribute on equal footing with the actual input. The distributional version gives more control over query resolution, and the math follows the original IG integral without obvious circularity.

The soft spot is the probe distribution family. It is a free parameter, and the method's appeal rests on a well-behaved family being available without post-hoc tuning or new artifacts. The abstract does not say how the family is selected, so the strength of that premise depends on how robust the results are to different families and whether defaults transfer across tasks. The metric wins would also be easier to assess with more detail on baselines and effect sizes.

This is for people already working on path-based attribution methods who want an alternative to input-space trajectories. The construction is coherent enough to deserve a serious referee, even if the experiments will need close checking on the probe choice and quantitative claims.

Referee Report

2 major / 2 minor

Summary. The paper proposes Reveal-IG, which lifts Integrated Gradients from input-space paths to paths in a space of structured probe distributions. Rather than integrating the model's response along a trajectory of raw inputs, the method integrates changes in expected model output as information about the input is progressively revealed through a family of probe distributions. The central claims are that the resulting attributions satisfy completeness with respect to the expected model response, naturally support multiscale image probes and feature-wise uncertainty, avoid common path artifacts, and produce stable signed attributions that lead on sign-aware metrics while remaining competitive on others, as shown in synthetic diagnostics and experiments on ImageNet classification and tabular regression.

Significance. If the derivations and empirical results hold, Reveal-IG offers a principled way to control the resolution and uncertainty at which features are queried during attribution, addressing a limitation of standard input-space path methods. The preservation of completeness, the handling of multiscale and uncertain data, and the reported stability on sign-aware metrics would constitute a useful technical contribution to the feature attribution literature.

major comments (2)

[Method and Experiments] The central construction relies on the existence of a well-behaved probe distribution family that introduces no new artifacts and requires no post-hoc tuning. The paper should provide a concrete sensitivity analysis (e.g., in the experimental section) showing that attribution rankings and sign-aware metrics remain stable across reasonable choices within the family, or else clarify the selection procedure.
[Synthetic diagnostics] Synthetic diagnostics are invoked to show avoidance of path artifacts, but the specific artifacts tested, the quantitative metrics used, and the comparison baselines must be stated explicitly with numerical results (e.g., in a table) so that the claim can be verified independently.
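One way the sensitivity analysis requested in the first major comment could be set up is sketched below. It is an assumption about how such a check might look, not a procedure from the paper or the report: it compares attribution vectors obtained under two probe-family settings using a Spearman rank correlation (implemented without tie handling) and a simple sign-agreement rate. The two attribution vectors are made-up placeholders, not results.

```python
import numpy as np

def spearman_rank_correlation(a, b):
    """Spearman correlation via Pearson correlation of ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

def sign_agreement(a, b):
    """Fraction of components whose attribution sign matches across settings."""
    return float(np.mean(np.sign(a) == np.sign(b)))

# Placeholder attribution vectors for one input under two hypothetical
# probe-family settings (for example, two different probe widths).
attr_setting_a = np.array([0.40, -0.10, 0.25, 0.05])
attr_setting_b = np.array([0.35, -0.05, 0.30, 0.02])

rank_stability = spearman_rank_correlation(attr_setting_a, attr_setting_b)
sign_stability = sign_agreement(attr_setting_a, attr_setting_b)
print(rank_stability, sign_stability)
```

Averaging both scores over many inputs and over a grid of probe-family settings would give the kind of table the comment asks for. Which settings count as "reasonable choices within the family" remains the authors' call.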

minor comments (2)

Notation for the distributional path and the expectation operator should be introduced once and used consistently; cross-references to the completeness proof would help readers.
Figure captions should explicitly state the probe family and any hyperparameters used so that the visualizations can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and recommendation for minor revision. The comments highlight useful ways to strengthen verifiability. We address each major comment below and will revise the manuscript accordingly.


Referee: [Method and Experiments] The central construction relies on the existence of a well-behaved probe distribution family that introduces no new artifacts and requires no post-hoc tuning. The paper should provide a concrete sensitivity analysis (e.g., in the experimental section) showing that attribution rankings and sign-aware metrics remain stable across reasonable choices within the family, or else clarify the selection procedure.

Authors: We agree that explicit robustness checks are valuable. In the revised version we will add a dedicated sensitivity analysis subsection (and accompanying table) in the experimental section. It will report attribution ranking stability and sign-aware metric values across a range of probe-family parameters (different multiscale widths and uncertainty levels) on both the ImageNet and tabular tasks, together with a brief description of the default selection rule used in the main experiments.

Revision: yes

Referee: [Synthetic diagnostics] Synthetic diagnostics are invoked to show avoidance of path artifacts, but the specific artifacts tested, the quantitative metrics used, and the comparison baselines must be stated explicitly with numerical results (e.g., in a table) so that the claim can be verified independently.

Authors: We accept the request for greater explicitness. The revised synthetic-diagnostics section will enumerate the concrete artifacts examined (saturation near baselines, discontinuity at feature boundaries), define the quantitative metrics (e.g., attribution variance under path perturbation, sign-consistency score), list the baselines (standard IG, SmoothGrad, and a random-path variant), and present the numerical results in a compact table.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is a direct mathematical lift of IG to distributional paths


The paper defines Reveal-IG by lifting the standard Integrated Gradients path integral from input space to a space of structured probe distributions, with the central claim being that the integral of expected-output changes along this distributional path yields attributions that retain completeness w.r.t. the expected model response. No equation reduces the final attribution to a fitted parameter, a quantity defined only in terms of itself, or a result justified solely by self-citation. The construction is presented as a straightforward change of integration domain that preserves the original completeness axiom without introducing new fitted elements or ansatzes smuggled via prior work. Synthetic diagnostics and empirical comparisons are offered as external validation rather than as part of the derivation itself. This is the most common honest finding for a mathematically coherent extension of an existing method.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the existence of a suitable family of probe distributions whose expected-output integrals satisfy completeness. Beyond the choice of that family and the completeness property carried over from IG, the abstract introduces no fitted parameters or invented entities.

free parameters (1)

probe distribution family
Choice of how the structured probe distributions are parameterized around each input; this choice is required to define the path but is not numerically fitted in the abstract.

axioms (1)

domain assumption: The integral of changes in expected model output along the distributional path equals the difference between expected output on the full input and on the baseline.
This is the completeness property transferred from input space to distributional space; it is invoked to guarantee that attributions sum correctly.
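Under one set of assumptions, this completeness property can be written out explicitly. The sketch below is illustrative: the path parameterization $\theta(t)$, the shorthand $F$, and the pairing of one path parameter with each attributed component are assumptions introduced here, not notation confirmed by the paper. The endpoint names $q_\text{start}$ and $q_\text{end}$ follow the Figure 5 caption.

```latex
% Assumed notation: q_{\theta} is a probe distribution with parameters \theta,
% and \theta(t), t \in [0,1], is a differentiable path with
% q_{\theta(0)} = q_{\text{start}} and q_{\theta(1)} = q_{\text{end}}.
F(\theta) = \mathbb{E}_{x \sim q_{\theta}}[f(x)], \qquad
a_i = \int_0^1 \frac{\partial F}{\partial \theta_i}\big(\theta(t)\big)\,
      \frac{d\theta_i}{dt}\, dt .

% Summing over components, applying the chain rule, and then the
% fundamental theorem of calculus:
\sum_i a_i = \int_0^1 \frac{d}{dt} F\big(\theta(t)\big)\, dt
           = F\big(\theta(1)\big) - F\big(\theta(0)\big)
           = \mathbb{E}_{q_{\text{end}}}[f(x)] - \mathbb{E}_{q_{\text{start}}}[f(x)] .
```

The identity holds for any differentiable path between the two endpoint distributions, so completeness alone does not pin down the probe family. That choice stays a free parameter, as the ledger records.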




[1] [1]

Explaining deep neural networks and beyond: A review of methods and applications.Proceedings of the IEEE, 109(3):247–278, 2021

Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J Anders, and Klaus-Robert Müller. Explaining deep neural networks and beyond: A review of methods and applications.Proceedings of the IEEE, 109(3):247–278, 2021

2021

[2] [2]

Christoph Molnar.Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2022

2022

[3] [3]

A unified approach to interpreting model predictions.Advances in neural information processing systems, 30, 2017

Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions.Advances in neural information processing systems, 30, 2017

2017

[4] [4]

Axiomatic attribution for deep networks

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. InInternational conference on machine learning, pages 3319–3328. PMLR, 2017

2017

[5] [5]

Visualizing the impact of feature attribution baselines

Pascal Sturmfels, Scott Lundberg, and Su-In Lee. Visualizing the impact of feature attribution baselines. Distill, 2020. doi: 10.23915/distill.00022. https://distill.pub/2020/attribution-baselines

work page doi:10.23915/distill.00022 2020

[6] [6]

Attribution in scale and space

Shawn Xu, Subhashini Venugopalan, and Mukund Sundararajan. Attribution in scale and space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

2020

[7]

Guided integrated gradients: an adaptive path method for removing noise

Andrei Kapishnikov, Subhashini Venugopalan, Besim Avci, Ben Wedin, Michael Terry, and Tolga Bolukbasi. Guided integrated gradients: an adaptive path method for removing noise. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5048–5056, 2021. doi: 10.1109/CVPR46437.2021.00501

doi:10.1109/cvpr46437.2021.00501 2021

[8]

Understanding global feature contributions with additive importance measures

Ian Covert, Scott M Lundberg, and Su-In Lee. Understanding global feature contributions with additive importance measures. Advances in neural information processing systems, 33:17212–17223, 2020

2020

[9]

Auto-encoding variational Bayes

Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations (ICLR), 2014

2014

[10]

ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009

2009

[11]

ImageNet large scale visual recognition challenge

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015

2015

[12]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

2016

[13]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https...

2021

[14]

Integrated decision gradients: Compute your attributions where the model makes its decision

Chase Walker, Sumit Jha, Kenny Chen, and Rickard Ewetz. Integrated decision gradients: Compute your attributions where the model makes its decision. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 5289–5297, 2024

2024

[15]

Improving performance of deep learning models with axiomatic attribution priors and expected gradients

Gabriel Erion, Joseph D Janizek, Pascal Sturmfels, Scott M Lundberg, and Su-In Lee. Improving performance of deep learning models with axiomatic attribution priors and expected gradients. Nature machine intelligence, 3(7):620–631, 2021

2021

[16]

SmoothGrad: removing noise by adding noise

Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017

arXiv 2017

[17]

Deep inside convolutional networks: Visualising image classification models and saliency maps

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv:1312.6034, 2013

arXiv 2013

[18]

RISE: Randomized input sampling for explanation of black-box models

Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. In BMVC, 2018

2018

[19]

Towards better understanding of gradient-based attribution methods for deep neural networks

Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Sy21R9JAW

2018

[20]

CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations

Leila Arras, Ahmed Osman, and Wojciech Samek. CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations. Information Fusion, 81:14–40, 2022

2022

[21]

Large-scale unsupervised semantic segmentation

Shanghua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng, Junwei Han, and Philip Torr. Large-scale unsupervised semantic segmentation. 2022

2022

[22]

Event labeling combining ensemble detectors and background knowledge

Hadi Fanaee-T and Joao Gama. Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence, 2(2):113–127, 2014

2014

[23]

Sparse spatial autoregressions

R Kelley Pace and Ronald Barry. Sparse spatial autoregressions. Statistics & Probability Letters, 33(3):291–297, 1997

1997

[24]

Modeling wine preferences by data mining from physicochemical properties

Paulo Cortez, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. Modeling wine preferences by data mining from physicochemical properties. Decision support systems, 47(4):547–553, 2009

2009

[25]

"Why should I trust you?" Explaining the predictions of any classifier

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016

2016

[26]

ERASER: A benchmark to evaluate rationalized NLP models

Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C Wallace. ERASER: A benchmark to evaluate rationalized NLP models. In Proceedings of the 58th annual meeting of the association for computational linguistics, pages 4443–4458, 2020

2020

[27]

On the (in)fidelity and sensitivity of explanations

Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Sai Suggala, David I. Inouye, and Pradeep Ravikumar. On the (in)fidelity and sensitivity of explanations. In Advances in Neural Information Processing Systems (NeurIPS), 2019

2019

[28]

Values of Non-atomic Games

R.J. Aumann and L.S. Shapley. Values of Non-atomic Games. Princeton Legacy Library. Princeton University Press, 1974. ISBN 9780691081038. URL https://books.google.com/books?id=SIvUAQAACAAJ

1974

[29]

Spectral integrated gradients for coarse-to-fine feature attribution

Soyeon Kim, Seongwoo Lim, Kyowoon Lee, and Jaesik Choi. Spectral integrated gradients for coarse-to-fine feature attribution. In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2026. URL http://arxiv.org/abs/2605.19607

arXiv 2026

[30]

A rigorous study of integrated gradients method and extensions to internal neuron attributions

Daniel D Lundstrom, Tianjian Huang, and Meisam Razaviyayn. A rigorous study of integrated gradients method and extensions to internal neuron attributions. In International Conference on Machine Learning, pages 14485–14508. PMLR, 2022

2022

[31]

Stochastic integrated explanations for vision models

Oren Barkan, Yehonatan Elisha, Jonathan Weill, Yuval Asher, Amit Eshel, and Noam Koenigstein. Stochastic integrated explanations for vision models. In 2023 IEEE International Conference on Data Mining (ICDM), pages 938–943. IEEE, 2023

2023

[32]

Visual explanations via iterated integrated attributions

Oren Barkan, Yehonatan Elisha, Yuval Asher, Amit Eshel, and Noam Koenigstein. Visual explanations via iterated integrated attributions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

2023

[33]

Denoising diffusion path: Attribution noise reduction with an auxiliary diffusion model

Yiming Lei, Zilong Li, Junping Zhang, and Hongming Shan. Denoising diffusion path: Attribution noise reduction with an auxiliary diffusion model. Advances in Neural Information Processing Systems, 37:54003–54025, 2024

2024

[34]

Probabilistic path integration with mixture of baseline distributions

Yehonatan Elisha, Oren Barkan, and Noam Koenigstein. Probabilistic path integration with mixture of baseline distributions. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 570–580, 2024

2024

[35]

Restricting the flow: Information bottlenecks for attribution

Karl Schulz, Leon Sixt, Federico Tombari, and Tim Landgraf. Restricting the flow: Information bottlenecks for attribution. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=S1xWh1rYwB

2020

[36]

Inserting Information Bottlenecks for Attribution in Transformers

Zhiying Jiang, Raphael Tang, Ji Xin, and Jimmy Lin. Inserting Information Bottlenecks for Attribution in Transformers. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3850–3857, Online, November 2020. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/2020.findings-emnlp.343

2020

[37]

Comprehensive information bottleneck for unveiling universal attribution to interpret vision transformers

Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon, and Seong-Whan Lee. Comprehensive information bottleneck for unveiling universal attribution to interpret vision transformers. In 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25166–25175, 2025. doi: 10.1109/CVPR52734.2025.02343

doi:10.1109/cvpr52734.2025.02343 2025

[38]

Kieran A Murphy and Dani S. Bassett. Interpretability with full complexity by constraining feature information. In International Conference on Learning Representations (ICLR), 2023. URL https://openreview.net/forum?id=R_OL5mLhsv

2023

[39]

Information decomposition in complex systems via machine learning

Kieran A Murphy and Dani S Bassett. Information decomposition in complex systems via machine learning. Proceedings of the National Academy of Sciences, 121(13):e2312988121, 2024

2024

[40]

Captum: A unified and generic model interpretability library for PyTorch

Narine Kokhlikyan, Vivek Miglani, Miguel Martin, Edward Wang, Bilal Alsallakh, Jonathan Reynolds, Alexander Melnikov, Natalia Kliushkina, Carlos Araya, Siqi Yan, and Orion Reblitz-Richardson. Captum: A unified and generic model interpretability library for PyTorch, 2020

arXiv 2020