pith. machine review for the scientific record.

arxiv: 2605.02167 · v2 · submitted 2026-05-04 · 💻 cs.LG · cs.AI · cs.CV

Recognition: 2 theorem links

Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

Jaesik Choi, Kyowoon Lee, Seongwoo Lim, Soyeon Kim

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:04 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords gradients · attribution · integrated · ma-gig · guided · path · reduces · explanations

The pith

By constructing attribution paths in a variational autoencoder's latent space, MA-GIG produces more faithful feature attributions than standard path-based methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Feature attribution methods like Integrated Gradients can give unreliable results when their integration paths stray into unrealistic parts of the input space. The authors introduce Manifold-Aligned Guided Integrated Gradients to fix this by moving the path construction into the latent space of a pre-trained variational autoencoder. Intermediate points are decoded back to input space, which pulls the path toward the learned data manifold. This approach reduces noise from off-manifold regions and leads to explanations that better match how the model actually behaves near the input. This matters because accurate explanations are a prerequisite for trusting and debugging complex neural network models.
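For orientation, standard Integrated Gradients accumulates gradients along the straight line from a baseline x′ to the input x, and MA-GIG swaps that line for a decoded latent interpolation. Schematically (E and D denote the VAE encoder and decoder; γ is illustrative notation, not necessarily the paper's exact formulation):

```latex
% Standard IG along the straight-line path
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1
  \frac{\partial f\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha

% MA-GIG-style manifold-aligned path: interpolate in latent space, decode
\gamma(\alpha) = D\bigl((1-\alpha)\,E(x') + \alpha\,E(x)\bigr),
  \qquad \alpha \in [0, 1]
```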

Core claim

MA-GIG constructs attribution paths by interpolating in the latent space of a pre-trained variational autoencoder and decoding intermediate latent states to input space. This biases the path toward the learned generative manifold and reduces exposure to implausible regions with noisy gradients. Aggregating gradients along these manifold-aligned paths produces explanations that are more faithful to the model's predictions on features proximal to the input, outperforming prior path-based methods across datasets and classifiers.

What carries the argument

The manifold-aligned integration path: latent-space interpolation in a variational autoencoder followed by decoding to input space, which keeps gradient aggregation on plausible data points near the input.
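The mechanics can be sketched with toy stand-ins: a minimal sketch assuming linear encode/decode maps in place of the pre-trained VAE and an analytic toy gradient, not the authors' implementation.

```python
import numpy as np

# Toy "VAE": a linear encoder and its inverse as decoder. The real method
# uses a pre-trained VAE; these stand-ins only illustrate the path shape.
W = np.array([[0.9, 0.2], [0.1, 0.8]])   # toy encoder weights
W_inv = np.linalg.inv(W)                  # toy decoder = inverse map

encode = lambda x: W @ x
decode = lambda z: W_inv @ z

def f(x):                                 # toy classifier logit
    return x[0] ** 2 + 2.0 * x[1]

def f_grad(x):                            # its analytic gradient
    return np.array([2.0 * x[0], 2.0])

def ma_gig_attribution(x, baseline, steps=128):
    """Aggregate gradients along a decoded latent interpolation path."""
    z0, z1 = encode(baseline), encode(x)
    alphas = np.linspace(0.0, 1.0, steps + 1)
    # gamma(alpha): interpolate in latent space, decode back to input space
    points = [decode((1 - a) * z0 + a * z1) for a in alphas]
    attr = np.zeros_like(x)
    for p0, p1 in zip(points[:-1], points[1:]):
        attr += f_grad((p0 + p1) / 2.0) * (p1 - p0)   # midpoint rule
    return attr

x, baseline = np.array([1.0, 2.0]), np.array([0.0, 0.0])
attr = ma_gig_attribution(x, baseline)
# Path methods satisfy completeness: attributions sum to f(x) - f(baseline).
print(abs(attr.sum() - (f(x) - f(baseline))))
```

The completeness check at the end is the standard sanity test for any path-based attribution, regardless of how the path is constructed.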

Load-bearing premise

A pre-trained variational autoencoder must accurately capture the data manifold so that decoded latent-space paths yield gradient aggregations more faithful than those from input-space paths.

What would settle it

If quantitative faithfulness metrics on a dataset with high VAE reconstruction error show MA-GIG underperforming standard Guided Integrated Gradients, the claim that manifold alignment improves attributions would be falsified.
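An insertion-style faithfulness metric of the kind such an experiment would rely on can be sketched as follows. The paper reports DiffID and Insertion AUC; the model and attribution maps below are hypothetical stand-ins, not the paper's evaluation code.

```python
import numpy as np

def insertion_auc(model, x, baseline, attribution, steps=None):
    """Insert features from baseline toward x in descending attribution
    order; a more faithful map raises the model score faster (higher AUC)."""
    order = np.argsort(-attribution.ravel())
    steps = steps or len(order)
    current = baseline.astype(float).ravel().copy()
    scores = [model(current.reshape(x.shape))]
    for idx in np.array_split(order, steps):
        current[idx] = x.ravel()[idx]
        scores.append(model(current.reshape(x.shape)))
    # trapezoidal area under the score-vs-fraction-inserted curve
    return sum((a + b) / 2.0 for a, b in zip(scores, scores[1:])) / steps

# Toy check: for a linear model, ordering by the true contribution w*x
# should score at least as well as the reversed (worst-case) ordering.
w = np.array([3.0, 2.0, 1.0, 0.5])
model = lambda inp: float(w @ inp.ravel())
x, base = np.ones(4), np.zeros(4)
good = insertion_auc(model, x, base, w * x)
bad = insertion_auc(model, x, base, -(w * x))
print(good > bad)   # True
```

Running this comparison with MA-GIG and Guided IG attributions on a dataset the VAE reconstructs poorly would operationalize the falsification test above.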

Figures

Figures reproduced from arXiv: 2605.02167 by Jaesik Choi, Kyowoon Lee, Seongwoo Lim, Soyeon Kim.

Figure 1: Overview of Manifold-Aligned Guided Integrated Gradients (MA-GIG). a) Noise-Robust Gradient Guidance: The visualization compares integration paths from the baseline to the input on the classifier's logit surface f(x). The MA-GIG path (green solid line) traverses noise-robust regions, avoiding the high-frequency regions traversed by the linear IG path (pink dotted line). This is achieved by a gradient magni…

Figure 2: Qualitative comparison of attribution maps on ImageNet (InceptionV1), Oxford-IIIT Pet (ResNet18), and Oxford 102 Flower (VGG16) against baselines. Labels indicate predicted classes, and numbers in brackets denote prediction confidence. [Inset plot: LPIPS vs. path position (0 = baseline, 1 = input) for GIG (input-space) and MA-GIG (latent-space).]

Figure 3: LPIPS-based manifold alignment path analysis on ImageNet (ResNet18). The LPIPS distance from each intermediate sample γ(α) to the input. Reconstruction MSE shows only a moderate correlation with DiffID (average Pearson r = 0.406) and Insertion AUC (r = 0.530), suggesting that domain alignment and task-relevant latent structure matter beyond pixel-level reconstruction fidelity. Effect of Slerp. We investi…

Figure 4: Hyperparameter sensitivity and ablation analysis. (a) The model performance remains consistent across varying feature selection fractions. (b) We compare different pre-trained VAE backbones and the effect of Spherical Linear Interpolation (Slerp). Slerp (dashed lines) shows mixed changes relative to linear interpolation (solid lines), with no consistent DiffID gain. We therefore use linear interpolation as…

Figure 5: Qualitative comparison against baselines on ImageNet (InceptionV1). Left labels indicate the predicted class, and numbers in brackets denote confidence.

Figure 6: Qualitative comparison on Oxford 102 Flower (VGG16). Labels on the left indicate predicted classes, and numbers in brackets denote prediction confidence.

Figure 7: Qualitative comparison on Oxford-IIIT Pet (ResNet18). Labels on the left indicate predicted classes, and numbers in brackets denote prediction confidence.

Figure 8: Qualitative comparison on ImageNet2012 (ResNet18 & VGG16). The left and right panels display results for the ResNet18 and VGG16 classifiers, respectively. For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integra…

Figure 9: Qualitative comparison on ImageNet2012 (InceptionV1) and Oxford-IIIT Pet (ResNet18). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. …

Figure 10: Qualitative comparison on Oxford-IIIT Pet (VGG16 & InceptionV1). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. (Conf.: Confidence)

Figure 11: Qualitative comparison on Oxford 102 Flower (ResNet18 & InceptionV1). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. (Conf.: Confid…

Figure 12: LPIPS distance along the integration path on fine-grained datasets. We measure LPIPS-based perceptual deviation of each intermediate sample γ(α) for (a) Oxford-IIIT Pet and (b) Oxford 102 Flower using ResNet18.

Figure 13: Average classifier confidence along the integration path (α from 0 to 1) on three classifiers. We measure the softmax score of each intermediate sample γ(α) for (a) ResNet18, (b) VGG16, and (c) InceptionV1 on ImageNet2012.
read the original abstract

Feature attribution is central to diagnosing and trusting deep neural networks, and Integrated Gradients (IG) is widely used due to its axiomatic properties. However, IG can yield unreliable explanations when the integration path between a baseline and the input passes through regions with noisy gradients. While Guided Integrated Gradients reduces this sensitivity by adaptively updating low-gradient-magnitude features, input-space guidance still produces intermediate inputs that deviate from the data manifold. To address this limitation, we propose Manifold-Aligned Guided Integrated Gradients (MA-GIG), which constructs attribution paths in the latent space of a pre-trained variational autoencoder. By decoding intermediate latent states, MA-GIG biases the path toward the learned generative manifold and reduces exposure to implausible input-space regions. Through qualitative and quantitative evaluations, we demonstrate that MA-GIG produces faithful explanations by aggregating gradients on path features proximal to the input. Consequently, our method reduces off-manifold noise and outperforms prior path-based attribution methods across multiple datasets and classifiers. Our code is available at https://github.com/leekwoon/ma-gig/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Manifold-Aligned Guided Integrated Gradients (MA-GIG), an extension of Integrated Gradients (IG) that constructs attribution paths by interpolating in the latent space of a pre-trained variational autoencoder (VAE) and decoding the points back to input space. This is intended to bias paths toward the learned data manifold, reducing exposure to off-manifold regions with noisy gradients compared to standard input-space straight-line paths or Guided IG. The authors report qualitative visualizations and quantitative metrics showing improved faithfulness and outperformance over prior path-based attribution methods on multiple datasets and classifiers, with code released at the provided GitHub link.

Significance. If the central claim holds, the method offers a practical way to improve the reliability of axiomatic feature attributions by leveraging generative models for path construction. This addresses a recognized weakness of IG variants in high-dimensional data where straight-line paths often traverse low-density regions. The open-source code is a clear strength for reproducibility. However, the significance is tempered by the dependence on VAE quality, which is not yet shown to be robust across varying manifold approximations.

major comments (2)
  1. [§3] §3 (Method, latent-space path construction): The assertion that decoded latent interpolations yield gradients 'proximal to the input' and reduce off-manifold noise is load-bearing for the faithfulness claim, yet no metric (e.g., reconstruction error, density estimation, or gradient-norm comparison on vs. off manifold) is reported to quantify how faithfully the VAE manifold approximates the true data distribution on the evaluation sets.
  2. [§4] §4 (Experiments, quantitative results): The reported outperformance over baselines is presented as evidence of reduced off-manifold noise, but the evaluation lacks an ablation on VAE fidelity (different architectures, training regimes, or reconstruction PSNR/SSIM on test data). Without this, it remains possible that gains arise from the specific VAE rather than a general manifold-alignment benefit, undermining the generalizability claim.
minor comments (2)
  1. [Abstract / §3] The phrasing 'aggregating gradients on path features proximal to the input' in the abstract and §3 could be made more precise by defining 'proximal' in terms of a distance or density threshold.
  2. [Figures] Figure captions and axis labels in the qualitative results could explicitly state the VAE architecture and latent dimension used, to aid replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment point by point below, agreeing where the concerns are valid and outlining specific revisions to strengthen the presentation of our claims.

read point-by-point responses
  1. Referee: [§3] §3 (Method, latent-space path construction): The assertion that decoded latent interpolations yield gradients 'proximal to the input' and reduce off-manifold noise is load-bearing for the faithfulness claim, yet no metric (e.g., reconstruction error, density estimation, or gradient-norm comparison on vs. off manifold) is reported to quantify how faithfully the VAE manifold approximates the true data distribution on the evaluation sets.

    Authors: We agree that direct metrics quantifying VAE manifold fidelity would make the central claim more robust. In the revised manuscript we will report reconstruction errors (MSE) on the test sets for all datasets and include a brief comparison of gradient norms for decoded vs. straight-line path points. These additions will provide explicit evidence that decoded points remain closer to the data distribution. revision: yes

  2. Referee: [§4] §4 (Experiments, quantitative results): The reported outperformance over baselines is presented as evidence of reduced off-manifold noise, but the evaluation lacks an ablation on VAE fidelity (different architectures, training regimes, or reconstruction PSNR/SSIM on test data). Without this, it remains possible that gains arise from the specific VAE rather than a general manifold-alignment benefit, undermining the generalizability claim.

    Authors: The referee correctly identifies a gap in demonstrating that the benefit is due to manifold alignment in general rather than the particular VAE chosen. We will add an ablation study in the revision that varies VAE training duration and latent dimension on one dataset, reports the resulting reconstruction quality, and shows the corresponding change in MA-GIG faithfulness scores. This will directly link attribution performance to manifold approximation quality. revision: yes
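The diagnostic promised in both responses, comparing gradient behavior along decoded versus straight-line paths, could take roughly this shape. All components below (model gradient, encoder, decoder) are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def path_gradient_norms(path_points, grad_fn):
    """Gradient norm at each point of an integration path."""
    return np.array([np.linalg.norm(grad_fn(p)) for p in path_points])

def linear_path(x, baseline, steps=32):
    return [(1 - a) * baseline + a * x
            for a in np.linspace(0.0, 1.0, steps)]

def decoded_path(x, baseline, encode, decode, steps=32):
    z0, z1 = encode(baseline), encode(x)
    return [decode((1 - a) * z0 + a * z1)
            for a in np.linspace(0.0, 1.0, steps)]

# Toy setup: gradients are noisy off a one-dimensional "manifold" (the
# line x1 = x0), and the encoder projects onto that line.
rng = np.random.default_rng(0)

def grad_fn(p):
    off_manifold = abs(p[1] - p[0])          # distance from the line
    return np.array([1.0, 1.0]) + off_manifold * rng.normal(size=2)

encode = lambda x: np.array([x.mean()])      # project onto the manifold
decode = lambda z: np.array([z[0], z[0]])

x, base = np.array([2.0, 0.0]), np.zeros(2)
lin = path_gradient_norms(linear_path(x, base), grad_fn)
dec = path_gradient_norms(decoded_path(x, base, encode, decode), grad_fn)
print(dec.std() <= lin.std())   # True here: the decoded path avoids the noise
```

Reporting such norm statistics alongside reconstruction MSE, as the rebuttal proposes, would tie the faithfulness gains directly to manifold proximity.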

Circularity Check

0 steps flagged

No circularity: algorithmic construction with external VAE and empirical validation

full rationale

The paper presents MA-GIG as an algorithmic modification to Integrated Gradients that interpolates in the latent space of a separately pre-trained VAE, decodes points, and aggregates gradients along the resulting path. No equations, derivations, or self-citations are supplied that reduce the claimed reduction in off-manifold noise or the reported performance gains to a quantity defined by the method's own fitted parameters or prior outputs. The central claim rests on the external VAE manifold approximation plus qualitative/quantitative evaluations on multiple datasets, which are independent of any self-referential loop. This is a standard proposal of a new heuristic with external dependencies and benchmarks, not a closed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that a pre-trained variational autoencoder faithfully represents the data manifold. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: A pre-trained variational autoencoder accurately models the underlying data manifold.
    The method requires that decoded latent trajectories remain proximal to realistic inputs; the abstract does not discuss how the VAE is trained or validated.

pith-pipeline@v0.9.0 · 5500 in / 1340 out tokens · 87056 ms · 2026-05-08T19:04:06.763098+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

28 extracted references · 14 canonical work pages

    These visualizations cover various classifiers on the ImageNet2012, Oxford-IIIT Pet, and Oxford 102 Flower datasets. The visualized intermediate steps (second and third rows in each panel) provide empirical evidence for the reliability of our framework. First, regarding gradient behavior, pixel-space guidance methods like GIG often suffer from manifold de...