International conference on machine learning , pages=

Axiomatic attribution for deep networks , author= · 2017

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

browse 9 citing papers

representative citing papers

$\alpha$-TCAV: A Unified Framework for Testing with Concept Activation Vectors

stat.ML · 2026-05-15 · unverdicted · novelty 7.0

α-TCAV replaces TCAV's hard indicator with a tunable smooth function to create a unified probabilistic framework with lower variance and guidance for parameter choice or Bayes-optimal scoring.

GKnow: Measuring the Entanglement of Gender Bias and Factual Gender

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.

Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

SoftBlobGIN combines ESM-2 representations with protein contact graphs via a lightweight GNN and differentiable substructure pooling to achieve 92.8% accuracy on enzyme classification, raise binding-site AUROC to 0.983, and generate auditable structural explanations without retraining the language模型

Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa

cs.CL · 2026-05-05 · conditional · novelty 7.0

LLMs show significant biases in conflict event classification, with open-weight models exhibiting false illegitimation and adapted models showing actor bias and lexical sensitivity, making them unsuitable for unsupervised deployment.

GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

GCE-MIL is a backbone-agnostic wrapper that directly optimizes MIL evidence for sufficiency, necessity, and recoverability, yielding modest gains in Macro-F1 and C-index plus more faithful patch selection across many backbones and datasets.

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

Interpretability Can Be Actionable

cs.LG · 2026-05-11 · conditional · novelty 6.0

Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.

Learning Quantifiable Visual Explanations Without Ground-Truth

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

A perturbation-based metric for XAI quality that formalizes sufficiency and necessity, paired with an adapter trained via differentiable supervision to generate causal explanations on black-box models.

TabSHAP

cs.LG · 2026-04-22 · unverdicted · novelty 4.0

TabSHAP attributes feature impact in LLM tabular classifiers via sampled Shapley coalitions and JSD on output distributions, reporting higher deletion faithfulness than random or XGBoost-proxy baselines on Adult Income and Heart Disease data.

citing papers explorer

Showing 9 of 9 citing papers.

$\alpha$-TCAV: A Unified Framework for Testing with Concept Activation Vectors stat.ML · 2026-05-15 · unverdicted · none · ref 138
α-TCAV replaces TCAV's hard indicator with a tunable smooth function to create a unified probabilistic framework with lower variance and guidance for parameter choice or Bayes-optimal scoring.
GKnow: Measuring the Entanglement of Gender Bias and Factual Gender cs.CL · 2026-05-12 · unverdicted · none · ref 54
Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning cs.LG · 2026-05-09 · unverdicted · none · ref 10
SoftBlobGIN combines ESM-2 representations with protein contact graphs via a lightweight GNN and differentiable substructure pooling to achieve 92.8% accuracy on enzyme classification, raise binding-site AUROC to 0.983, and generate auditable structural explanations without retraining the language模型
Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa cs.CL · 2026-05-05 · conditional · none · ref 74
LLMs show significant biases in conflict event classification, with open-weight models exhibiting false illegitimation and adapted models showing actor bias and lexical sensitivity, making them unsuitable for unsupervised deployment.
GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging cs.CV · 2026-05-17 · unverdicted · none · ref 65
GCE-MIL is a backbone-agnostic wrapper that directly optimizes MIL evidence for sufficiency, necessity, and recoverability, yielding modest gains in Macro-F1 and C-index plus more faithful patch selection across many backbones and datasets.
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces cs.LG · 2026-05-12 · unverdicted · none · ref 40
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
Interpretability Can Be Actionable cs.LG · 2026-05-11 · conditional · none · ref 72
Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
Learning Quantifiable Visual Explanations Without Ground-Truth cs.AI · 2026-05-18 · unverdicted · none · ref 37
A perturbation-based metric for XAI quality that formalizes sufficiency and necessity, paired with an adapter trained via differentiable supervision to generate causal explanations on black-box models.
TabSHAP cs.LG · 2026-04-22 · unverdicted · none · ref 4
TabSHAP attributes feature impact in LLM tabular classifiers via sampled Shapley coalitions and JSD on output distributions, reporting higher deletion faithfulness than random or XGBoost-proxy baselines on Adult Income and Heart Disease data.

International conference on machine learning , pages=

fields

years

verdicts

representative citing papers

citing papers explorer