A Unified Approach to Interpreting Model Predictions

Scott Lundberg, Su-In Lee · 2017 · cs.AI · arXiv 1705.07874

Canonical reference. 100% of citing Pith papers cite this work as background.

52 Pith papers citing it

Background 100% of classified citations

abstract

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
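
To make "SHAP assigns each feature an importance value for a particular prediction" concrete, here is a minimal sketch of an exact Shapley value computation on a toy model. It is not the authors' implementation or the `shap` library's API. The names `shapley_values`, `model`, `x`, and `baseline` are hypothetical, and "removing" a feature is approximated by substituting a fixed baseline value, which is a simplifying assumption. The enumeration is exponential in the number of features, so it only suits very small inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all feature subsets.

    Assumption: a feature that is absent from a subset is replaced by its
    baseline value. Cost grows as 2^n in the number of features n.
    """
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for subsets of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(z):
    # Hypothetical toy model: one additive term plus one interaction term.
    return 2.0 * z[0] + 3.0 * z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
print(sum(phi), model(x) - model(baseline))
```

In this toy case the attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the two features that only interact with each other receive equal shares of that interaction.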

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Optimal Recourse Summaries via Bi-Objective Decision Tree Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

SOGAR learns Pareto-optimal recourse summaries by solving a bi-objective decision tree optimization that partitions populations and assigns shared low-cost actions per subgroup.

GlyTwin: Digital Twin for Glucose Control in Type 1 Diabetes Through Optimal Behavioral Modifications Using Patient-Centric Counterfactuals

cs.LG · 2025-04-14 · unverdicted · novelty 7.0

GlyTwin generates patient-centric counterfactual behavioral interventions to reduce hyperglycemia in type 1 diabetes, evaluated on a new dataset from 50 patients showing 85.8% valid explanations and 87.3% effectiveness.

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

cs.CL · 2025-02-17 · unverdicted · novelty 7.0

ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.

Explainable Outlier Detection for Interval-valued Data

stat.ME · 2026-06-24 · unverdicted · novelty 6.0

Derives a closed-form Shapley value for the squared robust Interval-Mahalanobis distance to explain variable contributions to outlyingness in interval-valued data.

KG-TRACE: A Neuro-Symbolic Framework for Mechanistic Grounding in Antimicrobial Resistance Prediction

cs.LG · 2026-06-24 · unverdicted · novelty 6.0

KG-TRACE fuses genomic features with RotatE KG embeddings via an epistemic trust gate for AMR prediction, reporting 0.976 AUROC on isoniazid resistance in the CRyPTIC cohort plus 92.5% symbolic coverage via a new Biological Grounding Ratio metric.

TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

TSM-Bench shows SOTA MGT detectors drop 10-40% in accuracy on task-specific Wikipedia edits versus generic text, with fine-tuning on task-specific data generalizing better than the reverse.

Machine-learning-accelerated discovery of synthesizable high-temperature altermagnets with giant spin splitting

cond-mat.mtrl-sci · 2026-05-27 · unverdicted · novelty 6.0

ML-accelerated screening of 8640 AB2C2D variants yields 34 low-hull-energy altermagnets with spin splittings exceeding 1.5 eV, including RbMn2Te2O with 1.88 eV splitting and ~390 K Neel temperature.

Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models

cs.CL · 2026-02-18 · unverdicted · novelty 6.0

CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.

Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance

cs.SD · 2026-02-14 · unverdicted · novelty 6.0

A framework using covariance-based spectral signatures and TreeSHAP attributions on AASIST3 branches identifies four operational archetypes and a flawed specialization mode that explains high error rates on specific spoofing attacks.

Photometric Redshift PDFs via Neural Network Classification for DESI Legacy Imaging Surveys and Pan-STARRS

astro-ph.GA · 2026-02-02 · unverdicted · novelty 6.0

Neural network classification with CRPS optimization produces calibrated photometric redshift PDFs for DESI Legacy and Pan-STARRS data, achieving σ_NMAD of 0.0153 on LSDR10 and outperforming regression methods.

Filtering Interlopers with Photometry and Diagnostic Features: A Machine Learning Framework Validated with CSST Slitless Spectroscopy

astro-ph.CO · 2026-01-07 · conditional · novelty 6.0

XGBoost classifier filters interlopers in CSST slitless spectroscopy simulations, retaining 42% of galaxies with 96.6% accurate redshifts and 0.13% outliers.

AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption

cs.CL · 2025-08-05 · unverdicted · novelty 6.0

AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.

Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence

cs.LG · 2025-06-15 · unverdicted · novelty 6.0

New unsupervised method adapts the multivariate logrank statistic into a differentiable loss for training any neural network on any data modality to discover prognostically distinct patient clusters, demonstrated on myeloma lab data and lung cancer CT images with post-hoc explainability.

Splits! Flexible Sociocultural Linguistic Investigation at Scale

cs.CL · 2025-04-06 · unverdicted · novelty 6.0

A demographically and topically split Reddit dataset called Splits! is constructed and validated to support scalable, flexible investigation of sociocultural linguistic phenomena via a two-stage filtering process for promising candidates.

Realised Volatility Forecasting: Machine Learning via Financial Word Embedding

q-fin.CP · 2021-08-01 · unverdicted · novelty 6.0

News embeddings from financial text improve out-of-sample realized volatility forecasts for stocks, with stronger effects for stock-specific news and high-volatility periods, and yield gains when combined with benchmarks.

Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning

cs.SE · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

Reinforcement learning on MIR features combined with cargo-fuzz validation reduces false positives in Rust static memory safety analysis, raising precision from 25.6% to 59.0% and accuracy to 65.2%.

Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework

cs.AI · 2026-05-02 · unverdicted · novelty 6.0

The paper presents a taxonomy of seven production-specific failure modes for agentic AI, demonstrates that existing metrics fail to detect four of them entirely, and proposes the PAEF five-dimension framework for continuous production evaluation with an open-source implementation.

Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.

Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution

eess.IV · 2026-04-29 · unverdicted · novelty 6.0

Cross-modal averaging maps ECG model attributions to CineECG 3D space, raising Dice overlap with expert annotations from 0.47 to 0.56 on 20 cases while filtering attribution noise.

TrajOnco: a multi-agent framework for temporal reasoning over longitudinal EHR for multi-cancer early detection

cs.AI · 2026-04-12 · unverdicted · novelty 6.0

TrajOnco uses a chain-of-agents LLM architecture with memory to perform temporal reasoning on longitudinal EHR, achieving 0.64-0.80 AUROC for 1-year multi-cancer risk prediction in zero-shot mode on matched cohorts while matching supervised ML on lung cancer and outperforming single-agent baselines.

Transferable 3D Convolutional Neural Networks for Elastic Constants Prediction in Nanoporous Metals

cond-mat.mtrl-sci · 2026-05-20 · conditional · novelty 5.0

3D CNNs predict elastic moduli of nanoporous metals with R²=0.955, outperforming descriptor-based models, and transfer learning works on smaller denser datasets for large-scale Pareto optimization.

Inferring stellar metallicity and elemental abundances from kinematic and spectroscopic data using machine learning -- Implications for exoplanet host stars

astro-ph.EP · 2026-05-18 · unverdicted · novelty 5.0

ML regressors trained on APOGEE DR17 red giants predict C, O, Mg, Si abundances from kinematics and [Fe/H] more accurately than [Fe/H] baseline, with external validation on HARPS FGK dwarfs and reproduction of Galactic chemical evolution trends.

Informationally Compressive Anonymization: Non-Degrading Sensitive Input Protection for Privacy-Preserving Supervised Machine Learning

cs.LG · 2026-03-16 · unverdicted · novelty 5.0

ICA and VEIL enable privacy-preserving supervised ML by producing structurally non-invertible encodings aligned with downstream tasks while maintaining predictive utility.

A Performance Analyzer for a Public Cloud's ML-Augmented VM Allocator

cs.DC · 2025-12-08 · unverdicted · novelty 5.0

SANJESH applies bi-level optimization to production traces and reveals VM allocation scenarios that cause 4x worse performance than the operator's existing evaluator detected.

citing papers explorer

Showing 6 of 6 citing papers after filters.

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

cs.CL · 2025-02-17 · unverdicted · none · ref 24 · internal anchor

ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.

TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices

cs.CL · 2026-05-29 · unverdicted · none · ref 5 · internal anchor

TSM-Bench shows SOTA MGT detectors drop 10-40% in accuracy on task-specific Wikipedia edits versus generic text, with fine-tuning on task-specific data generalizing better than the reverse.

Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models

cs.CL · 2026-02-18 · unverdicted · none · ref 24 · internal anchor

CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.

AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption

cs.CL · 2025-08-05 · unverdicted · none · ref 35 · internal anchor

AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.

Splits! Flexible Sociocultural Linguistic Investigation at Scale

cs.CL · 2025-04-06 · unverdicted · none · ref 6 · internal anchor

A demographically and topically split Reddit dataset called Splits! is constructed and validated to support scalable, flexible investigation of sociocultural linguistic phenomena via a two-stage filtering process for promising candidates.

Surrogate modeling for interpreting black-box LLMs in medical predictions

cs.CL · 2026-04-22 · unverdicted · none · ref 27

A surrogate modeling method approximates LLM-encoded medical knowledge via prompting to quantify variable influence and flag inaccuracies and racial biases.
