The work provides the first formal definitions of Rashomon sets for federated learning and introduces a multiplicity-aware training pipeline evaluated on standard benchmarks.
hub
The mythos of model interpretability
16 Pith papers cite this work. Polarity classification is still indexing.
abstract
Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc explanations as competing notions. Throughout, we discuss the feasibility and desirability of different notions, and question the oft-made assertions that linear models are interpretable and that deep neural networks are not.
hub tools
citation-role summary
citation-polarity summary
roles
background 2representative citing papers
A piecewise biconvex optimization framework unifies sparse dictionary learning variants, explains their pathologies via spurious optima, and enables feature anchoring to restore identifiability.
Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
Interpretability in SciML requires mechanistic understanding rather than sparsity, and prior knowledge is often essential for interpretable scientific discovery.
CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-person study.
Introduces a framework for constructing ML models via interpretable steps, generalizes standard proxies into a parametrized family of measures, and quantifies the accuracy-interpretability tradeoff via practical algorithms.
Data-driven MoCap-to-radar models often fail to learn underlying physics despite low reconstruction error, with temporal attention proving critical for transformers to achieve physical consistency.
Proposes the CSI framework for co-designing visual interactions and deep learning models to expose and allow semantic control over intermediate reasoning processes, shown in a summarization case study.
An optimization framework decomposes linear models into increasing-complexity sequences using coordinate updates to generate parametrized interpretability metrics.
Human-grounded evaluation finds no significant performance improvement from adding SHAP explanations to model confidence scores in alert processing.
A cycle-consistent GAN generates counterfactual medical images to attribute classification decisions more comprehensively than standard saliency methods.
Model-agnostic SHAP-based pipeline for contrastive explanations and counterfactual datapoints, evaluated on IRIS, Wine Quality, and Mobile Features datasets.
Advanced AI systems are unexplainable in full and produce explanations that humans cannot comprehend.
This survey classifies semantic interpretability methods in AI models by nature and feature introduction, reviews user impact, and identifies remaining gaps.
citing papers explorer
-
Rashomon Sets and Model Multiplicity in Federated Learning
The work provides the first formal definitions of Rashomon sets for federated learning and introduces a multiplicity-aware training pipeline evaluated on standard benchmarks.
-
A Unified Theory of Sparse Dictionary Learning in Mechanistic Interpretability: Piecewise Biconvexity and Spurious Minima
A piecewise biconvex optimization framework unifies sparse dictionary learning variants, explains their pathologies via spurious optima, and enables feature anchoring to restore identifiability.
-
Agentic-imodels: Evolving agentic interpretability tools via autoresearch
Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
-
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
-
On the definition and importance of interpretability in scientific machine learning
Interpretability in SciML requires mechanistic understanding rather than sparsity, and prior knowledge is often essential for interpretable scientific discovery.
-
Enabling Global, Human-Centered Explanations for LLMs:From Tokens to Interpretable Code and Test Generation
CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-person study.
-
The Price of Interpretability
Introduces a framework for constructing ML models via interpretable steps, generalizes standard proxies into a parametrized family of measures, and quantifies the accuracy-interpretability tradeoff via practical algorithms.
-
What Physics do Data-Driven MoCap-to-Radar Models Learn?
Data-driven MoCap-to-radar models often fail to learn underlying physics despite low reconstruction error, with temporal attention proving critical for transformers to achieve physical consistency.
-
Visual Interaction with Deep Learning Models through Collaborative Semantic Inference
Proposes the CSI framework for co-designing visual interactions and deep learning models to expose and allow semantic control over intermediate reasoning processes, shown in a summarization case study.
-
Optimal Explanations of Linear Models
An optimization framework decomposes linear models into increasing-complexity sequences using coordinate updates to generate parametrized interpretability metrics.
-
A Human-Grounded Evaluation of SHAP for Alert Processing
Human-grounded evaluation finds no significant performance improvement from adding SHAP explanations to model confidence scores in alert processing.
-
Seeing What Shouldn't Be There: Counterfactual GANs for Medical Image Attribution
A cycle-consistent GAN generates counterfactual medical images to attribute classification decisions more comprehensively than standard saliency methods.
-
Generating Counterfactual and Contrastive Explanations using SHAP
Model-agnostic SHAP-based pipeline for contrastive explanations and counterfactual datapoints, evaluated on IRIS, Wine Quality, and Mobile Features datasets.
-
Unexplainability and Incomprehensibility of Artificial Intelligence
Advanced AI systems are unexplainable in full and produce explanations that humans cannot comprehend.
-
On the Semantic Interpretability of Artificial Intelligence Models
This survey classifies semantic interpretability methods in AI models by nature and feature introduction, reviews user impact, and identifies remaining gaps.
- Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions