Towards A Rigorous Science of Interpretable Machine Learning

Canonical reference. 71% of citing Pith papers cite this work as background.

74 Pith papers citing it
Background 71% of classified citations
abstract

As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.

hub tools

citation-role summary

background 13 · method 1

representative citing papers

Interpretability Can Be Actionable

cs.LG · 2026-05-11 · conditional · novelty 6.0

Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.

Evaluation Cards for XAI Metrics

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

The authors introduce the XAI Evaluation Card template to standardize how XAI evaluation metrics are defined, validated, and reported.

NEURON: A Neuro-symbolic System for Grounded Clinical Explainability

cs.AI · 2026-05-02 · unverdicted · novelty 6.0

NEURON raises AUC from 0.74-0.77 to 0.84-0.88 on MIMIC-IV heart-failure mortality prediction while lifting human-aligned explanation scores from 0.50 to 0.85 by grounding SHAP values in SNOMED CT and patient notes via RAG-LLM.

Faster Verified Explanations for Neural Networks

cs.LG · 2025-11-28 · unverdicted · novelty 6.0

FaVeX accelerates verified explanations for neural networks via dynamic batch-sequential processing and query reuse while introducing verifier-optimal robust explanations that incorporate verifier incompleteness.

citing papers explorer

Showing 11 of 11 citing papers after filters.

  • Interpretable and Steerable Sequence Learning via Prototypes
    cs.LG · 2019-07-23 · unverdicted · none · ref 11 · internal anchor

    ProSeNet learns a sparse set of prototypes for case-based explanations in deep sequence models, matches state-of-the-art accuracy on several tasks, and supports manual prototype refinement by non-experts.

  • The Price of Interpretability
    cs.LG · 2019-07-08 · unverdicted · none · ref 14 · internal anchor

    Introduces a framework for constructing ML models via interpretable steps, generalizes standard proxies into a parametrized family of measures, and quantifies the accuracy-interpretability tradeoff via practical algorithms.

  • Detection of Real-world Driving-induced Affective State Using Physiological Signals and Multi-view Multi-task Machine Learning
    cs.LG · 2019-07-19 · unverdicted · none · ref 12 · internal anchor

    A multi-view multi-task ML method detects real-world driving-induced affective states using physiological signals by modeling inter-drive variability, with results showing performance gains on three datasets.

  • Optimal Explanations of Linear Models
    cs.LG · 2019-07-08 · unverdicted · none · ref 28 · internal anchor

    An optimization framework decomposes linear models into increasing-complexity sequences using coordinate updates to generate parametrized interpretability metrics.

  • A Human-Grounded Evaluation of SHAP for Alert Processing
    cs.LG · 2019-07-07 · unverdicted · none · ref 1 · internal anchor

    Human-grounded evaluation finds no significant performance improvement from adding SHAP explanations to model confidence scores in alert processing.

  • Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?
    cs.CL · 2019-07-01 · unverdicted · none · ref 5 · internal anchor

    Analysis of transformer attention heads in abstractive summarization shows specialization in some heads and proposes a method to measure model reliance on learned attention distributions.

  • Interpretable Question Answering on Knowledge Bases and Text
    cs.CL · 2019-06-26 · unverdicted · none · ref 7 · internal anchor

    Compares LIME, input perturbation and attention for explaining QA on KB+text; proposes automatic evaluation paradigm and finds input perturbation superior in both automatic and human studies.

  • The Role of Cooperation in Responsible AI Development
    cs.CY · 2019-07-10 · unverdicted · none · ref 11 · internal anchor

    Competitive pressures in AI development create collective action problems that may require industry cooperation, with key factors and strategies identified to enable responsible outcomes.

  • The Mass, Fake News, and Cognition Security
    cs.CY · 2019-07-09 · unverdicted · none · ref 150 · internal anchor

    The paper defines Cognition Security (CogSec) as a multidisciplinary field studying cognitive impacts of fake news and outlines research challenges, techniques, and future directions.

  • Unexplainability and Incomprehensibility of Artificial Intelligence
    cs.CY · 2019-06-20 · unverdicted · none · ref 38 · internal anchor

    Advanced AI systems are unexplainable in full and produce explanations that humans cannot comprehend.

  • On the Semantic Interpretability of Artificial Intelligence Models
    cs.AI · 2019-07-09 · unverdicted · none · ref 6 · internal anchor

    This survey classifies semantic interpretability methods in AI models by nature and feature introduction, reviews user impact, and identifies remaining gaps.