The khipu problem frames a governance failure in distributed AI in which interpretive continuity is lost even when traces remain, so infrastructure must preserve reading practices rather than merely retain data.
Towards A Rigorous Science of Interpretable Machine Learning
Canonical reference. 71% of citing Pith papers cite this work as background.
abstract
As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.
representative citing papers
An argument paper reframes LLM explainability as an embodied, situated practice based on Dourish and enactivist cognition, identifying ontological obstacles in internal explanations and advocating affordance-based designs.
Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
Qualitative study of 19 practitioners reveals ten LLM product evaluation practices and introduces the results-actionability gap as a key barrier to turning findings into improvements.
A training-free method using Fourier-parameterized star-convex contours optimized via gradients to generate compact, faithful visual attributions for image classifiers on benchmarks like ImageNet.
A method automatically constructs a causal model from behavior tree structure and domain knowledge to generate real-time causal counterfactual explanations for robot decisions.
SAE-NOs extend sparse autoencoders to function spaces via Fourier neural operators with concept and domain sparsity, learning localized patterns more efficiently and generalizing across discretizations on vision data.
MIMIC is a new inversion framework that recovers visual concepts from VLM internal states using joint inversion, feature alignment, and three regularizers.
Chain-of-thought explanations in LLMs are frequently unfaithful: models systematically omit mention of biasing prompt features that change their answers and instead produce rationalizations for those biased outputs.
Matryoshka Sparse Autoencoders applied to matrix-factorization embeddings recover hierarchical, metadata-aligned features that permit targeted intervention on gender-associated neurons.
Introduces a constraint-satisfaction algorithm and complexity results for recovering linear utilities and latent group bonuses to explain observed rankings under hidden sensitive features.
I-SAFE is a post-hoc auditing framework that applies quantile-based and Wasserstein coherence metrics to evaluate distributional response of DTI prediction models under structural perturbations from external priors like KLIFS annotations.
AI models misalign with humans on concept boundaries when probed with implausible category members, such as classifying words as vehicles or vegetables as fruit.
p-ResNet-50 adds a prototype layer with anchor- and medoid-based regularizations to ResNet-50, achieving ROC-AUC 0.994 and accuracy 0.957 on ~12k XCT patches while supplying case-based explanations aligned to expert categories.
The authors introduce a taxonomy with target, functional role, and mode of justification axes plus a framework that decomposes abstract XAI desiderata into concrete benchmarkable tasks via identified dependency structures.
CLIF applies influence functions to pinpoint influential samples and concepts in CBMs on CEBaB and Yelp datasets, enabling performance restoration via adjustments without retraining.
An entropy criterion on mean representations characterises the polarised regime in VAEs and related models, with theoretical links to KL minimisation and empirical tests across several architectures.
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
AI deployment in high-stakes areas requires domain-scoped calibrated verification with monitoring and revocation, using a proposed six-component Verification Coverage standard instead of mechanistic interpretability.
ShifaMind achieves competitive performance with the LAAT baseline on MIMIC-IV top-50 ICD-10 coding while outperforming vanilla concept bottleneck models and providing concept-mediated explanations.
The authors introduce the XAI Evaluation Card template to standardize how XAI evaluation metrics are defined, validated, and reported.
citing papers explorer
- Agentic-imodels: Evolving agentic interpretability tools via autoresearch
Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
- Investigating Concept Alignment Using Implausible Category Members
AI models misalign with humans on concept boundaries when probed with implausible category members, such as classifying words as vehicles or vegetables as fruit.
- The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime
AI deployment in high-stakes areas requires domain-scoped calibrated verification with monitoring and revocation, using a proposed six-component Verification Coverage standard instead of mechanistic interpretability.
- X-SYS: A Reference Architecture for Interactive Explanation Systems
X-SYS is a reference architecture for interactive explanation systems organized around STAR quality attributes and five service components, demonstrated via SemanticLens for vision-language models.
- A Unified Framework for Evaluating and Enhancing the Transparency of Explainable AI Methods via Perturbation-Gradient Consensus Attribution
Introduces a unified evaluation framework for XAI using five principled metrics and the PGCA method, which fuses grid perturbation with Grad-CAM++, reporting top scores in fidelity, interpretability, and fairness on ResNet-50 models across five image domains.
- AI Native Games: A Survey and Roadmap
The paper proposes a counterfactual definition of AI-native games, screens 53 examples, introduces a G/N taxonomy, and outlines a research roadmap for the field.
- Market Regime Council for Dynamic Credit Assignment in Multi-Agent LLM Decision Systems
MRC computes coalition Shapley credits from performance histories to weight three LLM agents, stabilized by Bayesian mixture and regime multipliers, achieving SR 1.51 and 440.1% cumulative return over 1037 days on 13 crypto assets.
- NEURON: A Neuro-symbolic System for Grounded Clinical Explainability
NEURON integrates SNOMED CT, ML, and RAG LLM to raise AUC from 0.74-0.77 to 0.84-0.88 and human-aligned explainability scores from 0.50 to 0.85 on MIMIC-IV acute heart failure data.
- CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations
Cognitive models of user reasoning strategies with XAI methods on tabular data fit human forward-simulation decisions better than ML baselines and support hypothesis testing without new user studies.
- Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making
This survey synthesizes XAI methods with surrogate modeling workflows for simulations and outlines a research agenda to embed explainability into simulation-driven design and decision-making.
- Governed Reasoning for Institutional AI
Cognitive Core uses nine typed cognitive primitives, a four-tier governance model with human review as an execution condition, and an endogenous audit ledger to reach 91% accuracy with zero silent errors on prior authorization appeals, outperforming ReAct and Plan-and-Solve baselines.
- Beyond Post-hoc Explanation: Toward Glassbox AI via Probabilistic Mediation
The paper proposes the Glassbox Framework in which Bayesian networks serve as transparent ante-hoc mediation layers for generative models to enable auditable reasoning traces and contestable outputs.
- Artificial Adaptive Intelligence: The Missing Stage Between Narrow and General Intelligence
Proposes Artificial Adaptive Intelligence as the regime between narrow and general AI, defined by elimination of human-specified hyperparameters, and introduces an adaptivity index plus parametric minimality principle grounded in minimum description length.
- LLMs Should Not Yet Be Credited with Decision Explanation
LLMs support decision prediction and rationale generation but lack evidence for genuine decision explanation, requiring stricter standards to avoid over-crediting.
- DenoGrad: A Gradient-Based Framework for Data Refinement in Tabular and Time-Series Learning
DenoGrad refines noisy tabular and time-series data by optimizing inputs via gradients from a fixed model, yielding better downstream predictions on ten real-world datasets while preserving data statistics.
- Exploration of Perceptual Speech Features for Clinical Decision-Support in Mental Health Care
A systematic analysis of perceptual speech features finds stable associations with symptom severity in depression, anxiety, and ADHD across multiple datasets using XGBoost with SHAP and LIME.
- On the Semantic Interpretability of Artificial Intelligence Models
This survey classifies semantic interpretability methods in AI models by nature and feature introduction, reviews user impact, and identifies remaining gaps.