Recognition: 2 theorem links · Lean Theorem
Towards A Rigorous Science of Interpretable Machine Learning
Pith reviewed 2026-05-12 12:00 UTC · model grok-4.3
The pith
Interpretability in machine learning lacks a shared definition and an agreed way to measure it, so this position paper supplies both, along with guidance on when explanations are actually required.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors state that despite the rising use of explanations to assess safety and non-discrimination, interpretable machine learning has no consensus definition and no agreed evaluation standards. They define interpretability as the ability to explain or to present in understandable terms to a human. They specify when such explanations are needed and when they are not. They then offer a taxonomy that structures evaluation approaches and enumerate open questions that must be resolved to place the field on a more scientific footing.
What carries the argument
The taxonomy for rigorous evaluation, which groups methods according to the form of explanation provided and the property being verified, such as safety or fairness.
If this is right
- Different interpretable models can be compared using the same set of evaluation criteria.
- Studies that test explanations will produce results that can be reproduced by other groups.
- Developers will know more precisely when to add explanations and when an unexplained black-box model suffices.
- Research effort will shift toward answering the listed open questions with measurable experiments.
- Assessments of safety or non-discrimination will rest on evaluation procedures that can be scrutinized.
Where Pith is reading between the lines
- The taxonomy could be turned into reporting guidelines that conferences require for papers claiming interpretability benefits.
- Empirical tests could apply the taxonomy to several existing explanation methods on the same dataset to check whether quality rankings remain stable; a rough sketch of such a stability check follows this list.
- Links to human-subject studies would help determine which explanation formats are understandable to domain experts versus lay users.
- The same structure might later be used to evaluate interpretability claims in sequential decision systems such as reinforcement learning agents.
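A minimal sketch of what such a stability check could look like, assuming hypothetical explanation methods (saliency, rules, prototypes), hypothetical taxonomy-derived criteria, and made-up scores; it uses Kendall's tau from SciPy as one possible agreement measure, not anything prescribed by the paper.

```python
# Hypothetical sketch: do quality rankings of explanation methods stay stable
# across different taxonomy-derived evaluation criteria?
# Method names, criteria, and scores below are illustrative placeholders.
from itertools import combinations
from scipy.stats import kendalltau

# One score per (criterion, method); in a real study these would come from
# human-subject experiments or proxy tasks run on the same dataset.
scores = {
    "proxy_task_accuracy":  {"saliency": 0.71, "rules": 0.84, "prototypes": 0.78},
    "human_simulatability": {"saliency": 0.55, "rules": 0.80, "prototypes": 0.74},
    "time_to_decision":     {"saliency": 0.60, "rules": 0.77, "prototypes": 0.81},
}

methods = sorted(next(iter(scores.values())))

def ranks(criterion):
    """Rank of each method (0 = best) under one criterion."""
    ordered = sorted(methods, key=scores[criterion].get, reverse=True)
    return [ordered.index(m) for m in methods]

# High Kendall's tau for every pair of criteria means the method ranking is
# stable regardless of which evaluation criterion is used.
for a, b in combinations(scores, 2):
    tau, _ = kendalltau(ranks(a), ranks(b))
    print(f"{a} vs {b}: tau = {tau:+.2f}")
```

If the pairwise correlations stay high, the ordering of methods does not depend on which evaluation criterion from the taxonomy is applied; low or negative values would suggest the criteria measure genuinely different things.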
Load-bearing premise
That a common definition together with this taxonomy will produce enough agreement for evaluations of interpretability to become consistent and reproducible.
What would settle it
A later survey showing that published work continues to employ incompatible definitions of interpretability and never references the proposed taxonomy would indicate that the offered structure has not created the expected consensus.
read the original abstract
As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This position paper argues that despite surging interest in interpretable machine learning, there remains little consensus on its definition or measurement. The authors first supply a working definition of interpretability and delineate the conditions under which it is required (versus when it is not). They then outline a high-level taxonomy of evaluation approaches and enumerate open questions intended to move the field toward more rigorous, scientific practices.
Significance. If the supplied definition and taxonomy are taken up by the community, the work could function as a useful organizing framework that reduces ad-hoc explanation practices and focuses attention on measurable evaluation criteria. Its primary contribution is conceptual and taxonomic rather than empirical; the explicit listing of open questions is a constructive acknowledgment that the proposed structure is a starting point rather than a finished protocol. The paper therefore earns credit for clarity of framing and for avoiding over-claim.
minor comments (3)
- [Section 4] Taxonomy: the evaluation categories are introduced at a high level of abstraction; adding one or two concrete examples or references for each category would make the taxonomy more immediately usable for readers seeking to apply it.
- [Abstract] The abstract states that a taxonomy is suggested but does not name or briefly characterize its main branches, which makes it harder for a quick reader to grasp the paper's central organizational contribution.
- [Section 5] Open Questions: several questions are listed as standalone items; grouping them thematically (e.g., under 'evaluation metrics,' 'human-subject protocols,' or 'application domains') would improve readability and signal priorities.
Simulated Author's Rebuttal
We thank the referee for the careful reading and positive assessment of our position paper. The summary accurately reflects our goals: to supply a working definition of interpretability, clarify when it is (and is not) required, and propose a taxonomy for its evaluation while highlighting open research questions. We appreciate the recognition that the contribution is primarily conceptual and taxonomic. Given the recommendation for minor revision and the absence of specific major comments, we will incorporate minor editorial improvements for clarity and flow in the revised manuscript.
Circularity Check
No significant circularity identified
full rationale
This position paper supplies a working definition of interpretability, conditions for its use, a high-level taxonomy of evaluation methods, and a list of open questions. It contains no mathematical derivations, fitted parameters, predictions, or equations that could reduce to prior quantities by construction. The central contribution is explicitly framed as definitional and taxonomic rather than as a completed derivation or empirical result; the text acknowledges remaining gaps and does not invoke self-citations or uniqueness theorems as load-bearing premises. Consequently the argument is self-contained against external benchmarks and exhibits none of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Explanations from ML systems can be used to qualitatively assess criteria such as safety or non-discrimination.
Lean theorems connected to this paper
- IndisputableMonolith.Foundation.DAlembert.Inevitabilitybilinear_family_forced · unclear · "there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation"
Forward citations
Cited by 28 Pith papers
-
Agentic-imodels: Evolving agentic interpretability tools via autoresearch
Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
-
ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction
ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
-
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
-
Interpretability Can Be Actionable
Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
-
Evaluating the False Trust engendered by LLM Explanations
A user study finds that LLM reasoning traces and post-hoc explanations create false trust by increasing acceptance of incorrect answers, whereas contrastive dual explanations improve users' ability to detect errors.
-
The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime
AI deployment in high-stakes areas requires domain-scoped calibrated verification with monitoring and revocation, using a proposed six-component Verification Coverage standard instead of mechanistic interpretability.
-
ShifaMind: A Multiplicative Concept Bottleneck for Interpretable ICD-10 Coding
ShifaMind achieves competitive performance with the LAAT baseline on MIMIC-IV top-50 ICD-10 coding while outperforming vanilla concept bottleneck models and providing concept-mediated explanations.
-
Evaluation Cards for XAI Metrics
The authors introduce the XAI Evaluation Card template to standardize how XAI evaluation metrics are defined, validated, and reported.
-
NEURON: A Neuro-symbolic System for Grounded Clinical Explainability
NEURON raises AUC from 0.74-0.77 to 0.84-0.88 on MIMIC-IV heart-failure mortality prediction while lifting human-aligned explanation scores from 0.50 to 0.85 by grounding SHAP values in SNOMED CT and patient notes via...
-
Towards interpretable AI with quantum annealing feature selection
Quantum annealing solves a combinatorial optimization problem to select key CNN feature maps, yielding more class-disentangled explanations than GradCAM or GradCAM++.
-
Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings
In high-stakes settings, Shapley explanations increase analyst confidence but do not improve decision accuracy, and standard metrics fail to predict human utility.
-
From Attribution to Action: A Human-Centered Application of Activation Steering
Activation steering paired with attribution enables intervention-based debugging in vision models, as all 8 interviewed experts shifted to hypothesis testing, most trusted observed responses, and highlighted risks lik...
-
Design Guidelines for Game-Based Refresher Training of Community Health Workers in Low-Resource Contexts
A four-year mixed-methods study of game-based systems for Indian CHWs yields eight design guidelines for sustained engagement, learning transfer, and contextual appropriateness in low-resource health training.
-
Ethical and social risks of harm from Language Models
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job...
-
SINAPSE: A lightweight deep learning framework for accurate and explainable neutron-$\gamma$ discrimination
SINAPSE uses a dual-branch neural network with a 1D convolutional autoencoder for denoising and a classifier for neutron-gamma discrimination, trained via random augmentations on high-SNR data and validated with SHAP ...
-
NeuroViz: Real-time Interactive Visualization of Forward and Backward Passes in Neural Network Training
NeuroViz offers interactive real-time visualization of neural network forward and backward passes, achieving top usability scores in a study with 31 participants compared to existing tools.
-
CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations
Cognitive models of user reasoning strategies with XAI methods on tabular data fit human forward-simulation decisions better than ML baselines and support hypothesis testing without new user studies.
-
X-NegoBox: An Explainable Privacy-Budget Negotiation Framework for Secure Peer-to-Peer Energy Data Exchange
X-NegoBox is a proposed explainable framework that negotiates privacy budgets for energy data exchange using trust, sensitivity, and purpose factors, with experiments claiming reduced leakage and higher acceptance rates.
-
From Awareness to Intent: Mitigating Silent Driving System Failures through Prospective Situation Awareness Enhancing Interfaces
Prospective situation awareness enhancing interfaces delivered via AR HUD improve takeover performance after silent automation failures, with perceptual cues most effective at raising situational awareness and system-...
-
Domain-Specialized Object Detection via Model-Level Mixtures of Experts
Model-level MoE of domain-specialized YOLO detectors with gating network outperforms standard ensembles on BDD100K while revealing expert specialization.
-
Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making
This survey synthesizes XAI methods with surrogate modeling workflows for simulations and outlines a research agenda to embed explainability into simulation-driven design and decision-making.
-
Governed Reasoning for Institutional AI
Cognitive Core uses nine typed cognitive primitives, a four-tier governance model with human review as an execution condition, and an endogenous audit ledger to reach 91% accuracy with zero silent errors on prior auth...
-
Explanation-Aware Learning for Enhanced Interpretability in Biomedical Imaging
Adding explanation supervision to training improves spatial alignment of saliency maps with clinical annotations on chest X-rays while keeping predictive accuracy comparable.
-
LLMs Should Not Yet Be Credited with Decision Explanation
LLMs support decision prediction and rationale generation but lack evidence for genuine decision explanation, requiring stricter standards to avoid over-crediting.
-
From Trust to Appropriate Reliance: Measurement Constructs in Human-AI Decision-Making
A literature review shows that constructs for appropriate reliance on AI are fragmented, presents three views on the topic, and calls for consensus on objective metrics to enable better comparisons across studies.
-
Out of Context: Reliability in Multimodal Anomaly Detection Requires Contextual Inference
Multimodal anomaly detection must be reframed as cross-modal contextual inference that separates context from observations to define abnormality conditionally.
-
Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms
The paper delivers a mechanism-centric taxonomy and unified perspective on explainable human activity recognition methods across sensing modalities.
-
Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making
An event-centric framework encodes environments as semantic events and retrieves weighted prior maneuvers from a knowledge bank to enable interpretable, physics-aware decision-making for UAVs.
Reference graph
Works this paper leans on
- [1] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
- [2] Pedro Antunes, Valeria Herskovic, Sergio F Ochoa, and Jose A Pino. Structuring dimensions for collaborative systems evaluation. ACM Computing Surveys, 2012.
- [3] William Bechtel and Adele Abrahamsen. Explanation: A mechanist alternative. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 2005.
- [4] Catherine Blake and Christopher J Merz. UCI repository of machine learning databases. 1998.
- [5] Nick Bostrom and Eliezer Yudkowsky. The ethics of artificial intelligence. The Cambridge Handbook of Artificial Intelligence, 2014.
- [6] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
- [7] Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.
- [8] Samuel Carton, Jennifer Helsby, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Joe Walsh, Crystal Cody, CPT Estella Patterson, Lauren Haynes, and Rayid Ghani. Identifying police officers at risk of adverse events. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
- [9] Jonathan Chang, Jordan L Boyd-Graber, Sean Gerrish, Chong Wang, and David M Blei. Reading tea leaves: How humans interpret topic models. In NIPS, 2009.
- [10] Nick Chater and Mike Oaksford. Speculations on human causal learning and reasoning. Information Sampling and Adaptive Cognition, 2006.
- [11] Finale Doshi-Velez, Yaorong Ge, and Isaac Kohane. Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis. Pediatrics, 133(1): e54--e63, 2014.
- [12] Finale Doshi-Velez, Byron Wallace, and Ryan Adams. Graph-sparse LDA: a topic model with structured sparsity. Association for the Advancement of Artificial Intelligence, 2015.
- [13] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science Conference. ACM, 2012.
- [14] Alex Freitas. Comprehensible classification models: a position paper. ACM SIGKDD Explorations, 2014.
- [15] Vikas K Garg and Adam Tauman Kalai. Meta-unsupervised-learning: A supervised approach to unsupervised learning. arXiv preprint arXiv:1612.09030, 2016.
- [16] Stuart Glennan. Rethinking mechanistic explanation. Philosophy of Science, 2002.
- [17] Bryce Goodman and Seth Flaxman. European Union regulations on algorithmic decision-making and a "right to explanation". arXiv preprint arXiv:1606.08813, 2016.
- [18] Maya Gupta, Andrew Cotter, Jan Pfeifer, Konstantin Voevodski, Kevin Canini, Alexander Mangylov, Wojciech Moczydlowski, and Alexander Van Esbroeck. Monotonic calibrated interpolated look-up tables. Journal of Machine Learning Research, 2016.
- [19] Sean Hamill. CMU computer won poker battle over humans by statistically significant margin. http://www.post-gazette.com/business/tech-news/2017/01/31/CMU-computer-won-poker-battle-over-humans-by-statistically-significant-margin/stories/201701310250, 2017. Accessed: 2017-02-07.
- [20] Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In ACM Symposium on Theory of Computing. ACM, 2010.
- [21] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, 2016.
- [22] Carl Hempel and Paul Oppenheim. Studies in the logic of explanation. Philosophy of Science, 1948.
- [23] Tin Kam Ho and Mitra Basu. Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
- [24] Frank Keil. Explanation and understanding. Annu. Rev. Psychol., 2006.
- [25] Frank Keil, Leonid Rozenblit, and Candice Mills. What lies beneath? Understanding the limits of understanding. Thinking and Seeing: Visual Metacognition in Adults and Children, 2004.
- [26] Been Kim, Caleb Chacha, and Julie Shah. Inferring robot task plans from human team meetings: A generative modeling approach with logic-based prior. Association for the Advancement of Artificial Intelligence, 2013.
- [27] Been Kim, Elena Glassman, Brittney Johnson, and Julie Shah. iBCM: Interactive Bayesian case model empowering humans via intuitive interaction. 2015a.
- [28] Been Kim, Julie Shah, and Finale Doshi-Velez. Mind the gap: A generative approach to interpretable feature selection and extraction. In Advances in Neural Information Processing Systems, 2015b.
- [29] Himabindu Lakkaraju, Stephen H Bach, and Jure Leskovec. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1675--1684. ACM, 2016.
- [30] Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. Research Methods in Human-Computer Interaction. John Wiley & Sons, 2010.
- [31] Tao Lei, Regina Barzilay, and Tommi Jaakkola. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155, 2016.
- [32] Tania Lombrozo. The structure and function of explanations. Trends in Cognitive Sciences, 10(10): 464--470, 2006.
- [33] Yin Lou, Rich Caruana, and Johannes Gehrke. Intelligible models for classification and regression. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012.
- [34] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- [35]
- [36] Clemens Otte. Safe and interpretable machine learning: A methodological review. In Computational Intelligence in Intelligent Data Analysis. Springer, 2013.
- [37] Parliament and Council of the European Union. General Data Protection Regulation. 2016.
- [38] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. arXiv preprint arXiv:1602.04938, 2016.
- [39] Salvatore Ruggieri, Dino Pedreschi, and Franco Turini. Data mining for discrimination discovery. ACM Transactions on Knowledge Discovery from Data, 2010.
- [40] Eric Schulz, Joshua Tenenbaum, David Duvenaud, Maarten Speekenbrink, and Samuel Gershman. Compositional inductive biases in function learning. bioRxiv, 2016.
- [41] D Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, and Dan Dennison. Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems, 2015.
- [42] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
- [43] Lior Jacob Strahilevitz. Privacy versus antidiscrimination. University of Chicago Law School Working Paper, 2008.
- [44] Adi Suissa-Peleg, Daniel Haehn, Seymour Knowles-Barley, Verena Kaynig, Thouis R Jones, Alyssa Wilson, Richard Schalek, Jeffery W Lichtman, and Hanspeter Pfister. Automatic neural reconstruction from petavoxel of electron microscopy data. Microscopy and Microanalysis, 2016.
- [45] Vincent Toubiana, Arvind Narayanan, Dan Boneh, Helen Nissenbaum, and Solon Barocas. Adnostic: Privacy preserving targeted advertising. 2010.
- [46] Joaquin Vanschoren, Jan N Van Rijn, Bernd Bischl, and Luis Torgo. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15(2): 49--60, 2014.
- [47] Kush Varshney and Homa Alemzadeh. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. CoRR, 2016.
- [48] Fulton Wang and Cynthia Rudin. Falling rule lists. In AISTATS, 2015.
- [49] Tong Wang, Cynthia Rudin, Finale Doshi-Velez, Yimin Liu, Erica Klampfl, and Perry MacNeille. Bayesian rule sets for interpretable classification. In International Conference on Data Mining, 2017.
- [50] Joseph Jay Williams, Juho Kim, Anna Rafferty, Samuel Maldonado, Krzysztof Z Gajos, Walter S Lasecki, and Neil Heffernan. AXIS: Generating explanations at scale with learnersourcing and machine learning. In ACM Conference on Learning@Scale. ACM, 2016.
- [51] Andrew Wilson, Christoph Dann, Chris Lucas, and Eric Xing. The human kernel. In Advances in Neural Information Processing Systems, 2015.