
arxiv: 1907.03419 · v1 · pith:76FEHYIRnew · submitted 2019-07-08 · 💻 cs.LG · stat.ML

The Price of Interpretability

Pith reviewed 2026-05-25 01:25 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords: interpretability, machine learning, tradeoff, predictive accuracy, model construction, sparsity, decision making

The pith

Building machine learning models as sequences of interpretable steps yields a way to quantify the accuracy price of interpretability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for constructing machine learning models through sequences of interpretable steps. This setup allows standard proxies for interpretability, such as sparsity in linear models, to emerge naturally from the choice of steps. By generalizing these proxies, the authors create a family of consistent interpretability measures. The framework then measures the tradeoff between these measures and predictive accuracy. Readers would care because it offers a concrete way to evaluate when the benefits of understanding a model's reasoning justify any loss in performance.

Core claim

Machine learning models are constructed in a sequence of interpretable steps. For a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies, for example sparsity in linear models. These proxies are generalized to a parametrized family of consistent measures of model interpretability. This definition quantifies the price of interpretability as the tradeoff with predictive accuracy, with practical algorithms demonstrated on real and synthetic datasets.
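
One hedged way to write down the tradeoff described above is as a constrained optimization. The symbols here are assumptions of this sketch, not necessarily the paper's notation: $\mathcal{M}$ is the model class, $c(m)$ is the predictive cost of model $m$, $I(m)$ is its interpretability loss (for example, the number of nonzero coefficients of a linear model), and $\ell$ is the interpretability budget.

```latex
\[
  c^\star(\ell) \;=\; \min_{m \in \mathcal{M}} \; c(m)
  \quad \text{subject to} \quad I(m) \le \ell,
  \qquad
  \mathrm{price}(\ell) \;=\; c^\star(\ell) \;-\; \min_{m \in \mathcal{M}} c(m).
\]
```

Under this reading, $\mathrm{price}(\ell)$ is nonnegative and non-increasing in $\ell$, because relaxing the budget only enlarges the feasible set.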

What carries the argument

The argument rests on the sequence of interpretable steps, which defines how a model is constructed and supports the generalization of interpretability measures.

If this is right

  • Algorithms can find models that achieve specified interpretability levels at minimal accuracy cost.
  • The approach applies directly to both real-world and synthetic data.
  • Interpretability measures become comparable across different model families through the parametrized family.
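
As a minimal sketch of the first point, assuming sparsity (number of features used) as the interpretability measure and mean squared error as the accuracy cost, the tradeoff for a small linear regression can be traced by exhaustive search. This is an illustration only, not the paper's algorithm; the function names `sparsity_accuracy_frontier` and `price_of_interpretability` and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np


def sparsity_accuracy_frontier(X, y):
    """For each sparsity level k, return the lowest mean squared error
    achievable by a least-squares fit using at most k features.

    Exhaustive search over subsets: only practical for a handful of features.
    """
    n, p = X.shape
    best_so_far = float(np.mean(y ** 2))  # k = 0: the all-zero model
    frontier = {0: best_so_far}
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            cols = list(subset)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            mse = float(np.mean((y - X[:, cols] @ coef) ** 2))
            best_so_far = min(best_so_far, mse)
        frontier[k] = best_so_far
    return frontier


def price_of_interpretability(frontier):
    """Extra error paid at each sparsity level relative to the least sparse fit."""
    best = frontier[max(frontier)]
    return {k: mse - best for k, mse in frontier.items()}


# Synthetic data (an assumption of this sketch): only features 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)

frontier = sparsity_accuracy_frontier(X, y)
price = price_of_interpretability(frontier)
for k in sorted(price):
    print(k, round(price[k], 4))
```

In this toy setting the price should drop sharply once both informative features are allowed and stay near zero afterwards, which is the shape of a Pareto front between interpretability and cost.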

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Decision support systems could incorporate explicit accuracy penalties for required levels of interpretability.
  • The method might be extended to incorporate other concerns like fairness by defining additional step types.

Load-bearing premise

A natural choice of interpretable steps will recover standard interpretability proxies such as sparsity and produce consistent measures across the parametrized family.

What would settle it

A set of models for which the interpretability measure does not correlate with human judgments of interpretability, or for which increasing interpretability never reduces accuracy, would settle the question.

Figures

Figures reproduced from arXiv: 1907.03419 by Arthur Delarue, Dimitris Bertsimas, Patrick Jaillet, Sebastien Martin.

Figure 1: Illustration of the interpretable path framework with the three examples introduced in Section …
Figure 2: Visualization of an interpretable path leading to …
Figure 3: Visualization of an interpretable path leading to …
Figure 4: Tradeoff between the cost of the first and second models of the interpretable path.
Figure 5: Pareto front between interpretability loss …
Figure 6: Pareto fronts between model interpretability and cost in the same setting as Figure …
Figure 7: Price of interpretability for decision trees of depth at most 2 on the simplified …
Figure 8: Pareto-efficient models from the perspective of …
Figure 9: Example of a Pareto-efficient interpretable path. On the left we see the benefits of each coefficient modifica…

Original abstract

When quantitative models are used to support decision-making on complex and important topics, understanding a model's "reasoning" can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machine learning models are constructed in a sequence of interpretable steps. We show that for a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies (e.g., sparsity in linear models). We then generalize these proxies to yield a parametrized family of consistent measures of model interpretability. This formal definition allows us to quantify the "price" of interpretability, i.e., the tradeoff with predictive accuracy. We demonstrate practical algorithms to apply our framework on real and synthetic datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces a mathematical framework in which machine learning models are constructed via a sequence of interpretable steps. It claims that a natural choice of such steps recovers standard interpretability proxies (e.g., sparsity for linear models), generalizes these to a parametrized family of consistent interpretability measures, and thereby quantifies the tradeoff ('price') between interpretability and predictive accuracy, with practical algorithms demonstrated on real and synthetic data.

Significance. If the recovery of standard proxies and the consistency of the parametrized family can be established rigorously, the framework would supply a formal, general definition of interpretability that enables explicit quantification of accuracy-interpretability tradeoffs, a contribution that could structure future work on interpretable ML.

major comments (2)
  1. [Abstract] The central claim that 'a natural choice of interpretable steps recovers standard interpretability proxies' is asserted without any derivation, error analysis, or explicit construction visible; this step is load-bearing for the entire framework and for the subsequent quantification of the price of interpretability.
  2. [Abstract] The assertion that the generalization 'yields a parametrized family of consistent measures of model interpretability' lacks supporting equations, proof outline, or verification that the family is internally consistent or reduces to known proxies; without this, the formal definition of the price cannot be substantiated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and for highlighting the importance of making the abstract claims fully traceable to the manuscript's technical content. We address each point below by directing to the relevant sections where the derivations, constructions, and consistency arguments appear.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'a natural choice of interpretable steps recovers standard interpretability proxies' is asserted without any derivation, error analysis, or explicit construction visible; this step is load-bearing for the entire framework and for the subsequent quantification of the price of interpretability.

    Authors: The abstract is a concise summary; the explicit constructions appear in Section 3. For linear models we define the sequence of interpretable steps as successive feature selections and show that the resulting interpretability measure equals the number of nonzero coefficients (i.e., the standard sparsity proxy). Analogous derivations are given for decision trees (depth and number of leaves) and for other model families. Consistency and error bounds for these recoveries are established via the general framework introduced in Section 2 and proved in the appendix.
    revision: no

  2. Referee: [Abstract] The assertion that the generalization 'yields a parametrized family of consistent measures of model interpretability' lacks supporting equations, proof outline, or verification that the family is internally consistent or reduces to known proxies; without this, the formal definition of the price cannot be substantiated.

    Authors: Section 4 introduces the parametrized family by replacing the indicator function in the base interpretability measure with a continuous, monotone function controlled by a parameter. We prove internal consistency (monotonicity, normalization, and invariance under equivalent representations) and show that the family reduces exactly to the standard proxies at the boundary values of the parameter. The price of interpretability is then defined in Section 5 as the minimal accuracy loss subject to a given interpretability level; the algorithms in Section 6 implement this optimization on real and synthetic data.
    revision: partial
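
The linear-model construction mentioned in the first response can be illustrated with a small sketch. It assumes, for illustration only, that one interpretable step sets a single coefficient and that the path starts from the all-zero model; the paper's actual step definition may differ. The helper names `interpretable_path` and `is_valid_path` are hypothetical.

```python
import numpy as np


def interpretable_path(beta):
    """Build a path from the all-zero linear model to `beta`,
    changing exactly one coefficient per step."""
    beta = np.asarray(beta, dtype=float)
    current = np.zeros_like(beta)
    path = [current.copy()]
    for j in np.flatnonzero(beta):
        current[j] = beta[j]  # one step (assumed): set a single coefficient
        path.append(current.copy())
    return path


def is_valid_path(path):
    """Check that consecutive models differ in at most one coefficient."""
    return all(
        np.count_nonzero(a != b) <= 1 for a, b in zip(path, path[1:])
    )


beta = [0.0, 1.5, 0.0, -2.0, 0.3]
path = interpretable_path(beta)
num_steps = len(path) - 1
print(num_steps, np.count_nonzero(beta))  # prints: 3 3
```

Because each step in this sketch can change only one coefficient, no valid path from the zero model can be shorter than the number of nonzero coefficients, so the path length matches the sparsity proxy.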

Circularity Check

0 steps flagged

No significant circularity; framework derives interpretability measures independently

full rationale

The abstract and reader's assessment describe a self-contained framework that defines models via sequences of interpretable steps, shows recovery of standard proxies (e.g., sparsity), generalizes to a parametrized family, and quantifies the accuracy tradeoff. No load-bearing step reduces by the paper's own equations to a fitted input renamed as prediction, nor relies on self-citation chains or ansatzes smuggled from prior work. The central claim remains independent of the inputs by construction, consistent with the reader's circularity score of 3.0 and the absence of any quoted reduction in the material.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the framework is described at the level of definitions and recovery of existing proxies.


