Optimal Explanations of Linear Models
Pith reviewed 2026-05-25 01:20 UTC · model grok-4.3
The pith
Linear models can be explained by decomposing them into a sequence of simpler models through successive coordinate updates on their coefficients.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that explanations for linear models are obtained by solving an optimization problem whose solution is a sequence of coordinate updates; this sequence yields a parametrized family of interpretability metrics and makes the accuracy-interpretability tradeoff explicit and computable.
What carries the argument
The decomposition of a linear model into a sequence of models of increasing complexity obtained by coordinate updates on the coefficients, computed by exact algorithms or scalable heuristics.
If this is right
- The framework produces a family of interpretability metrics that generalize typical proxies used in practice.
- Exact and heuristic algorithms exist to compute the optimal decomposition for any given linear model.
- The approach quantifies the tradeoff between how interpretable a model is and how much predictive accuracy it retains.
- The resulting explanations can be used to support complex decisions by revealing the model's reasoning step by step.
Where Pith is reading between the lines
- The same decomposition idea could be tested on generalized linear models to see whether the coordinate-update sequence still yields useful explanations.
- Organizations could adopt the derived metrics as a standardized way to audit the explainability of deployed linear models.
- If the sequences align with human judgments on benchmark tasks, they might reduce vulnerability to adversarial attacks by making the model's logic transparent at each step.
Load-bearing premise
That the sequence of models produced by coordinate updates on the coefficients forms a meaningful explanation of the original model's reasoning rather than an arbitrary reordering of terms.
What would settle it
Apply the method to a small linear regression on a dataset whose intuitive feature ordering is already known and check whether the generated sequence deviates from that ordering on more than a small fraction of runs.
Figures
read the original abstract
When predictive models are used to support complex and important decisions, the ability to explain a model's reasoning can increase trust, expose hidden biases, and reduce vulnerability to adversarial attacks. However, attempts at interpreting models are often ad hoc and application-specific, and the concept of interpretability itself is not well-defined. We propose a general optimization framework to create explanations for linear models. Our methodology decomposes a linear model into a sequence of models of increasing complexity using coordinate updates on the coefficients. Computing this decomposition optimally is a difficult optimization problem for which we propose exact algorithms and scalable heuristics. By solving this problem, we can derive a parametrized family of interpretability metrics for linear models that generalizes typical proxies, and study the tradeoff between interpretability and predictive accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a general optimization framework for explaining linear models. It decomposes a given linear model into a sequence of models of increasing complexity by performing coordinate updates on the coefficients. Exact algorithms and scalable heuristics are developed to solve the resulting optimization problem. From the optimal decomposition the authors derive a parametrized family of interpretability metrics that generalize common proxies and examine the resulting interpretability-accuracy tradeoff.
Significance. If the coordinate-update decomposition can be shown to produce sequences that capture model reasoning in a non-arbitrary way, the framework would supply a principled, optimization-based route to interpretability metrics for linear models together with concrete algorithms. The algorithmic component and the explicit parametrization of metrics are concrete strengths.
major comments (1)
- [Abstract and §3] Abstract and §3 (methodology): the central claim that the coordinate-update sequence constitutes an 'explanation' of the original model's reasoning rests on the unstated assumption that the ordering induced by successive coordinate updates aligns with semantic reasoning steps. No linking criterion, invariance property, or validation is supplied showing why this ordering is explanatory rather than an arbitrary reordering of coefficients; this assumption is load-bearing for the claim that the framework produces explanations rather than merely parametrized orderings.
minor comments (2)
- Notation for the parametrized interpretability metrics is introduced without an explicit list of the free parameters or a table showing how common proxies arise as special cases.
- The abstract states that the decomposition is computed 'optimally,' yet the distinction between the exact algorithm and the scalable heuristic is not quantified in terms of approximation guarantees or empirical gap on the objective.
Simulated Author's Rebuttal
We thank the referee for the constructive review. We respond to the major comment below.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (methodology): the central claim that the coordinate-update sequence constitutes an 'explanation' of the original model's reasoning rests on the unstated assumption that the ordering induced by successive coordinate updates aligns with semantic reasoning steps. No linking criterion, invariance property, or validation is supplied showing why this ordering is explanatory rather than an arbitrary reordering of coefficients; this assumption is load-bearing for the claim that the framework produces explanations rather than merely parametrized orderings.
Authors: The framework defines an explanation as the sequence obtained by solving the stated optimization problem, which yields a decomposition that is optimal with respect to an objective trading off fidelity to the original model against a complexity penalty. This optimality supplies a non-arbitrary criterion for the induced ordering. The parametrized interpretability metrics are then derived directly from properties of any such optimal sequence and are shown to generalize standard proxies. The manuscript does not claim or require that the ordering matches semantic reasoning steps of a human interpreter; its contribution is the optimization-based construction itself and the resulting metrics for the interpretability-accuracy tradeoff. We will revise the abstract and §3 to state this grounding explicitly and to remove any phrasing that could suggest semantic alignment. revision: yes
Circularity Check
No circularity: forward proposal of optimization framework with independent derivation of metrics
full rationale
The paper introduces a new optimization-based decomposition of linear models via coordinate updates on coefficients and then derives a parametrized family of interpretability metrics from that construction. No step reduces a claimed prediction, uniqueness result, or metric to a fitted quantity or self-citation defined inside the paper; the central claim is a methodological proposal whose validity rests on external interpretability criteria rather than internal redefinition. The derivation chain is self-contained against the stated inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Sendhil Mullainathan and Ziad Obermeyer. Does machine learning automate moral hazard and error? American Economic Review, 107(5):476–480, 2017
work page 2017
-
[2]
Human decisions and machine predictions
Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. Human decisions and machine predictions. The quarterly journal of economics, 133(1):237–293, 2017
work page 2017
-
[3]
An impact assessment of machine learning risk forecasts on parole board decisions and recidivism
Richard Berk. An impact assessment of machine learning risk forecasts on parole board decisions and recidivism. Journal of Experimental Criminology, 13(2):193–216, 2017
work page 2017
-
[4]
Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them. Management Science, 64(3):1155–1170, nov 2016
work page 2016
-
[5]
Alex A. Freitas. Comprehensible classification models. ACM SIGKDD Explorations Newsletter, 15(1):1–10, 2014
work page 2014
-
[6]
Zachary C. Lipton. The Mythos of Model Interpretability. arXiv preprint arXiv:1606.03490, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[7]
Selected techniques for data mining in medicine
Nada Lavra ˇc. Selected techniques for data mining in medicine. Artificial intelligence in medicine, 16(1):3–23, 1999
work page 1999
-
[8]
Bryce Goodman and Seth Flaxman. European Union regulations on algorithmic decision-making and a "right to explanation". pages 1–9, 2016
work page 2016
-
[9]
Statistical learning with sparsity: the lasso and generalizations
Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity: the lasso and generalizations. CRC press, 2015
work page 2015
-
[10]
Classification and regression trees
Leo Breiman. Classification and regression trees. New York: Routledge, 1984
work page 1984
-
[11]
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification
Been Kim, Cynthia Rudin, and Julie Shah. The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification. In Neural Information Processing Systems (NIPS) 2014, 2014
work page 2014
-
[12]
Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan. Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. Annals of Applied Statistics, 9(3):1350–1371, 2015
work page 2015
-
[13]
Hongyu Yang, Cynthia Rudin, and Margo Seltzer. Scalable Bayesian Rule Lists. In Proceedings of the 34th International Conference on Machine Learning, 2017
work page 2017
-
[14]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016
work page 2016
-
[15]
The elements of statistical learning
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Springer series in statistics New York, NY , USA:, 2001
work page 2001
-
[16]
Algorithmic Transparency via Quantitative Input Influence :
Anupam Datta, Shayak Sen, and Yair Zick. Algorithmic Transparency via Quantitative Input Influence :. In 2016 IEEE Symposium on Security and Privacy, 2016
work page 2016
-
[17]
Interpreting Blackbox Models via Model Extraction
Hamsa Bastani, Osbert Bastani, and Carolyn Kim. Interpreting Predictive Models for Human-in-the-Loop Analytics. arXiv preprint arXiv:1705.08504, pages 1–45, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[18]
Interpretable & Explorable Approximations of Black Box Models
Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. Interpretable & Explorable Approximations of Black Box Models. FAT/ML, jul 2017
work page 2017
-
[19]
Cristian Bucilˇa, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’06, page 535, New York, New York, USA, 2006. ACM, ACM Press
work page 2006
-
[20]
Robert J. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996
work page 1996
-
[21]
Jonathan Taylor and Robert J. Tibshirani. Statistical learning and selective inference. Proceedings of the National Academy of Sciences, 112(25):7629–7634, jun 2015
work page 2015
-
[22]
Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least Angle Regression. Annals of Statistics, 32(2):407– 499, apr 2004
work page 2004
-
[23]
Best subset selection via a modern optimization lens
Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. Annals of Statistics, 44(2):813–852, 2016
work page 2016
-
[24]
Simple rules for complex decisions
Jongbin Jung, Connor Concannon, Ravi Shroff, Sharad Goel, and Daniel G Goldstein. Simple rules for complex decisions. feb 2017
work page 2017
-
[25]
Supersparse linear integer models for optimized medical scoring systems
Berk Ustun and Cynthia Rudin. Supersparse linear integer models for optimized medical scoring systems. Machine Learning, 102(3):349–391, 2016. 9 A PREPRINT - JULY 11, 2019
work page 2016
-
[26]
Statistical modeling: The two cultures
Leo Breiman. Statistical modeling: The two cultures. Statistical science, 16(3):199–231, 2001
work page 2001
-
[27]
Explaining Explanations: An Overview of Interpretability of Machine Learning
Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. Explaining Explanations : An Approach to Evaluating Interpretability of Machine Learning. arXiv preprint arXiv:1806.00069, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[28]
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez and Been Kim. Towards A Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608, (Ml):1–13, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[29]
I. Y . Kim and O. L. De Weck. Adaptive weighted-sum method for bi-objective optimization: Pareto front generation.Structural and Multidisciplinary Optimization, 29(2):149–158, 2005
work page 2005
-
[30]
Hadi Fanaee-T and Joao Gama. Event labeling combining ensemble detectors and background knowledge.Progress in Artificial Intelligence, 2(2-3):113–127, 2014. A Proof of Theorem 1 Proof of part (a). Asc(·) is bounded, we havecmax∈ R such that 0<c (·)≤cmax. Let m+∈P (m+) be a path of optimal length to the modelm+, i.e.,|m+| =Lc(m+). . Let m−∈P (m−) be any pat...
work page 2014
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.