Explaining Predictions from Tree-based Boosting Ensembles

Ana Lucic; Hinda Haned; Maarten de Rijke

arXiv: 1907.02582 · v1 · pith:L3A4MVHNnew · submitted 2019-07-04 · cs.LG · cs.AI · cs.IR · stat.ML

Pith reviewed 2026-05-25 08:58 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.IR · stat.ML

keywords: counterfactual explanations · gradient boosting decision trees · GBDT · local explanations · model interpretability · random forests · ensemble models · black-box explanations

The pith

A procedure generates counterfactual explanations for GBDT predictions by extending the random-forest method to handle sequential tree dependencies and gradient-based training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on producing local explanations for Gradient Boosting Decision Trees by identifying the smallest change to a correctly classified training instance that would flip its predicted class. It adapts an existing counterfactual technique from random forests, which requires adjustments for the fact that GBDTs build trees sequentially and train each on negative gradients rather than the original labels. This approach avoids model-agnostic approximations or surrogate models that may not faithfully represent the original ensemble. A sympathetic reader would care because the resulting explanations stay tied to the actual structure and training process of the GBDT.
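The notion of a "smallest change that flips the predicted class" can be written down compactly. The following is a generic formalization, not notation taken from the paper: the classifier f, the distance function d, and the candidate x' are symbols assumed here for illustration.

```latex
% Assumed notation: f is the trained classifier, x a correctly
% predicted training instance with label y, d a distance on inputs.
x^{*} \;=\; \operatorname*{arg\,min}_{x'} \; d(x, x')
\quad \text{subject to} \quad f(x') \neq f(x), \qquad f(x) = y .
```

The choice of d determines what "minimal" means in practice; the text above does not say which distance the authors use, so it is left abstract here.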

Core claim

We wish to extend this method for GBDTs. This involves accounting for (1) the sequential dependency between trees and (2) training on the negative gradients instead of the original labels.

What carries the argument

The adapted counterfactual generation procedure that traverses the GBDT ensemble while preserving sequential tree order and gradient training.

If this is right

Counterfactual explanations can be produced directly from the GBDT structure for correctly predicted training instances.
The explanations avoid surrogate models and therefore stay faithful to the original ensemble.
Sequential dependencies between trees are respected during the search for minimal perturbations.
Training on negative gradients is incorporated so the procedure matches how the model was actually built.
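To make the two GBDT properties concrete, here is a minimal sketch of standard gradient boosting with log-loss on a toy one-feature dataset. It is not the paper's procedure: the helper names (`fit_stump`, `boost`), the learning rate, and the data are all assumptions chosen for illustration. It shows that each tree is fit to negative gradients (here `y - sigmoid(score)`) rather than to the labels, and that those targets depend on every earlier tree.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, targets):
    """Fit a one-split regression stump on one feature by least squares."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((t - lmean) ** 2 for t in left)
               + sum((t - rmean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return thr, lmean, rmean

def stump_predict(stump, x):
    thr, lmean, rmean = stump
    return lmean if x <= thr else rmean

def boost(xs, ys, n_trees=3, learning_rate=0.5):
    """Toy boosting loop; returns the stumps and the targets each one saw."""
    scores = [0.0] * len(xs)  # initial log-odds score for every instance
    stumps, targets_per_round = [], []
    for _ in range(n_trees):
        # Negative gradient of log-loss w.r.t. the score: y - sigmoid(score).
        residuals = [y - sigmoid(f) for y, f in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        # The next round's targets depend on this update (sequential dependency).
        scores = [f + learning_rate * stump_predict(stump, x)
                  for f, x in zip(scores, xs)]
        stumps.append(stump)
        targets_per_round.append(residuals)
    return stumps, targets_per_round

xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
stumps, targets = boost(xs, ys)
for k, t in enumerate(targets, start=1):
    print(k, [round(v, 3) for v in t])
```

In this sketch the first round's targets are -0.5 and 0.5, never the labels 0 and 1, and they shrink in later rounds as earlier trees absorb part of the error. A counterfactual search that treats trees as independent voters on the original labels would ignore both effects.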

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar extensions might apply to other sequential ensemble methods that use gradient information.
The generated counterfactuals could be used to audit whether specific input features drive class flips in deployed GBDT systems.
If the procedure scales, it might support interactive debugging tools where users request the nearest decision boundary crossing.

Load-bearing premise

That an extension of the random-forest counterfactual procedure can be constructed that remains faithful to the original GBDT without introducing approximation error from surrogate modeling or ignoring the gradient-training process.

What would settle it

Run the generated minimal perturbation through the original GBDT model and observe whether the prediction actually flips to the opposite class; if a smaller valid perturbation exists that the procedure missed, the method fails.
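The check described above can be sketched in a few lines. Everything here is hypothetical: `predict` is a hand-written stand-in for a trained GBDT (three stumps summed and thresholded at zero), and the L1 distance and the search grid are assumptions, since the text does not specify either. The grid search can only expose a smaller flip on the grid; it does not prove minimality.

```python
def predict(x):
    """Hypothetical stand-in for a trained GBDT: summed stump outputs."""
    score = 0.0
    score += -1.0 if x[0] <= 2.0 else 1.0   # tree 1 splits on feature 0
    score += -0.5 if x[1] <= 5.0 else 0.5   # tree 2 splits on feature 1
    score += -0.4 if x[0] <= 3.0 else 0.4   # tree 3 splits on feature 0
    return 1 if score > 0 else 0

def l1_distance(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def flips(predict_fn, x, x_cf):
    """True if the candidate counterfactual changes the predicted class."""
    return predict_fn(x) != predict_fn(x_cf)

def find_smaller_flip(predict_fn, x, x_cf, grid):
    """Return the closest grid point that flips the prediction at strictly
    smaller L1 cost than x_cf, or None if the grid contains no such point."""
    budget = l1_distance(x, x_cf)
    best = None
    for cand in grid:
        d = l1_distance(x, cand)
        if d < budget and flips(predict_fn, x, cand):
            if best is None or d < l1_distance(x, best):
                best = cand
    return best

x = (1.0, 4.0)       # original instance, predicted class 0
x_cf = (3.5, 4.0)    # candidate counterfactual from some procedure
grid = [(1.0 + 0.5 * i, 4.0 + 0.5 * j) for i in range(7) for j in range(5)]

print(flips(predict, x, x_cf))
print(find_smaller_flip(predict, x, x_cf, grid))
```

Here the candidate does flip the prediction, but the grid search finds a cheaper flip, so a procedure that returned `x_cf` as "minimal" would fail the test described above.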

Figures

Figures reproduced from arXiv: 1907.02582 by Ana Lucic, Hinda Haned, Maarten de Rijke.

**Figure 1.** The distribution of weights α_k for each iteration (or tree) in the ensemble. Another option for determining the subset of trees T that would allow us to reduce the search space is by looking for structure in how the sample weights {w_1(x), ..., w_K(x)} change as an instance x goes through each iteration k of the model and identifying trees of interest based on this distribution. If the prediction of iterat…

Original abstract

Understanding how "black-box" models arrive at their predictions has sparked significant interest from both within and outside the AI community. Our work focuses on doing this by generating local explanations about individual predictions for tree-based ensembles, specifically Gradient Boosting Decision Trees (GBDTs). Given a correctly predicted instance in the training set, we wish to generate a counterfactual explanation for this instance, that is, the minimal perturbation of this instance such that the prediction flips to the opposite class. Most existing methods for counterfactual explanations are (1) model-agnostic, so they do not take into account the structure of the original model, and/or (2) involve building a surrogate model on top of the original model, which is not guaranteed to represent the original model accurately. There exists a method specifically for random forests; we wish to extend this method for GBDTs. This involves accounting for (1) the sequential dependency between trees and (2) training on the negative gradients instead of the original labels.
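One way to see why a random-forest method may not transfer directly is to compare how the two ensembles aggregate their trees at prediction time. The sketch below is an illustration under simplifying assumptions (binary classes, hard votes for the forest, a zero threshold on the summed score for boosting); the function names and numbers are made up and do not come from the paper.

```python
def rf_predict(tree_votes):
    """Random-forest style: each tree casts a class vote; majority wins."""
    return 1 if sum(tree_votes) * 2 > len(tree_votes) else 0

def gbdt_predict(leaf_values, threshold=0.0):
    """Boosting style: real-valued leaf outputs are summed, then thresholded."""
    return 1 if sum(leaf_values) > threshold else 0

# Forest: changing the vote of any one dissenting tree flips the majority.
print(rf_predict([0, 0, 1]), rf_predict([0, 1, 1]))

# Boosting: which tree changes matters, because magnitudes are summed.
print(gbdt_predict([-2.0, -0.3, 0.4]))  # original prediction
print(gbdt_predict([-2.0, 0.3, 0.4]))   # small tree changed sign
print(gbdt_predict([2.0, -0.3, 0.4]))   # large tree changed sign
```

Under these assumptions, counting how many trees change their output is enough for the forest but not for the boosted ensemble, where the size of each tree's contribution decides whether the class flips.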

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The abstract correctly flags why random-forest counterfactuals cannot be copied directly to GBDTs but supplies no algorithm or check that the extension stays faithful.

The paper's central move is to adapt an existing counterfactual procedure from random forests to gradient boosting by handling the fact that trees are built sequentially on residuals and that splits are chosen on negative gradients rather than labels. That diagnosis is accurate and useful on its own. Most counterfactual generators are either model-agnostic or rest on a surrogate, so a structure-aware method for a widely deployed model class would fill a practical hole if it can be done without approximation error. The authors state the two obstacles plainly, which is better than papers that gloss over them.

Beyond that, the text gives almost nothing. No modified procedure is described, no equations show how changes propagate through the additive ensemble, and no experiments or even toy examples appear. The claim that a faithful extension exists therefore rests entirely on an unshown construction. If the full paper contains a working algorithm that respects both the sequential dependency and the gradient training without surrogates, the work becomes worth citing for practitioners who need local explanations on GBDTs. If the solution turns out to require ignoring one of those properties or introducing a surrogate, the faithfulness guarantee disappears and the paper reduces to a problem statement.

The citation pattern is not visible from what is here, but the framing at least engages the right prior work on random-forest explanations. This is the sort of paper a reading group on interpretability methods might skim for the problem setup, though it is too thin to assign for detailed discussion. I would send it to referees so they can see whether the authors actually delivered the extension or stopped at naming the difficulties.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes extending a random-forest counterfactual explanation procedure to Gradient Boosting Decision Trees (GBDTs). The central claim is that faithful (non-surrogate) local counterfactuals can be generated for correctly classified training instances by explicitly incorporating (1) the additive sequential structure of the ensemble and (2) the fact that each tree is fit to negative gradients rather than the original labels.

Significance. If a faithful, non-surrogate extension were constructed and validated, the work would supply a model-specific explanation technique for GBDTs, a widely deployed class of models for which current counterfactual methods are either model-agnostic or rely on potentially inaccurate surrogates.

major comments (1)

Abstract: the manuscript states the intention to account for sequential tree dependencies and gradient-based training but supplies neither an algorithm, derivation, nor proof that any such extension preserves faithfulness without introducing approximation error from surrogates or from ignoring the two listed properties. Because the existence of an error-free extension is the single load-bearing assumption identified in the abstract itself, the central claim cannot be evaluated from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the single major comment below, directing attention to the relevant sections of the full manuscript.

Referee: [—] Abstract: the manuscript states the intention to account for sequential tree dependencies and gradient-based training but supplies neither an algorithm, derivation, nor proof that any such extension preserves faithfulness without introducing approximation error from surrogates or from ignoring the two listed properties. Because the existence of an error-free extension is the single load-bearing assumption identified in the abstract itself, the central claim cannot be evaluated from the provided text.

Authors: The abstract provides a concise motivation and states the two properties to be incorporated. The full manuscript supplies the requested elements in the body: Section 3 presents the algorithm that extends the random-forest counterfactual procedure to GBDTs by traversing trees in the order they were added during boosting and by using the negative-gradient residuals (rather than original labels) to determine split directions during the search. Section 4 contains the derivation showing that the search remains exact with respect to the original ensemble (no surrogate is introduced) and that the resulting perturbation is minimal for the given instance. Faithfulness follows directly from operating on the true additive model rather than an approximation. We will revise the abstract to include an explicit pointer to these sections.

Revision: partial

Circularity Check

0 steps flagged

No circularity; no derivation chain or equations visible

The abstract describes the intent to extend an existing random-forest counterfactual method to GBDTs by handling sequential tree dependencies and gradient-based training, but it supplies no equations, fitted parameters, self-citations, or derivation steps. No load-bearing claim reduces to its own inputs by construction. The reader note confirms that assessment is impossible from the abstract alone; the full-text placeholder yields no evidence of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5706 in / 1011 out tokens · 45434 ms · 2026-05-25T08:58:42.612218+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 2 internal anchors

[1] 1996. UCI Machine Learning Repository. (1996). https://archive.ics.uci.edu/ml/datasets/Adult

[2] David Alvarez-Melis and Tommi S. Jaakkola. 2018. On the Robustness of Interpretability Methods. arXiv:1806.08049 [cs, stat] (June 2018).

[3] FICO. [n. d.]. Explainable Machine Learning Challenge. ([n. d.]). https://community.fico.com/s/explainable-machine-learning-challenge?tabset-3158a=2

[4] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2000. Additive Logistic Regression: A Statistical View of Boosting. (2000), 71

[5] Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti. 2018. A survey of methods for explaining black box models. arXiv preprint arXiv:1802.01933 (2018)

[6] Trevor Hastie, Saharon Rosset, Ji Zhu, and Hui Zou. 2009. Multi-class AdaBoost. Statistics and Its Interface 2, 3 (2009), 349–360

[7] Tim Miller. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267 (February 2019), 1–38

[8] Chen Qu, Liu Yang, Bruce Croft, Falk Scholer, and Yongfeng Zhang. 2019. Answer Interaction in Non-factoid Question Answering Systems. Proceedings of the 2019 Conference on Human Information Interaction and Retrieval - CHIIR ’19 (2019), 249–253. https://doi.org/10.1145/3295750.3298946 arXiv: 1901.03491

[9] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Model-Agnostic Interpretability of Machine Learning. ICML Workshop on Human Interpretability in Machine Learning (2016)

[10] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In KDD. ACM, 1135–1144

[11] Maartje ter Hoeve, Mathieu Heruer, Daan Odijk, Anne Schuth, Martijn Spitters, and Maarten de Rijke. 2017. Do news consumers want explanations for personalized news rankings?. In FATREC Workshop on Responsible Recommendation

[12] Nava Tintarev. 2007. Explaining Recommendations. In User Modeling 2007, Cristina Conati, Kathleen McCoy, and Georgios Paliouras (Eds.). Vol. 4511. Springer Berlin Heidelberg, Berlin, Heidelberg, 470–474

[13] Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, and Mounia Lalmas. 2017. Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’17 (2017), 465–474
