
arxiv: 2605.19618 · v1 · pith:RLBC5YAJnew · submitted 2026-05-19 · 💻 cs.LG · stat.ME

A Family of Divergence Measures for Evaluating the Reconstruction Quality of Explainable Ensemble Trees

Pith reviewed 2026-05-20 08:11 UTC · model grok-4.3

classification 💻 cs.LG stat.ME
keywords divergence measures · ensemble trees · surrogate models · model interpretability · reconstruction quality · permutation testing · Cressie-Read divergence · loss of interpretability

The pith

The normalized Loss of Interpretability decomposes disagreement between ensemble trees and their surrogates into within-node and between-node parts, so that reconstruction failures can be localized.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a framework for validating surrogate models of ensemble learners by measuring structural agreement instead of relying on association metrics like correlation. It centers on the normalized Loss of Interpretability drawn from the Cressie-Read power divergence family at lambda equal to 2, which breaks down into within-node and between-node components for precise identification of where and why approximations fail. Four complementary measures are defined together with a unified permutation testing procedure that supports valid inference for all of them in a single resampling pass. A sympathetic reader would care because correlation approaches overlook systematic mismatches in prediction co-occurrence patterns that affect the reliability of interpretations from the surrogate.

Core claim

Rooted in the Cressie-Read power divergence family with lambda equal to 2, the nLoI admits a closed-form decomposition into within-node and between-node components, providing a unique diagnostic capability to identify precisely where and why reconstruction fails. The framework incorporates four complementary measures capturing distinct structural facets of approximation quality. A unified permutation testing procedure delivers valid inference for all measures within a single resampling pass. Theoretical properties including boundedness and symmetry are established, and evaluations confirm exact Type I error control while detecting reconstruction fidelity gradients invisible to correlation.

What carries the argument

The normalized Loss of Interpretability (nLoI), obtained by normalizing the Cressie-Read power divergence at lambda equal to 2, which decomposes total disagreement into within-node and between-node additive terms for diagnostic localization.
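For orientation, the divergence family that the nLoI builds on can be sketched in a few lines. The block below implements the textbook Cressie-Read power divergence statistic, 2 / (lambda (lambda + 1)) times the sum of o ((o / e)^lambda - 1), with lambda defaulting to 2. It does not reproduce the paper's normalization, which this page does not state; the function name `cressie_read` and the count-vector inputs are illustrative assumptions.

```python
def cressie_read(observed, expected, lam=2.0):
    """Textbook Cressie-Read power divergence between two count vectors.

    Computes 2 / (lam * (lam + 1)) * sum(o * ((o / e) ** lam - 1)).
    The cases lam = 0 and lam = -1 are defined only as limits and are
    rejected here. This is NOT the paper's normalized nLoI.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if lam in (0.0, -1.0):
        raise ValueError("lam = 0 and lam = -1 require the limiting forms")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        total += o * ((o / e) ** lam - 1.0)
    return 2.0 * total / (lam * (lam + 1.0))


# Hypothetical counts with equal totals, for illustration only.
obs = [12, 18, 30]
exp = [10, 20, 30]
d_lambda2 = cressie_read(obs, exp)           # the lambda = 2 member
d_pearson = cressie_read(obs, exp, lam=1.0)  # lambda = 1 member
```

When the two vectors have equal totals, the lambda = 1 member coincides with Pearson's chi-square statistic, which is a convenient sanity check before moving to lambda = 2.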

If this is right

  • The decomposition enables direct identification of the specific nodes responsible for any observed reconstruction failure.
  • A single permutation procedure yields valid p-values for all four measures while controlling Type I error exactly.
  • The measures remain bounded and symmetric, allowing reliable quantification of structural agreement across different datasets.
  • Monte Carlo and empirical results show these measures detect fidelity differences that correlation-based approaches miss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The node-level breakdown could be used to guide targeted retraining or pruning of surrogate trees to improve fidelity.
  • Similar decomposable divergence measures might be developed for surrogate models of other ensemble types such as gradient boosting.
  • Visualization of the within-node versus between-node contributions could help practitioners prioritize which parts of the tree to interpret most carefully.
  • The permutation framework could be adapted to provide simultaneous inference when comparing multiple surrogate candidates at once.

Load-bearing premise

The specific choice of lambda equal to 2 together with the proposed normalization produces measures that meaningfully capture agreement in tree reconstruction beyond what association metrics already provide.

What would settle it

A controlled example in which the surrogate exactly reproduces the ensemble's predictions yet any of the four measures yields a value significantly different from zero, or in which the within-node and between-node components fail to sum exactly to the reported nLoI.
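The additivity half of this check is easy to automate. The sketch below assumes, purely for illustration, that the total discrepancy is a sum of non-negative per-pair terms and that each observation carries a surrogate terminal-node label; the paper's actual closed form is not given on this page, so `split_by_node`, `pair_terms`, and `node_of` are hypothetical names and data.

```python
import math


def split_by_node(pair_terms, node_of):
    """Split per-pair discrepancy terms into within-node and between-node sums.

    pair_terms maps an observation pair (i, j) to its contribution;
    node_of maps an observation index to a terminal-node label.
    """
    within = 0.0
    between = 0.0
    for (i, j), term in pair_terms.items():
        if node_of[i] == node_of[j]:
            within += term   # both observations fall in the same node
        else:
            between += term  # the pair straddles two nodes
    return within, between


# Hypothetical per-pair contributions and node labels, for illustration only.
pair_terms = {(0, 1): 0.02, (0, 2): 0.30, (1, 2): 0.25, (2, 3): 0.01}
node_of = {0: "A", 1: "A", 2: "B", 3: "B"}

within, between = split_by_node(pair_terms, node_of)
total = sum(pair_terms.values())

# The two components must add up to the total, up to floating-point error.
assert math.isclose(within + between, total)
```

A real test of the claim would replace the toy terms with the measure's own per-pair contributions and run the same assertion on each dataset.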

Figures

Figures reproduced from arXiv: 2605.19618 by Agostino Gnasso, Carmela Iorio, Massimo Aria.

Figure 1: Statistical power as a function of signal strength for five measures.
Figure 2: Comparison of co-occurrence matrices for the Iris dataset.

Original abstract

Validating interpretable surrogate models for ensemble learners requires measuring agreement between the ensemble's internal representation and its surrogate approximation, rather than mere association. Correlation-based approaches are scale-invariant and fail to detect systematic discrepancies in co-occurrence structure. We propose a statistical framework grounded in the agreement-association distinction, centered on the normalized Loss of Interpretability (nLoI). Rooted in the Cressie-Read power divergence family with lambda equal to 2, the nLoI admits a closed-form decomposition into within-node and between-node components, providing a unique diagnostic capability to identify precisely where and why reconstruction fails. The framework incorporates four complementary measures capturing distinct structural facets of approximation quality. A unified permutation testing procedure delivers valid inference for all measures within a single resampling pass. Theoretical properties, including boundedness and symmetry, are established for each metric. Monte Carlo simulations and empirical evaluations confirm exact Type I error control and demonstrate that these measures detect reconstruction fidelity gradients invisible to correlation-based alternatives. The framework is developed and illustrated in the context of Explainable Ensemble Trees (E2Tree), and empirical evaluation on three benchmark datasets illustrates the practical utility of the framework.
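The agreement-association distinction can be made concrete with a toy example. In the sketch below, a hypothetical surrogate halves every pairwise co-occurrence weight of the ensemble: the ordering is preserved, so Pearson correlation stays at 1, while a simple agreement measure (mean squared gap, used here as a stand-in and not one of the paper's four measures) is clearly non-zero. All values and function names are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_squared_gap(x, y):
    """Average squared difference: zero only under exact agreement."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Hypothetical pairwise co-occurrence weights from an ensemble, in [0, 1].
ensemble = [0.9, 0.8, 0.1, 0.6, 0.2, 0.7]
# A surrogate that halves every weight: same ordering, different scale.
surrogate = [0.5 * w for w in ensemble]

r = pearson(ensemble, surrogate)             # association: essentially 1
gap = mean_squared_gap(ensemble, surrogate)  # agreement: clearly above 0
```

Any rescaling of the surrogate's weights by a positive constant leaves `r` unchanged, which is the sense in which correlation is blind to systematic discrepancies.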

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a statistical framework for evaluating reconstruction quality in Explainable Ensemble Trees (E2Tree) using divergence measures from the Cressie-Read power divergence family at lambda=2. It centers on the normalized Loss of Interpretability (nLoI) with a claimed closed-form decomposition into within-node and between-node components, introduces four complementary measures, a unified permutation testing procedure for inference, establishes theoretical properties including boundedness, symmetry, and exact Type I error control, and validates these via Monte Carlo simulations and empirical results on three benchmark datasets showing superior detection of fidelity gradients compared to correlation metrics.

Significance. If the claims hold, particularly the algebraic decomposition and valid inference under the permutation procedure, this framework would provide a meaningful advance for assessing surrogate interpretability in ensemble models by moving beyond association metrics to diagnose specific sources of reconstruction failure. The closed-form decomposition into node-level components and the single-pass permutation test for multiple measures represent useful technical contributions for diagnostic evaluation in interpretable machine learning.

major comments (1)
  1. [§4.2] Unified Permutation Testing Procedure: The claim of exact Type I error control for nLoI and the other three measures rests on a unified permutation test, but the procedure is described only at a high level without specifying whether permutations are applied to raw observations, node assignments, or predictions while preserving the recursive partitioning constraints of the tree. In hierarchical tree structures, sample-to-node mappings are dependent due to shared splits; a naive permutation risks producing anti-conservative p-values. This is load-bearing for the abstract's inference validity assertion and the diagnostic utility argument, as the decomposition itself follows algebraically from the Cressie-Read functional once normalization is fixed.
minor comments (2)
  1. [Abstract] The claim that the measures 'detect reconstruction fidelity gradients invisible to correlation-based alternatives' would be strengthened by a brief quantitative example or reference to a specific table/figure showing the difference in sensitivity.
  2. [§3.1] Notation: The normalization step applied to the Cressie-Read divergence at lambda=2 is introduced without an explicit equation reference in the main text; adding a numbered equation for the normalized form would improve traceability of the boundedness and symmetry proofs.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough and constructive review. We are encouraged by the recognition of the framework's potential contribution to diagnostic evaluation of surrogate interpretability. We address the major comment on the unified permutation testing procedure below and will revise the manuscript to incorporate additional details.

Point-by-point responses
  1. Referee: [§4.2] §4.2 (Unified Permutation Testing Procedure): The claim of exact Type I error control for nLoI and the other three measures rests on a unified permutation test, but the procedure is described only at a high level without specifying whether permutations are applied to raw observations, node assignments, or predictions while preserving the recursive partitioning constraints of the tree. In hierarchical tree structures, sample-to-node mappings are dependent due to shared splits; a naive permutation risks producing anti-conservative p-values. This is load-bearing for the abstract's inference validity assertion and the diagnostic utility argument, as the decomposition itself follows algebraically from the Cressie-Read functional once normalization is fixed.

    Authors: We thank the referee for this important observation on the level of detail provided for the unified permutation testing procedure. The procedure permutes the surrogate predictions (equivalently, the reconstruction targets) while fixing the node assignments induced by the original ensemble tree's recursive partitioning. This preserves all hierarchical dependencies arising from shared splits and ensures that the within-node and between-node structure remains intact under the null. The Monte Carlo simulations reported in the manuscript were conducted under precisely this scheme and demonstrate exact Type I error control. We agree that the current high-level description in §4.2 would benefit from greater explicitness. In the revised manuscript we will expand this section with a step-by-step algorithmic description, pseudocode for the single-pass resampling that yields p-values for all measures simultaneously, and a brief justification of why fixing the tree structure avoids the anti-conservative behavior that would arise from permuting raw observations or node labels.

    Revision: yes
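As a rough illustration of what a single-pass scheme of this kind could look like, the sketch below shuffles one vector once per iteration and evaluates every statistic on that same shuffle. It is not the authors' pseudocode: the function `permutation_pvalues`, the choice of which vector is permuted, the tail directions, and the toy statistics are all assumptions, and the add-one p-value is the usual Monte Carlo convention rather than something stated on this page.

```python
import random


def permutation_pvalues(fixed, permuted, statistics, n_perm=999, seed=0):
    """One resampling pass shared by several statistics.

    statistics maps a name to (function, tail); tail is "greater" or "less"
    and says which direction counts as at least as extreme as observed.
    """
    rng = random.Random(seed)
    observed = {name: fn(fixed, permuted) for name, (fn, _) in statistics.items()}
    extreme = {name: 0 for name in statistics}
    shuffled = list(permuted)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one shuffle, reused by every statistic
        for name, (fn, tail) in statistics.items():
            value = fn(fixed, shuffled)
            if tail == "greater":
                extreme[name] += value >= observed[name]
            else:
                extreme[name] += value <= observed[name]
    # Add-one correction keeps every Monte Carlo p-value strictly positive.
    return {name: (1 + extreme[name]) / (n_perm + 1) for name in statistics}


def squared_gap(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# Hypothetical data: the permuted vector tracks the fixed one closely.
x = [0.1 * i for i in range(20)]
y = [v + 0.05 for v in x]
pvals = permutation_pvalues(
    x, y, {"squared_gap": (squared_gap, "less"), "dot": (dot, "greater")}
)
```

Because every statistic sees the same sequence of shuffles, the cost of resampling is paid once regardless of how many measures are tested.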

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from external divergence family

The nLoI and related measures are explicitly rooted in the pre-existing Cressie-Read power divergence family at fixed lambda=2, an external statistical construct. The closed-form decomposition into within-node and between-node components follows directly as an algebraic property of that functional once the normalization is applied, without any fitting to reconstruction data or redefinition that would make the diagnostic output equivalent to the input by construction. The unified permutation procedure is introduced as a resampling method whose Type I error control is asserted via separate Monte Carlo simulations rather than assumed tautologically. No self-citations, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation appear as load-bearing steps in the provided abstract or framework description. The claims of boundedness, symmetry, and superior detection of fidelity gradients are presented as independent theoretical and empirical results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review limits visibility into explicit assumptions; the framework rests on the suitability of Cressie-Read divergences for this interpretability task and on the validity of the permutation-based inference without post-hoc adjustments.

axioms (1)
  • Domain assumption: the Cressie-Read power divergence family with lambda=2 is a suitable base for measuring reconstruction agreement in ensemble tree surrogates.
    The framework is explicitly grounded in this family and lambda choice to enable the claimed decomposition and properties.

pith-pipeline@v0.9.0 · 5736 in / 1306 out tokens · 49124 ms · 2026-05-20T08:11:13.085713+00:00 · methodology


