Classification Trees with Valid Inference via the Exponential Mechanism

Snigdha Panigrahi; Soham Bakshi

arxiv: 2511.15068 · v3 · submitted 2025-11-19 · 📊 stat.ME

Classification Trees with Valid Inference via the Exponential Mechanism

Soham Bakshi , Snigdha Panigrahi This is my paper

Pith reviewed 2026-05-17 21:21 UTC · model grok-4.3

classification 📊 stat.ME

keywords classification treesexponential mechanismvalid inferenceadaptive modelspivotal statisticspredictive fitting

0 comments

The pith

Classification trees fitted via the exponential mechanism produce pivots for asymptotically valid inference on model parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a new way to grow classification trees by selecting splits probabilistically using the exponential mechanism instead of always choosing the best split greedily. The temperature parameter controls how much the selection deviates from the deterministic choice. Because the sampling probabilities reflect the full adaptive process of building the tree, they can be turned directly into pivots that support valid statistical inference for the parameters in the final tree model. This matters because standard tree methods do not allow reliable inference due to their data-driven structure, while common fixes like splitting the data reduce predictive power. The new method aims to deliver both accurate predictions and trustworthy inference.

Core claim

By defining split selection probabilities through an exponential mechanism, the resulting sampling distribution can be used to construct pivots that yield asymptotically valid inference for the parameters of the fitted classification tree, accounting for the adaptivity in the tree-growing process.

What carries the argument

The exponential mechanism that assigns sampling probabilities to candidate splits based on their utility and a temperature parameter, from which inference pivots are derived.

If this is right

Valid inference is possible on the parameters of the predictive fit produced by the tree.
The approach maintains predictive accuracy comparable to standard greedy trees at low temperatures.
Unlike data splitting, inference does not come at the cost of reduced accuracy.
The method explicitly handles the adaptivity of the entire tree construction process in its inference procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers working on other adaptive algorithms could adapt the exponential mechanism to obtain valid pivots similarly.
This technique might improve the reliability of predictions in applied settings where decision trees are used for classification.

Load-bearing premise

The sampling probabilities defined by the exponential mechanism fully account for the adaptivity in the tree-growing process and produce asymptotically valid inference pivots.

What would settle it

Empirical coverage rates of the confidence intervals from these pivots that fall significantly below or above the nominal level in repeated simulations with known true parameters.

Figures

Figures reproduced from arXiv: 2511.15068 by Snigdha Panigrahi, Soham Bakshi.

**Figure 2.** Figure 2: Effect of temperature on coverage, interval length, and held-out log-loss. RCT(1), [PITH_FULL_IMAGE:figures/full_fig_p032_2.png] view at source ↗

**Figure 3.** Figure 3: Effect of data-splitting ratio on coverage, interval length, and held-out log-loss. [PITH_FULL_IMAGE:figures/full_fig_p033_3.png] view at source ↗

**Figure 4.** Figure 4: Effect of signal strength on CI quality and predictive performance. [PITH_FULL_IMAGE:figures/full_fig_p033_4.png] view at source ↗

read the original abstract

Decision trees are widely used for non-linear modeling, as they capture interactions between predictors while producing inherently interpretable models. Despite their popularity, performing inference on the non-linear fit remains largely unaddressed. This paper focuses on classification trees and makes two key contributions. First, we introduce a novel tree-fitting method that replaces the greedy splitting of the predictor space in standard tree algorithms with a probabilistic approach. Each split in our approach is selected according to sampling probabilities defined by an exponential mechanism, with a temperature parameter controlling its deviation from the deterministic choice given data. Second, while our approach can fit a tree that with high probability coincides with the fit produced by standard tree algorithms at low temperatures, it is not merely predictive; unlike standard algorithms, it enables inference by taking into account the highly adaptive tree structure. Our method produces pivots directly from the sampling probabilities in the exponential mechanism. In theory, our pivots allow asymptotically valid inference on the parameters in the predictive fit, and in practice, our method delivers powerful inference without sacrificing predictive accuracy, in contrast to data splitting methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper randomizes classification tree splits with the exponential mechanism and builds inference pivots from those probabilities to claim asymptotic validity without data splitting.

read the letter

The main point is that this work replaces greedy splitting in classification trees with a probabilistic choice via the exponential mechanism, then constructs pivots directly from the resulting sampling probabilities to support asymptotically valid inference on the parameters in the fitted tree. At low temperatures the selected tree matches the standard greedy one with high probability, so predictive accuracy stays close while the randomization supplies a distribution over possible trees for inference purposes. That combination looks new; most tree literature stays with deterministic or ensemble methods, and exponential mechanism work has mostly stayed in privacy settings rather than post-selection inference for adaptive models. The approach does a reasonable job of addressing the accuracy-inference tradeoff that usually forces data splitting or simpler models. The abstract indicates they have both a theoretical argument for validity and some practical evidence that the method keeps power without the usual cost. The construction is direct and avoids extra tuning beyond the temperature parameter. The soft spot is the sequential character of tree growth. Each split changes the candidate set for the next node, so the probability of any complete tree is not simply the product of the per-split exponential probabilities. If the pivot formula does not integrate over the full path or correctly recover the joint distribution, the claimed uniform pivot under the data-generating process may not hold, which would undermine the asymptotic validity. The stress-test note flags exactly this, and it needs checking in the derivations. The paper is aimed at statisticians and machine-learning researchers who want valid inference for interpretable nonlinear models like trees in settings where data splitting is costly. Readers working on post-selection inference or adaptive procedures would get the most out of it. It deserves a serious referee because the problem is substantive and the proposed fix is distinct enough to warrant detailed review. I would recommend sending it out for peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes replacing greedy splitting in classification trees with a probabilistic selection mechanism based on the exponential mechanism, where each split is drawn according to sampling probabilities controlled by a temperature parameter. It claims that pivots constructed directly from these per-split sampling probabilities yield asymptotically valid inference for parameters in the final predictive fit, while preserving predictive accuracy close to standard trees and outperforming data-splitting methods.

Significance. A rigorously justified method for valid post-selection inference in adaptive trees would be a useful contribution to statistical methodology, particularly if it avoids the efficiency loss of sample splitting. The approach leverages an external primitive (the exponential mechanism) to enable inference without post-hoc adjustments, which is a strength if the joint selection probability is correctly recovered.

major comments (2)

[Theoretical section on pivot construction] The central validity claim rests on the pivot having the correct distribution under the data-generating process. Because splits are chosen sequentially and the candidate set at each node is determined by prior partitions, the joint probability of a complete tree is not in general the product of the per-split exponential probabilities. The manuscript should provide an explicit derivation showing how the full joint is recovered or approximated in the pivot formula (see the theoretical section on pivot construction).
[Abstract and simulation studies] The abstract asserts asymptotic validity and practical gains, yet the provided description contains no error bounds, explicit conditions for the temperature parameter, or simulation evidence on coverage. If the joint-probability issue is not resolved, the asymptotic claim is load-bearing and requires a concrete proof sketch or counter-example analysis.

minor comments (2)

Specify the exact utility function inside the exponential mechanism and how the temperature is selected or tuned in the reported experiments.
Clarify whether the method coincides with standard CART at low temperatures only in expectation or with high probability, and provide the relevant probability bound.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The comments raise important points about the theoretical justification for the pivots and the need for additional empirical and analytic details. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation.

read point-by-point responses

Referee: [Theoretical section on pivot construction] The central validity claim rests on the pivot having the correct distribution under the data-generating process. Because splits are chosen sequentially and the candidate set at each node is determined by prior partitions, the joint probability of a complete tree is not in general the product of the per-split exponential probabilities. The manuscript should provide an explicit derivation showing how the full joint is recovered or approximated in the pivot formula (see the theoretical section on pivot construction).

Authors: We appreciate this observation. In the proposed method the exponential mechanism is applied at each node conditionally on the partition induced by all prior splits; the candidate set at a given node is therefore a deterministic function of the history. The probability of any complete tree is consequently the product of these successive conditional selection probabilities. The pivot is formed directly from this product, which equals the joint probability of the observed tree under the sampling process. We will insert an explicit recursive derivation in the theoretical section that defines the node-specific candidate sets and shows that the joint is recovered exactly by the product of the per-split probabilities. revision: yes
Referee: [Abstract and simulation studies] The abstract asserts asymptotic validity and practical gains, yet the provided description contains no error bounds, explicit conditions for the temperature parameter, or simulation evidence on coverage. If the joint-probability issue is not resolved, the asymptotic claim is load-bearing and requires a concrete proof sketch or counter-example analysis.

Authors: We agree that the abstract and simulation section would benefit from greater precision. Once the joint-probability derivation is added, the asymptotic claim rests on standard large-sample arguments for the exponential mechanism under a temperature schedule that approaches zero at an appropriate rate. We will augment the theoretical section with a brief proof sketch that states the required conditions on the temperature parameter and supplies explicit error bounds derived from the mechanism's concentration properties. In addition, we will expand the simulation studies to include empirical coverage probabilities for the proposed pivots across a range of sample sizes, tree depths, and temperature values, together with comparisons to data-splitting baselines. revision: yes

Circularity Check

0 steps flagged

No circularity: pivots derived from external exponential mechanism primitive

full rationale

The paper replaces greedy splitting with sampling from an exponential mechanism (a standard tool from differential privacy, independent of this work) and constructs pivots directly from the resulting per-split sampling probabilities. The mechanism itself is not defined in terms of the final tree or its parameters; it is an external primitive whose properties are invoked to handle adaptivity. No equations reduce the pivot construction to a tautological re-expression of the fitted tree, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness or ansatz is imported via self-citation. The derivation therefore remains self-contained against external benchmarks rather than circular by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the exponential mechanism supplying usable sampling probabilities and on the assumption that these probabilities yield asymptotically valid pivots for an adaptively grown tree; no free parameters beyond the user-chosen temperature are mentioned, and no new entities are postulated.

free parameters (1)

temperature parameter
Controls how closely the probabilistic splits match the deterministic greedy choice; its value is chosen by the user and affects both tree structure and the resulting pivots.

axioms (1)

domain assumption Sampling probabilities from the exponential mechanism can be directly converted into asymptotically valid pivots that account for the full adaptivity of the tree-growing process.
Invoked when the abstract states that pivots are produced from the sampling probabilities and allow asymptotically valid inference.

pith-pipeline@v0.9.0 · 5482 in / 1339 out tokens · 43198 ms · 2026-05-17T21:21:51.414738+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

sampling probabilities defined by an exponential mechanism... Py(s;P) = exp(ϵP GP(s;y)) / sum exp(ϵP GP(s';y))
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Pivot(bTn,1; Πn,1) constructed from λh(u, bTn,−1) and normal density

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · 1 internal anchor

[1]

Holistic Evaluation of Language Models

forthcoming. Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Ya- sunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language models.arXiv preprint arXiv:2211.09110, 2022. 36 J. W. Lindeberg. Eine neue herleitung des exponentialgesetzes in der wahrscheinlichkeit- srechnung.Mathemat...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/bf01194075 2022
[2]

The gain functionsG k h(v)and their first- to third-order derivatives order∇G k h(v),∇2Gk h(v), ∇3Gk h(v)are bounded for allv∈ eDn

work page
[3]

The functionf(.), derived from the logarithm of the sampling probabilities based on the gain functions, isL-Lipschitz, i.e.|f(v 1)−f(v 2)| ≤L||v 1 −v 2||for allv 1, v2 ∈ eDn

work page
[4]

49 Firstly, note that the absolute value of each component ofη k h ∈R 8 is uniformly bounded with respect ton

Furthermore,||∇f(v)|| ≤b 1,||∇ 2f(v)|| ≤b 2,||∇ 3f(v)|| ≤b 3 for allv∈ eDn Proof.Part 1.RecallG k h (v) :=g ηk h v1, vh −1, vh,k −1 where gη (x, y, z) :=I(η 1x+η 2y)−η 8I(η 3x+η 4z)−(1−η 8)I(η 5x+η 6y+η 8z). 49 Firstly, note that the absolute value of each component ofη k h ∈R 8 is uniformly bounded with respect ton. Under Assumption 4, the functionI(·) i...

work page
[5]

Now, since eachb i,n is uni- 55 formly bounded, i.e., sup n,i |bi,n| ≤ ¯C+ (by Proposition B.1), and the uniform multivariate Berry–Esseen bound in Bentkus [2003] applies, it follows that sup A∈C¯k P(ζn ∈A)−P(Z n ∈A) ≲ ¯k1/4 E∥bi,neYi,n∥3 2√n =O(n −1/2), where the supremum is taken over convex sets inR ¯k, denotedC ¯k. BecauseD n is convex andP(ζ n ∈D n) ...

work page 2003
[6]

Hence, for all sufficiently largen, E exp −LB∥Z n∥ 1 Dn(Zn) ≥e −LB(2 √¯k) 3 4 − 1 4 = 1 2 e−2LB √¯k =:C − >0, whereC − does not depend onn. 56

work page

[1] [1]

Holistic Evaluation of Language Models

forthcoming. Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Ya- sunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language models.arXiv preprint arXiv:2211.09110, 2022. 36 J. W. Lindeberg. Eine neue herleitung des exponentialgesetzes in der wahrscheinlichkeit- srechnung.Mathemat...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/bf01194075 2022

[2] [2]

The gain functionsG k h(v)and their first- to third-order derivatives order∇G k h(v),∇2Gk h(v), ∇3Gk h(v)are bounded for allv∈ eDn

work page

[3] [3]

The functionf(.), derived from the logarithm of the sampling probabilities based on the gain functions, isL-Lipschitz, i.e.|f(v 1)−f(v 2)| ≤L||v 1 −v 2||for allv 1, v2 ∈ eDn

work page

[4] [4]

49 Firstly, note that the absolute value of each component ofη k h ∈R 8 is uniformly bounded with respect ton

Furthermore,||∇f(v)|| ≤b 1,||∇ 2f(v)|| ≤b 2,||∇ 3f(v)|| ≤b 3 for allv∈ eDn Proof.Part 1.RecallG k h (v) :=g ηk h v1, vh −1, vh,k −1 where gη (x, y, z) :=I(η 1x+η 2y)−η 8I(η 3x+η 4z)−(1−η 8)I(η 5x+η 6y+η 8z). 49 Firstly, note that the absolute value of each component ofη k h ∈R 8 is uniformly bounded with respect ton. Under Assumption 4, the functionI(·) i...

work page

[5] [5]

Now, since eachb i,n is uni- 55 formly bounded, i.e., sup n,i |bi,n| ≤ ¯C+ (by Proposition B.1), and the uniform multivariate Berry–Esseen bound in Bentkus [2003] applies, it follows that sup A∈C¯k P(ζn ∈A)−P(Z n ∈A) ≲ ¯k1/4 E∥bi,neYi,n∥3 2√n =O(n −1/2), where the supremum is taken over convex sets inR ¯k, denotedC ¯k. BecauseD n is convex andP(ζ n ∈D n) ...

work page 2003

[6] [6]

Hence, for all sufficiently largen, E exp −LB∥Z n∥ 1 Dn(Zn) ≥e −LB(2 √¯k) 3 4 − 1 4 = 1 2 e−2LB √¯k =:C − >0, whereC − does not depend onn. 56

work page