A Bregman Perspective on Classification and Regression Trees

Mathias Bourel

arxiv: 2606.13984 · v2 · pith:RYU7ZWAKnew · submitted 2026-06-12 · 📊 stat.ML · cs.LG· stat.ME

A Bregman Perspective on Classification and Regression Trees

Mathias Bourel This is my paper

Pith reviewed 2026-06-27 05:07 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.ME

keywords CARTBregman divergenceimpurity measuresexponential familyconvex functionssplit selectiontree consistencyrecursive partitioning

0 comments

The pith

Bregman divergences unify impurity measures, node representatives, and split rules across exponential-family CART models via convex functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard CART impurity criteria such as least-squares, Poisson deviance, and Kullback-Leibler losses all arise as Bregman divergences generated by convex functions. This single representation replaces separate constructions for each statistical model and lets node representatives and split selection be derived from general properties of the generating convex function. The resulting framework yields geometric statements about how the choice of convex function controls impurity reduction and partition stability, plus consistency theorems that apply uniformly to the whole family of procedures.

Core claim

Key ingredients of the CART methodology—including node representatives, impurity measures, and split selection rules—can be expressed and analyzed through general properties of convex functions rather than through separate model-specific constructions. Classical least-squares, Poisson deviance, Kullback-Leibler-type losses, and other impurity measures associated with exponential-family models are placed inside a common Bregman framework, allowing geometric properties of the generating convex function to govern impurity reductions and stability of recursive partitions, and delivering consistency results for the broad family of Bregman-based CART procedures.

What carries the argument

Bregman divergence generated by a convex function, used to represent impurity and to derive node representatives and split selection.

If this is right

Impurity reductions and split selection rules become consequences of convexity and Bregman properties rather than model-specific derivations.
Geometric features of the generating convex function directly control the size of impurity drops and the stability of the resulting partitions.
Consistency theorems apply uniformly to the entire class of Bregman-based CART procedures instead of being proved case by case.
A geometric interpretation of impurity-based tree construction replaces the collection of separate statistical arguments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same convex-function lens could be applied to other recursive partitioning methods that rely on impurity or deviance criteria.
Specific choices of the generating convex function might yield new, previously unexamined impurity measures with controllable stability properties.
Empirical tests could compare partition stability across different convex generators on the same data sets to isolate the geometric effect.

Load-bearing premise

Impurity measures for exponential-family models admit a Bregman divergence representation generated by a convex function.

What would settle it

An explicit impurity criterion used in standard CART that cannot be written as a Bregman divergence for any convex function.

Figures

Figures reproduced from arXiv: 2606.13984 by Mathias Bourel.

**Figure 2.** Figure 2: Convex Legendre conjugate of a convex function [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Bregman Tree fitted under distributional losses unavailable in [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗

**Figure 4.** Figure 4: Bregman Tree vs. CART under Poisson (left) and Gaussian (right) data generating pro [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗

read the original abstract

Classification and Regression Trees (CART) constitute one of the most influential paradigms in statistical learning. Although a variety of impurity measures have been proposed for different statistical models, these criteria are typically introduced on a case-by-case basis and analyzed separately. In this paper, we study CART through the lens of Bregman divergences. This perspective places the classical least-squares criterion, Poisson deviance, Kullback-Leibler-type losses, and other impurity measures associated with exponential-family models within a common framework. As a result, key ingredients of the CART methodology -- including node representatives, impurity measures, and split selection rules -- can be expressed and analyzed through general properties of convex functions rather than through separate model-specific constructions. Beyond the algorithmic formulation, we investigate theoretical properties of Bregman-based CART procedures. In particular, we analyze how geometric properties of the generating convex function influence impurity reductions and stability of recursive partitions. We also establish consistency results within the proposed framework, providing a unified theoretical treatment for a broad family of CART type procedures. Our results provide a geometric interpretation of impurity-based tree construction and show that many classical CART impurity criteria admit a common interpretation within a Bregman framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Bregman unification of CART impurities is a clean but standard convex-analysis move with limited new technical depth.

read the letter

The main thing here is that the paper recasts node representatives, impurity measures, and split rules for exponential-family CART as Bregman divergences generated by the family's cumulant function. Least-squares, Poisson deviance, and KL-type criteria all drop out as special cases. That framing is tidy and lets geometric properties like strong convexity control impurity reduction in one shot.

What the paper does well is collect the classical criteria under one roof and note that consistency arguments can then ride on standard convex-analysis and empirical-process tools instead of case-by-case proofs. If the derivations in the full text are clean, this could save future authors some repetition when they want to analyze a new exponential-family tree.

The soft spot is that the central construction is textbook Bregman divergence applied to the usual CART objective; nothing in the abstract or stress-test suggests a non-standard step or a result that would not follow immediately once the setup is written down. The claimed consistency results therefore look like direct translations rather than fresh technical work. No empirical checks or counter-examples are mentioned, which keeps the contribution mostly organizational.

This is for people already interested in convex-analytic views of tree methods or who need a reference that treats several impurity measures together. A reader working on applied CART or on non-exponential-family extensions will not find much to take away.

I would send it to peer review if the proofs are written out clearly and the consistency statements are stated precisely; the framework is coherent even if the novelty is modest.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a unified Bregman-divergence framework for CART procedures on exponential-family models. Node representatives are defined as Bregman projections (unique minimizers of expected divergence), impurity measures as expected divergences to these points, and split rules as maximizers of impurity reduction; all are generated by the cumulant function of the exponential family. The paper recovers classical criteria (squared error, Poisson deviance, KL) as special cases, analyzes how strong convexity and other geometric properties of the generator control reduction rates and partition stability, and derives consistency results for the resulting tree estimators via convex-analysis and empirical-process arguments.

Significance. If the derivations hold, the work supplies a single geometric lens that replaces separate model-specific constructions for a broad class of impurity measures. This yields immediate corollaries for stability and consistency that previously required case-by-case arguments, and it makes transparent how the curvature of the cumulant function governs the behavior of recursive partitioning. The explicit recovery of standard criteria and the use of off-the-shelf convex-analysis tools constitute a genuine organizational contribution.

minor comments (3)

[§3] §3 (or the section defining the split criterion): the statement that the split rule 'maximizes the reduction in expected Bregman divergence' should explicitly note that this is equivalent to the classical weighted-impurity reduction only after the node-size weighting is inserted; a short derivation would prevent readers from overlooking the weighting step.
[Theorem on consistency] The consistency theorem (presumably Theorem 4 or 5) invokes 'standard empirical-process arguments' but does not list the precise moment or Lipschitz conditions on the generator f that are needed for the uniform law of large numbers; adding a short remark or reference to the relevant corollary in van der Vaart & Wellner would make the result self-contained.
Notation: the generating convex function is sometimes denoted f, sometimes ψ, and occasionally left implicit. A single global symbol together with a reminder that it is the cumulant function would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. The provided summary accurately captures the manuscript's contributions to unifying CART procedures via Bregman divergences for exponential-family models.

Circularity Check

0 steps flagged

No significant circularity; standard Bregman framework applied to CART

full rationale

The derivation applies well-known properties of Bregman divergences (node representative as unique minimizer of expected divergence to a convex function's gradient, impurity as expected divergence, splits maximizing reduction) to exponential-family CART. These recover classical criteria (least-squares from quadratic, Poisson from exp) by direct substitution without parameter fitting or redefinition. Consistency and geometric properties follow from external convex analysis and empirical process theory. No self-citation is load-bearing, no ansatz is smuggled, and no step equates a prediction to its own fitted input by construction. The paper is self-contained against standard mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the framework is described at the level of existing convex-function properties.

pith-pipeline@v0.9.1-grok · 5727 in / 875 out tokens · 24184 ms · 2026-06-27T05:07:36.535974+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 3 canonical work pages · 1 internal anchor

[1]

Wadsworth and Brooks/Cole , year=

Classification and Regression Trees , author=. Wadsworth and Brooks/Cole , year=
[2]

Journal of Machine Learning Research , volume=

Clustering with Bregman Divergences , author=. Journal of Machine Learning Research , volume=
[3]

A Probabilistic Theory of Pattern Recognition , pages=

Feature Extraction , author=. A Probabilistic Theory of Pattern Recognition , pages=. 1996 , publisher=

1996
[4]

The annals of statistics , pages=

Consistent nonparametric regression , author=. The annals of statistics , pages=. 1977 , publisher=

1977
[5]

IEEE Transactions on Information Theory , volume=

The information geometry of mirror descent , author=. IEEE Transactions on Information Theory , volume=. 2015 , publisher=

2015
[6]

Journal of machine learning research , volume=

A unified theory of diversity in ensemble learning , author=. Journal of machine learning research , volume=
[7]

arXiv preprint arXiv:2302.05833 , year=

Bregman-Wasserstein divergence: geometry and applications , author=. arXiv preprint arXiv:2302.05833 , year=

work page arXiv
[8]

Problem complexity and method efficiency in optimization , year=

Wiley-Interscience series in discrete mathematics , author=. Problem complexity and method efficiency in optimization , year=
[9]

arXiv preprint arXiv:2511.08789 , year=

A generalized bias-variance decomposition for bregman divergences , author=. arXiv preprint arXiv:2511.08789 , year=

work page arXiv
[10]

Re-examination of Bregman functions and new properties of their divergences

Re‑examination of Bregman functions and new properties of their divergences , author=. arXiv preprint arXiv:1803.00641 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Nelder , title =

Peter McCullagh and John A. Nelder , title =
[12]

2022 , publisher=

Exponential families in theory and practice , author=. 2022 , publisher=

2022
[13]

, title =

Bregman, Lev M. , title =. USSR Computational Mathematics and Mathematical Physics , volume =. 1967 , doi =

1967
[14]

Journal of Computational and Graphical Statistics , volume =

Zeileis, Achim and Hothorn, Torsten and Hornik, Kurt , title =. Journal of Computational and Graphical Statistics , volume =
[15]

Journal of Machine Learning Research , volume =

Hothorn, Torsten and Zeileis, Achim , title =. Journal of Machine Learning Research , volume =
[16]

International Journal of Biostatistics , volume =

Seibold, Hannes and Zeileis, Achim and Hothorn, Torsten , title =. International Journal of Biostatistics , volume =
[17]

Journal of Computational and Graphical Statistics , volume =

Schlosser, Lisa and Hothorn, Torsten and Stauffer, Reto and Zeileis, Achim , title =. Journal of Computational and Graphical Statistics , volume =

[1] [1]

Wadsworth and Brooks/Cole , year=

Classification and Regression Trees , author=. Wadsworth and Brooks/Cole , year=

[2] [2]

Journal of Machine Learning Research , volume=

Clustering with Bregman Divergences , author=. Journal of Machine Learning Research , volume=

[3] [3]

A Probabilistic Theory of Pattern Recognition , pages=

Feature Extraction , author=. A Probabilistic Theory of Pattern Recognition , pages=. 1996 , publisher=

1996

[4] [4]

The annals of statistics , pages=

Consistent nonparametric regression , author=. The annals of statistics , pages=. 1977 , publisher=

1977

[5] [5]

IEEE Transactions on Information Theory , volume=

The information geometry of mirror descent , author=. IEEE Transactions on Information Theory , volume=. 2015 , publisher=

2015

[6] [6]

Journal of machine learning research , volume=

A unified theory of diversity in ensemble learning , author=. Journal of machine learning research , volume=

[7] [7]

arXiv preprint arXiv:2302.05833 , year=

Bregman-Wasserstein divergence: geometry and applications , author=. arXiv preprint arXiv:2302.05833 , year=

work page arXiv

[8] [8]

Problem complexity and method efficiency in optimization , year=

Wiley-Interscience series in discrete mathematics , author=. Problem complexity and method efficiency in optimization , year=

[9] [9]

arXiv preprint arXiv:2511.08789 , year=

A generalized bias-variance decomposition for bregman divergences , author=. arXiv preprint arXiv:2511.08789 , year=

work page arXiv

[10] [10]

Re-examination of Bregman functions and new properties of their divergences

Re‑examination of Bregman functions and new properties of their divergences , author=. arXiv preprint arXiv:1803.00641 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

Nelder , title =

Peter McCullagh and John A. Nelder , title =

[12] [12]

2022 , publisher=

Exponential families in theory and practice , author=. 2022 , publisher=

2022

[13] [13]

, title =

Bregman, Lev M. , title =. USSR Computational Mathematics and Mathematical Physics , volume =. 1967 , doi =

1967

[14] [14]

Journal of Computational and Graphical Statistics , volume =

Zeileis, Achim and Hothorn, Torsten and Hornik, Kurt , title =. Journal of Computational and Graphical Statistics , volume =

[15] [15]

Journal of Machine Learning Research , volume =

Hothorn, Torsten and Zeileis, Achim , title =. Journal of Machine Learning Research , volume =

[16] [16]

International Journal of Biostatistics , volume =

Seibold, Hannes and Zeileis, Achim and Hothorn, Torsten , title =. International Journal of Biostatistics , volume =

[17] [17]

Journal of Computational and Graphical Statistics , volume =

Schlosser, Lisa and Hothorn, Torsten and Stauffer, Reto and Zeileis, Achim , title =. Journal of Computational and Graphical Statistics , volume =