pith. machine review for the scientific record.

arxiv: 2605.06528 · v1 · submitted 2026-05-07 · 📊 stat.CO


QUBO-Based Calibration for Regression Trees


Pith reviewed 2026-05-08 03:08 UTC · model grok-4.3

classification 📊 stat.CO
keywords QUBO · regression trees · CART · Dinkelbach algorithm · split selection · combinatorial optimization · machine learning · quadratic unconstrained binary optimization

The pith

Reformulating split selection as a QUBO problem lets solvers find higher-quality splits for regression trees than CART's greedy heuristics while keeping comparable accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the split-selection problem for categorical predictors in regression trees is a combinatorial fractional optimization task that standard CART solves with heuristics. Applying Dinkelbach's algorithm converts this into a sequence of QUBO problems that modern solvers can handle directly. The resulting trees match the predictive performance of CART trees but select splits of measurably higher quality. A reader would care because suboptimal splits from greedy methods can propagate through the tree and degrade overall model quality on data with many categorical features.
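To make the combinatorial structure concrete: a binary split on a categorical predictor with K levels chooses which subset of levels goes to the left child, giving 2^(K−1) − 1 candidate splits. A minimal brute-force sketch in Python (hypothetical helper names, plain squared error as the split criterion; not the paper's code):

```python
import itertools
import statistics

def sse(values):
    """Sum of squared errors around the mean (0.0 for fewer than 2 points)."""
    if len(values) < 2:
        return 0.0
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_categorical_split(xs, ys):
    """Exhaustive search over all binary partitions of the category set.

    Illustrative brute force only: with K levels there are 2**(K-1) - 1
    candidate splits, which is why standard CART falls back to a greedy
    heuristic when K is large.
    """
    cats = sorted(set(xs))
    by_cat = {c: [y for x, y in zip(xs, ys) if x == c] for c in cats}
    best_obj, best_left = float("inf"), None
    # fix cats[0] on the left to avoid counting mirror-image splits twice
    for r in range(len(cats)):
        for combo in itertools.combinations(cats[1:], r):
            left = {cats[0], *combo}
            if len(left) == len(cats):
                continue  # not a split
            left_y = [y for c in left for y in by_cat[c]]
            right_y = [y for c in cats if c not in left for y in by_cat[c]]
            obj = sse(left_y) + sse(right_y)
            if obj < best_obj:
                best_obj, best_left = obj, left
    return best_obj, best_left

# toy data: levels "a" and "c" have low responses, "b" high
xs = ["a", "a", "b", "b", "c", "c"]
ys = [1.0, 1.2, 5.0, 5.2, 1.1, 0.9]
obj, left = best_categorical_split(xs, ys)
```

On the toy data the optimum groups the two low-response levels together; the same exhaustive search becomes infeasible once K reaches a few dozen.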

Core claim

The central claim is that the fractional combinatorial optimization problem arising in CART split selection for categorical variables can be exactly transformed via Dinkelbach's algorithm into a QUBO formulation. When this QUBO is solved with current solvers, the constructed regression trees achieve out-of-sample performance comparable to standard CART while returning split solutions that are superior in quality to those produced by the usual greedy heuristics.
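For reference, the target format is generic: any pseudo-Boolean objective with linear and pairwise terms can be packed into an upper-triangular matrix Q so that f(x) = xᵀQx, because x_i² = x_i for binary x. A sketch with an illustrative objective (not the paper's actual matrix):

```python
import itertools

def qubo_matrix(linear, quadratic, n):
    """Pack f(x) = sum_i a_i x_i + sum_{i<j} b_ij x_i x_j into an
    upper-triangular matrix Q with f(x) = x^T Q x (uses x_i^2 = x_i)."""
    Q = [[0.0] * n for _ in range(n)]
    for i, a in linear.items():
        Q[i][i] = a          # diagonal carries the linear terms
    for (i, j), b in quadratic.items():
        Q[i][j] = b
    return Q

def evaluate(Q, x):
    return sum(Q[i][j] * x[i] * x[j]
               for i in range(len(x)) for j in range(len(x)))

def brute_force_min(Q):
    """Tiny stand-in for a QUBO solver: enumerate all binary vectors."""
    n = len(Q)
    return min(itertools.product([0, 1], repeat=n),
               key=lambda x: evaluate(Q, x))

# illustrative objective: f(x) = 2x0 - 3x1 + x2 + 4x0x1 - 2x1x2
Q = qubo_matrix({0: 2, 1: -3, 2: 1}, {(0, 1): 4, (1, 2): -2}, 3)
x_min = brute_force_min(Q)
```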

What carries the argument

Dinkelbach's algorithm, which iteratively replaces the fractional objective of split selection with a sequence of QUBO problems whose solutions converge to the optimum of the original problem.
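The scheme itself is short: repeatedly minimize the parametric objective N(x) − λD(x) over binary x, update λ to the achieved ratio N(x*)/D(x*), and stop when the parametric minimum reaches zero. A sketch on a toy fractional objective, with brute force standing in for a QUBO solver (hypothetical function names, not the paper's implementation):

```python
import itertools

def dinkelbach(num, den, n_bits, tol=1e-9, max_iter=50):
    """Minimize num(x)/den(x) over binary vectors x (den > 0) via
    Dinkelbach's parametric scheme.  Each iteration solves the
    unconstrained binary subproblem min_x num(x) - lam*den(x); here a
    brute-force search stands in for a real QUBO solver."""
    def solve_subproblem(lam):
        return min(itertools.product([0, 1], repeat=n_bits),
                   key=lambda x: num(x) - lam * den(x))

    lam = 0.0
    for _ in range(max_iter):
        x = solve_subproblem(lam)
        gap = num(x) - lam * den(x)
        if abs(gap) < tol:          # F(lam) = 0  <=>  lam is the optimal ratio
            return x, lam
        lam = num(x) / den(x)       # parametric update
    return x, lam

# toy fractional objective on 3 bits (illustrative, not the paper's objective)
num = lambda x: 3 + x[0] + 2 * x[1] + 4 * x[2]
den = lambda x: 1 + 2 * x[0] + x[1] + x[2]
x_opt, ratio = dinkelbach(num, den, 3)
```

Each subproblem is already an unconstrained binary minimization of a polynomial in x, which is what makes the QUBO hand-off natural.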

Load-bearing premise

Dinkelbach's reformulation preserves the exact optimal solution of the original fractional split-selection objective and current QUBO solvers reliably return higher-quality splits than CART's greedy heuristics.

What would settle it

Run both methods on the same collection of datasets with high-cardinality categorical predictors, measure the objective value achieved at each split, and check whether the QUBO solutions consistently yield strictly better objective values while test-set performance stays within statistical noise of CART.
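A sketch of that per-split audit on synthetic data, with plain squared error as a stand-in criterion and brute force standing in for the QUBO route. (For plain SSE the mean-ordering scan is itself known to be optimal for a single split, so the gap should vanish here; the paper's fractional objective is where a positive gap could appear.)

```python
import itertools
import random
import statistics

def sse(v):
    if len(v) < 2:
        return 0.0
    m = statistics.fmean(v)
    return sum((y - m) ** 2 for y in v)

def split_objective(by_cat, left):
    ly = [y for c in left for y in by_cat[c]]
    ry = [y for c in by_cat if c not in left for y in by_cat[c]]
    return sse(ly) + sse(ry)

def greedy_objective(by_cat):
    # CART-style scan: order levels by mean response, cut the ordering
    order = sorted(by_cat, key=lambda c: statistics.fmean(by_cat[c]))
    return min(split_objective(by_cat, set(order[:k]))
               for k in range(1, len(order)))

def exact_objective(by_cat):
    # exhaustive search over all binary partitions (stand-in for QUBO route)
    cats = sorted(by_cat)
    return min(split_objective(by_cat, {cats[0], *combo})
               for r in range(len(cats) - 1)
               for combo in itertools.combinations(cats[1:], r))

# paired per-split comparison on synthetic categorical data
random.seed(0)
gaps = []
for _ in range(50):
    by_cat = {c: [random.gauss(random.choice([0.0, 1.0]), 1.0)
                  for _ in range(4)] for c in "abcde"}
    gaps.append(greedy_objective(by_cat) - exact_objective(by_cat))
```

By construction the exhaustive optimum is never worse than the greedy scan, so every gap is non-negative; the proposed experiment asks whether, under the paper's fractional objective, the gaps are strictly positive while test-set error stays flat.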

Figures

Figures reproduced from arXiv: 2605.06528 by Anne MacKay, Iro René Kouarfate, Mathieu Pigeon, Maxime Dion.

Figure 1. Branch of the depth-6 regression tree used for prediction.
Original abstract

Tree-based regression models are widely used in supervised learning, with the Classification and Regression Tree (CART) algorithm serving as a standard reference. CART construction involves solving a sequence of split-selection optimization problems. For categorical predictors, this problem can be formulated as a combinatorial fractional optimization problem. This structure makes the exact optimization computationally challenging and leads to standard implementations that rely on greedy heuristics, which may result in suboptimal splits. In this work, we reformulate this fractional problem and apply Dinkelbach (1967) algorithm to convert it into a Quadratic Unconstrained Binary Optimization (QUBO) problem. Using state-of-the-art QUBO solvers, we obtain QUBO-based regression trees with predictive performance comparable to standard CART while yielding higher-quality split solutions. These results highlight the potential of QUBO formulations for improving tree-based learning methods and open perspectives for future hybrid classical--quantum implementations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims to reformulate the split-selection optimization problem for categorical predictors in regression trees as a combinatorial fractional optimization problem, which is then converted into a QUBO using Dinkelbach's algorithm. Using state-of-the-art QUBO solvers, the resulting trees are claimed to have predictive performance comparable to standard CART while providing higher-quality splits.

Significance. If the empirical claims hold, this work would be significant as it introduces a novel optimization-based approach to improve upon the greedy heuristics in CART for handling categorical features, potentially leading to better tree structures. It also suggests future directions for hybrid classical-quantum implementations in machine learning.

major comments (2)
  1. Abstract: The assertions regarding comparable predictive performance and higher-quality split solutions lack any quantitative support, such as metrics on specific datasets, comparisons to baselines, or analysis of split quality. This is critical as the central contribution rests on these unverified claims.
  2. Dinkelbach reformulation: The manuscript does not discuss the requirement for exact global optima in each iteration of Dinkelbach's algorithm to preserve the original fractional objective. Since QUBO solvers are heuristic and may not guarantee optimality, especially for large category cardinalities, the claimed superiority of splits may not materialize.
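The exactness concern can be made concrete: Dinkelbach's ratio update stays monotone even with an inexact subproblem solver, since any returned point with negative parametric value strictly lowers λ, but nothing guarantees the global optimum is reached. A toy sketch with random sampling standing in for a heuristic QUBO solver (illustrative objective, not the paper's):

```python
import random

# illustrative fractional objective on 3 bits (not the paper's)
num = lambda x: 3 + x[0] + 2 * x[1] + 4 * x[2]
den = lambda x: 1 + 2 * x[0] + x[1] + x[2]

def noisy_qubo_solver(lam, n_bits, samples=4):
    """Heuristic stand-in: best of a few random binary vectors."""
    draws = [tuple(random.randint(0, 1) for _ in range(n_bits))
             for _ in range(samples)]
    return min(draws, key=lambda x: num(x) - lam * den(x))

random.seed(1)
lam = num((0, 0, 0)) / den((0, 0, 0))   # start from any feasible ratio
ratios = [lam]
for _ in range(10):
    x = noisy_qubo_solver(lam, 3)
    if num(x) - lam * den(x) < 0:       # any improving point shrinks the ratio
        lam = num(x) / den(x)
    ratios.append(lam)
# ratios is monotone non-increasing, but convergence to the global
# optimum is only guaranteed when each subproblem is solved exactly
```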

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and the opportunity to improve the manuscript. We provide point-by-point responses to the major comments below, indicating revisions where appropriate.

Point-by-point responses
  1. Referee: Abstract: The assertions regarding comparable predictive performance and higher-quality split solutions lack any quantitative support, such as metrics on specific datasets, comparisons to baselines, or analysis of split quality. This is critical as the central contribution rests on these unverified claims.

    Authors: The abstract provides a high-level overview, while the full manuscript contains detailed experimental results on multiple datasets, including quantitative comparisons of predictive performance (e.g., MSE values) against standard CART and metrics for split quality based on the objective function. To address the concern, we will revise the abstract to incorporate brief quantitative highlights drawn from the experimental section, such as average performance metrics and improvements in split quality. revision: yes

  2. Referee: Dinkelbach reformulation: The manuscript does not discuss the requirement for exact global optima in each iteration of Dinkelbach's algorithm to preserve the original fractional objective. Since QUBO solvers are heuristic and may not guarantee optimality, especially for large category cardinalities, the claimed superiority of splits may not materialize.

    Authors: We appreciate this observation. Dinkelbach's algorithm converges to the global optimum of the fractional program when each subproblem is solved exactly. Our approach employs heuristic QUBO solvers, which may return approximate solutions. In the revised manuscript, we have added a dedicated paragraph in the methods section discussing this aspect, including notes on the theoretical implications of approximation and empirical evidence from our experiments showing that the obtained splits remain higher-quality than those from the greedy CART heuristic, as evaluated by the objective value, even for the tested category cardinalities. revision: yes

Circularity Check

0 steps flagged

No circularity: direct mathematical reformulation using external algorithm

full rationale

The paper's core step is reformulating the categorical split-selection problem (a combinatorial fractional program) via Dinkelbach's 1967 algorithm into an equivalent QUBO, then invoking external QUBO solvers. This transformation is a standard parametric reduction whose correctness is independent of any fitted parameters, self-citations, or redefinitions within the paper. No claimed result (split quality or tree performance) is shown to equal its own inputs by construction; the equivalence follows from the cited external theorem rather than from any internal loop or renaming. The derivation chain therefore rests on external results rather than on internal self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the applicability of Dinkelbach's algorithm to the fractional program arising from CART split selection; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption: Dinkelbach's algorithm converts the fractional optimization problem for categorical splits into an equivalent QUBO
    Invoked in the abstract as the key conversion step.

pith-pipeline@v0.9.0 · 5452 in / 1171 out tokens · 29775 ms · 2026-05-08T03:08:59.556854+00:00 · methodology


Reference graph

Works this paper leans on

31 extracted references

  1. Akshay Ajagekar, Travis Humble, and Fengqi You. Quantum computing based hybrid solution strategies for large-scale discrete-continuous optimization problems. Computers & Chemical Engineering, 132:106630, 2020.
  2. T. Aoki and M. Ohzeki. A QUBO formulation of the k-medoids problem. Scientific Reports, 2021.
  3. Tariq Bontekoe, Frank Phillipson, and Ward van der Schoot. Translating constraints into QUBOs for the quadratic knapsack problem. In International Conference on Computational Science, pages 90–107. Springer, 2023.
  4. Jeffrey P. Bradford, Clayton Kunz, Ron Kohavi, Cliff Brunk, and Carla E. Brodley. Pruning decision trees with misclassification costs. In European Conference on Machine Learning, pages 131–136. Springer, 1998.
  5. L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Chapman & Hall, New York, 1984.
  6. Liqun Diao and Chengguo Weng. Regression tree credibility model. North American Actuarial Journal, 23(2):169–196, 2019.
  7. W. Dinkelbach. On nonlinear fractional programming. Management Science, 13(7):492–498, 1967.
  8. Fred Glover, Gary Kochenberger, Rick Hennig, and Yu Du. Quantum bridge analytics I: a tutorial on formulating and using QUBO models. Annals of Operations Research, 314(1):141–183, 2022.
  9. Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, volume 2. Springer, 2009.
  10. Gareth James. An Introduction to Statistical Learning with Applications in R, 2013.
  11. Jan Janoušek and Michal Pešta. Bagging and regression trees in individual claims reserving. Statistical Papers, 66(4):89, 2025.
  12. Tadashi Kadowaki and Hidetoshi Nishimori. Quantum annealing in the transverse Ising model. Physical Review E, 58(5):5355, 1998.
  13. Gordon V. Kass. An exploratory technique for investigating large quantities of categorical data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 29(2):119–127, 1980.
  14. Gary Kochenberger, Jin-Kao Hao, Fred Glover, Mark Lewis, Zhipeng Lü, Haibo Wang, and Yang Wang. The unconstrained binary quadratic programming problem: a survey. Journal of Combinatorial Optimization, 28(1):58–81, 2014.
  15. S. B. Kotsiantis. Supervised machine learning: a review of classification techniques. Informatica, 31:249–268, 2007.
  16. Wei-Yin Loh. Fifty years of classification and regression trees. International Statistical Review, 82(3):329–348, 2014.
  17. Olivier Lopez, Xavier Milhaud, and Pierre-E. Thérond. Tree-based censored regression with applications in insurance. Electronic Journal of Statistics, 10:2685–2716, 2016.
  18. Andrew Lucas. Ising formulations of many NP problems. Frontiers in Physics, 2:5, 2014.
  19. Nasa Matsumoto, Yohei Hamakawa, Kosuke Tatsumura, and Kazue Kudo. Distance-based clustering using QUBO formulations. Scientific Reports, 12(1):2669, 2022.
  20. H. Otsuka and S. Tanaka. A QUBO model for Gaussian process variance reduction. Quantum Information Processing, 2020.
  21. Frank Phillipson and Harshil Singh Bhatia. Portfolio optimisation using the D-Wave quantum annealer. In International Conference on Computational Science, pages 45–59. Springer, 2021.
  22. J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
  23. J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
  24. Siegfried Schaible. Fractional programming. I, duality. Management Science, 22(8):858–867, 1976.
  25. Kensuke Tamura, Tatsuhiko Shirai, Hosho Katsura, Shu Tanaka, and Nozomu Togawa. Performance comparison of typical binary-integer encodings in an Ising machine. IEEE Access, 9:81032–81039, 2021.
  26. Terry Therneau et al. User written splitting functions for rpart. Technical report, Mayo Clinic, 2019.
  27. Terry M. Therneau, Elizabeth J. Atkinson, et al. An introduction to recursive partitioning using the rpart routines. Technical report, Mayo Foundation, 1997.
  28. P. M. Xavier, Y. Park, D. Noble, I. Santana, M. Paredes, and D. E. B. Neira. A performance comparison of variable encoding techniques for QUIO and QUBO problems. In Frontiers in Industrial Engineering, Communications in Computer and Information Science. Springer, 2026. ISBN 978-981-95782-8-3. In press.
  29. Koichiro Yawata, Yoshihiro Osakabe, Takuya Okuyama, and Akinori Asahara. QUBO decision tree: annealing machine extends decision tree splitting. In 2022 IEEE International Conference on Knowledge Graph (ICKG), pages 355–364. IEEE, 2022.
  30. Fengqi You, Pedro M. Castro, and Ignacio E. Grossmann. Dinkelbach's algorithm as an efficient method to solve a class of MINLP models for large-scale cyclic scheduling problems. Computers & Chemical Engineering, 33(11):1879–1889, 2009.
  31. Zhixia Zhong and Fengqi You. Globally convergent exact and inexact parametric algorithms for solving large-scale mixed-integer fractional programs and applications in process systems engineering. Computers & Chemical Engineering, 61:90–101, 2014.