pith. machine review for the scientific record.

arxiv: 2605.11841 · v1 · submitted 2026-05-12 · 📊 stat.ML · cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

Minimax Rates and Spectral Distillation for Tree Ensembles

Binh Duc Vu, David S. Watson

Pith reviewed 2026-05-13 05:23 UTC · model grok-4.3

classification 📊 stat.ML · cs.AI · cs.LG
keywords tree ensembles · random forests · gradient boosting · minimax rates · spectral distillation · model compression · kernel operator

The pith

The eigenvalue decay of the kernel operator controls the minimax rate for random forest regression and supports distillation of tree ensembles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adopts a spectral perspective on tree ensembles such as random forests and gradient boosting machines. It shows that, under mild conditions on tree growth, the minimax convergence rate for random forest regression is determined by the eigenvalue decay of the kernel operator the trees induce. The same viewpoint yields compression methods that extract leading eigenfunctions or singular vectors to build distilled models far smaller than the originals with similar predictive power. The result matters because it supplies both theoretical rates for these widely used methods and a practical route to shrinking them for deployment under limited computing resources.
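As a concrete handle on the central object, here is a minimal sketch, assuming the standard co-leaf ("random forest") kernel from the cited literature: the ensemble induces K(x, x′) equal to the fraction of trees in which x and x′ fall in the same leaf, and the decay of the empirical Gram matrix's eigenvalues is the quantity the claimed rate tracks. Dataset, hyperparameters, and library calls are illustrative, not the authors' setup.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0).fit(X, y)

leaves = rf.apply(X)                    # (n, B) leaf index of each sample in each tree
K = np.zeros((len(X), len(X)))
for b in range(leaves.shape[1]):        # co-leaf indicator, averaged over trees
    K += (leaves[:, b][:, None] == leaves[:, b][None, :])
K /= leaves.shape[1]

eigvals = np.linalg.eigvalsh(K)[::-1]   # spectrum whose decay the claimed rate depends on
print(eigvals[:10] / eigvals[0])
```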

Core claim

Under mild regularity conditions on tree growth, the eigenvalue decay of the induced kernel operator governs the statistical rate for RF regression. Leading eigenfunctions for RFs and singular vectors for GBMs enable distilled models that are orders of magnitude smaller while maintaining competitive performance.

What carries the argument

The kernel operator induced by the tree ensemble, with its eigenvalue decay governing rates and leading spectral components enabling distillation.
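What distillation means operationally, in a hedged sketch continuing from the kernel computed above (the paper's actual architecture, loss, and training details may differ): take the leading eigenvectors of the empirical kernel, train a small student model to map raw features to those spectral coordinates, and put a linear readout on top, so prediction no longer touches the trees.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import RidgeCV

r = 8                                           # number of leading spectral components kept
w, V = np.linalg.eigh(K)                        # K, X, y as in the previous sketch
idx = np.argsort(w)[::-1][:r]
Z = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))  # Nyström-style spectral coordinates

student = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                       random_state=0).fit(X, Z)   # small nonlinear map: x -> spectral coords
readout = RidgeCV().fit(Z, y)                      # linear layer from coords to response

y_hat = readout.predict(student.predict(X))        # distilled model: MLP + ridge, no trees
```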

If this is right

  • RF regression achieves minimax optimal statistical rates determined by kernel eigenvalue decay.
  • Distilled models using leading eigenfunctions and singular vectors are orders of magnitude smaller while remaining competitive.
  • The distillation compares favorably to existing forest pruning and rule extraction methods.
  • These compressed models are suited to resource-constrained computing applications (a rough size accounting follows this list).
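To make "orders of magnitude smaller" concrete, a rough, assumed size accounting (not the paper's metric): count stored values across the forest's nodes versus parameters in a hypothetical distilled student of the shape sketched elsewhere on this page.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# Roughly four stored values per node: split feature, threshold, leaf value, child pointers.
forest_values = sum(t.tree_.node_count for t in rf.estimators_) * 4

hidden, r, d = 64, 8, X.shape[1]                 # hypothetical student: d -> 64 -> 64 -> r -> 1
student_params = (d * hidden + hidden) + (hidden * hidden + hidden) + (hidden * r + r) + (r + 1)

print(f"forest ~{forest_values:,} stored values vs student ~{student_params:,} parameters "
      f"(~{forest_values / student_params:.0f}x)")
```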

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This spectral analysis could be applied to understand and improve other ensemble learning algorithms.
  • Designing tree growth procedures to optimize eigenvalue decay might lead to faster converging models.
  • The distilled models may provide a path to more interpretable versions of tree ensembles.

Load-bearing premise

Mild regularity conditions on tree growth that allow the spectral perspective to determine minimax rates and enable the distillation process.

What would settle it

A counterexample where the observed convergence rate of random forest regression does not match the rate predicted by the eigenvalue decay of the induced kernel operator.
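A hedged sketch of how that falsification test could be run, assuming the standard kernel-regression correspondence in which eigenvalues decaying like j^(−β) predict an excess-risk exponent of roughly β/(β+1); the exact exponent the paper proves may involve source-condition terms not modeled here, and the data and estimator below are illustrative.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def fit_rf(X, y):
    return RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
                                 random_state=0).fit(X, y)

# Predicted exponent from the spectrum of the induced kernel on a pilot sample.
X0, y0 = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
leaves = fit_rf(X0, y0).apply(X0)
K = sum((leaves[:, b][:, None] == leaves[:, b][None, :]) for b in range(leaves.shape[1]))
w = np.sort(np.linalg.eigvalsh(K / leaves.shape[1]))[::-1]
j = np.arange(1, 51)
beta = -np.polyfit(np.log(j), np.log(w[:50]), 1)[0]      # fit eigenvalues ~ j**(-beta)

# Observed exponent from test error as the training size grows.
X_te, y_te = make_friedman1(n_samples=5000, noise=0.0, random_state=1)  # noiseless targets
ns, errs = [500, 1000, 2000, 4000], []
for n in ns:
    X, y = make_friedman1(n_samples=n, noise=1.0, random_state=0)
    errs.append(mean_squared_error(y_te, fit_rf(X, y).predict(X_te)))
observed = -np.polyfit(np.log(ns), np.log(errs), 1)[0]

print(f"predicted rate exponent ~ {beta / (beta + 1):.2f}, observed ~ {observed:.2f}")
```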

Figures

Figures reproduced from arXiv: 2605.11841 by Binh Duc Vu, David S. Watson.

Figure 1: Summary of our pipeline. (a) Start with tabular training data for supervised learning …
Figure 2: Spectral decay, kernel approximation error, and predictive performance across two datasets.
Figure 3: Rate-performance Pareto curves across three benchmark datasets. Shaded regions indicate …
Figure 4: Empirical evaluation of the polynomial decay assumption (C2), across both regression (a) …
Figure 5: Rate-performance Pareto curves across five benchmark datasets for three eigenfunction …
Figure 6: Spectral decay, kernel approximation error, and predictive performance across the remaining …
Figure 7: Spectral decay, kernel approximation error, and predictive performance across the remaining …
Figure 8: Rate-performance Pareto curves across the remaining twelve benchmark datasets. Shaded …
Figure 9: Visualization curves for the RF compression benchmark. The solid line represents SCATE, …
Figure 10: Visualization curves for the GBM compression benchmark. The solid line represents …
Figure 11: Training and inference time against model size plots for RFs, over three datasets …
Figure 12: Training and inference time against model size plots for GBMs, over three datasets …
Original abstract

Tree ensembles such as random forests (RFs) and gradient boosting machines (GBMs) are among the most widely used supervised learners, yet their theoretical properties remain incompletely understood. We adopt a spectral perspective on these algorithms, with two main contributions. First, we derive minimax-optimal convergence for RF regression, showing that, under mild regularity conditions on tree growth, the eigenvalue decay of the induced kernel operator governs the statistical rate. Second, we exploit this spectral viewpoint to develop compression schemes for tree ensembles. For RFs, leading eigenfunctions of the kernel operator capture the dominant predictive directions; for GBMs, leading singular vectors of the smoother matrix play an analogous role. Learning nonlinear maps for these spectral representations yields distilled models that are orders of magnitude smaller than the originals while maintaining competitive predictive performance. Our methods compare favorably to state of the art algorithms for forest pruning and rule extraction, with applications to resource constrained computing.
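For the GBM half of the claim, a hedged sketch assuming the classical L2-boosting view (each squared-error stage acts as a within-leaf averaging smoother, so the fitted ensemble is approximately linear in y on the training set): build that smoother matrix and inspect its singular values, whose leading singular vectors the abstract identifies as the distillation targets. The construction uses sklearn's fitted trees only for their leaf partitions and is not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=400, noise=1.0, random_state=0)
nu = 0.1
gbm = GradientBoostingRegressor(n_estimators=50, learning_rate=nu, max_depth=3,
                                loss="squared_error", random_state=0).fit(X, y)

n = len(X)
residual_map = np.eye(n)
for tree in gbm.estimators_[:, 0]:
    leaf = tree.apply(X)                                  # training-point leaf memberships
    same = (leaf[:, None] == leaf[None, :]).astype(float)
    S = same / same.sum(axis=1, keepdims=True)            # within-leaf averaging smoother
    residual_map = (np.eye(n) - nu * S) @ residual_map    # residuals left after this stage
B = np.eye(n) - residual_map                              # fitted values ≈ B @ y on training data

U, s, Vt = np.linalg.svd(B)                               # leading singular vectors: distillation targets
print(s[:10] / s[0])
```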

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims to derive minimax-optimal convergence rates for random forest regression by showing that, under mild regularity conditions on tree growth, the eigenvalue decay of an induced kernel operator governs the statistical rate via standard kernel regression bounds. It further develops spectral distillation procedures: leading eigenfunctions of the kernel for RFs and leading singular vectors of the smoother matrix for GBMs are used to fit compact nonlinear maps, yielding models orders of magnitude smaller than the originals while retaining competitive accuracy, with favorable comparisons to existing forest pruning and rule extraction methods.

Significance. If the derivations hold, the work provides a coherent spectral framework that explains the statistical behavior of tree ensembles and supplies practical compression tools with clear applications to resource-constrained settings. Strengths include the direct use of eigenvalue decay to obtain rates without hidden parameter dependence and the internally consistent experimental validation of the distillation step against state-of-the-art baselines.

minor comments (3)
  1. The regularity conditions on tree depth, leaf size, and splitting (stated in the theory sections) are described as mild and standard, yet the experimental section would benefit from a brief verification that the implemented RF and GBM hyperparameters satisfy them on the reported datasets.
  2. Notation for the induced kernel operator and the smoother matrix should be introduced with a short comparison table early in the manuscript to distinguish them from classical kernel methods and to aid readers unfamiliar with the spectral perspective.
  3. In the distillation experiments, the reported model-size reductions and accuracy metrics would be clearer if accompanied by explicit baseline sizes (original ensemble vs. distilled) in a single summary table rather than scattered across figures.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the paper's contributions, and recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper defines an induced kernel operator from the tree ensemble under explicitly stated regularity conditions on tree growth, derives its eigenvalue decay, and links this decay to minimax rates via standard kernel regression bounds (not by fitting parameters to the target quantity or by self-referential definition). The distillation procedure extracts leading spectral components and fits a separate nonlinear map, with no reduction of the central claims to inputs by construction. No load-bearing self-citations or ansatz smuggling are present; the argument relies on external kernel theory and verifiable conditions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on unspecified mild regularity conditions for tree growth and the validity of viewing tree ensembles as kernel or smoother operators; no free parameters or invented entities are identifiable from the abstract.

axioms (1)
  • domain assumption: Mild regularity conditions on tree growth
    Invoked to ensure the eigenvalue decay of the induced kernel governs the statistical rate.

pith-pipeline@v0.9.0 · 5456 in / 1188 out tokens · 65600 ms · 2026-05-13T05:23:31.953954+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages · 1 internal anchor

  1. Fouad Alkhoury and Pascal Welke. Splitting stump forests: tree ensemble compression for edge devices. In International Conference on Discovery Science, pages 3–18. Springer, 2024.
  2. Susan Athey, Julie Tibshirani, and Stefan Wager. Generalized random forests. Ann. Statist., 47(2):1148–1178, 2019.
  3. Peter L. Bartlett, Peter J. Bickel, Peter Bühlmann, Yoav Freund, Jerome Friedman, Trevor Hastie, Wenxin Jiang, Michael J. Jordan, Vladimir Koltchinskii, Gábor Lugosi, Jon D. McAuliffe, Ya'acov Ritov, Saharan Rosset, Robert E. Schapire, Robert Tibshirani, Nicolas Vayatis, Bin Yu, Tong Zhang, and Ji Zhu. Discussions of boosting papers, and rejoinders. The ...
  4. Clément Bénard, Gérard Biau, Sébastien da Veiga, and Erwan Scornet. Interpretable random forests via rule extraction. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, pages 937–945, 2021.
  5. Clément Bénard, Gérard Biau, Sébastien Da Veiga, and Erwan Scornet. SIRUS: Stable and Interpretable RUle Set for classification. Electronic Journal of Statistics, 15(1):427–505, 2021.
  6. Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35(8):1798–1828, 2013.
  7. Gérard Biau. Analysis of a random forests model. J. Mach. Learn. Res., 13:1063–1095, 2012.
  8. Gérard Biau and Erwan Scornet. A random forest guided tour. TEST, 25(2):197–227, 2016.
  9. Gérard Biau, Luc Devroye, and Gábor Lugosi. Consistency of random forests and other averaging classifiers. J. Mach. Learn. Res., 9(66):2015–2033, 2008.
  10. Gérard Biau and Luc Devroye. On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. J. Multivar. Anal., 101(10):2499–2518, 2010.
  11. Gilles Blanchard and Nicole Mücke. Kernel regression, minimax rates and effective dimensionality: Beyond the regular case. Analysis and Applications, 18(4):683–696, 2020.
  12. Lorenzo Bonasera and Emilio Carrizosa. A unified approach to extract interpretable rules from tree ensembles via integer programming. Comput. Oper. Res., 185:107283, 2026.
  13. Sérgio Branco, André G Ferreira, and Jorge Cabral. Machine learning in resource-scarce embedded systems, FPGAs, and end-devices: A survey. Electronics, 8(11):1289, 2019.
  14. Leo Breiman. Some infinity theory for predictor ensembles. Technical Report 579, Statistics Department, UC Berkeley, 2000.
  15. Leo Breiman. Random forests. Mach. Learn., 45(1):1–33, 2001.
  16. Leo Breiman. Consistency for a simple model of random forests. Technical Report 670, Statistics Department, UC Berkeley, Berkeley, CA, 2004.
  17. Leo Breiman, Jerome Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Taylor & Francis, Boca Raton, FL, 1984.
  18. Sebastian Buschjäger and Katharina Morik. Joint leaf-refinement and ensemble pruning through l1 regularization. Data Min. Knowl. Discov., 37(3):1230–1261, 2023.
  19. Peter Bühlmann and Bin Yu. Boosting with the l2 loss. Journal of the American Statistical Association, 98(462):324–339, 2003.
  20. A. Caponnetto and E. De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331–368, 2007.
  21. Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
  22. Alicia Curth, Alan Jeffares, and Mihaela van der Schaar. A U-turn on double descent: Rethinking parameter counting in statistical learning. In 37th Conference on Neural Information Processing Systems, 2023.
  23. Alex Davies and Zoubin Ghahramani. The random forest kernel and other kernels for big data from random partitions. arXiv preprint, 1402.4293, 2014.
  24. Zhijie Deng, Jiaxin Shi, and Jun Zhu. NeuralEF: Deconstructing kernels by deep neural networks. In International Conference on Machine Learning, pages 4976–4992. PMLR, 2022.
  25. Laurens Devos, Timo Martens, Deniz Can Oruc, Wannes Meert, Hendrik Blockeel, and Jesse Davis. Compressing tree ensembles through level-wise optimization and pruning. In Proceedings of the 42nd International Conference on Machine Learning, pages 13419–13443, 2025.
  26. Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, 1996.
  27. Dheeru Dua and Casey Graff. UCI machine learning repository, 2019. URL http://archive.ics.uci.edu/ml.
  28. Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
  29. Stefan Elfwing, Eiji Uchibe, and Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107:3–11, 2018.
  30. Jerome H. Friedman and Bogdan E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3):916–954, 2008.
  31. Nicholas Frosst and Geoffrey Hinton. Distilling a neural network into a soft decision tree. arXiv preprint, 1711.09784, 2017.
  32. Wei Gao and Zhi-Hua Zhou. Towards convergence rate analysis of random forests for classification. In Advances in Neural Information Processing Systems, volume 33, 2020.
  33. Wei Gao, Fan Xu, and Zhi-Hua Zhou. Towards convergence rate analysis of random forests for classification. Artificial Intelligence, 313:103788, 2022.
  34. Robin Genuer. Variance reduction in purely random forests. J. Nonparametr. Stat., 24(3):543–562, 2012.
  35. Leo Grinsztajn, Edouard Oyallon, and Gael Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? In 36th Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.
  36. Peter Grünwald. The minimum description length principle. The MIT Press, Cambridge, MA, 2007.
  37. Venkatesan Guruswami, Atri Rudra, and Madhu Sudan. Essential Coding Theory. Cambridge University Press, Cambridge, 2022.
  38. László Györfi, Michael Kohler, Adam Krzyżak, and Harro Walk. A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York, 2002.
  39. Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
  40. Trevor Hastie and Robert Tibshirani. Generalized additive models. Chapman & Hall, London, 1990.
  41. Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint, 1503.02531, 2015.
  42. M. Hutter. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer Berlin Heidelberg, Berlin, 2004.
  43. Alexey Ignatiev, Yacine Izza, Peter J. Stuckey, and Joao Marques-Silva. Using MaxSAT for efficient explanations of tree ensembles. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 3776–3785, 2022.
  44. Yacine Izza and Joao Marques-Silva. On explaining random forests with SAT. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 2584–2591, 2021.
  45. Jason Klusowski. Sharp analysis of a simple model for random forests. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, pages 757–765, 2021.
  46. Jason M. Klusowski and Peter M. Tian. Large scale prediction with decision trees. Journal of the American Statistical Association, 119(545):525–537, 2024.
  47. Balaji Lakshminarayanan, Daniel M Roy, and Yee Whye Teh. Mondrian forests: Efficient online random forests. In Advances in Neural Information Processing Systems, volume 27, 2014.
  48. Kaiqiao Li, Sijie Yao, Zhenyu Zhang, Biwei Cao, Christopher M Wilson, Denise Kalos, Pei Fen Kuan, Ruoqing Zhu, and Xuefeng Wang. Efficient gradient boosting for prognostic biomarker discovery. Bioinformatics, 38(6):1631–1638, 2022.
  49. Yi Lin and Yongho Jeon. Random forests and adaptive nearest neighbors. J. Am. Stat. Assoc., 101(474):578–590, 2006.
  50. Brian Liu and Rahul Mazumder. ForestPrune: Compact depth-pruned tree ensembles. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, pages 9417–9428, 2023.
  51. Brian Liu and Rahul Mazumder. Fire: An optimization approach for fast interpretable rule extraction. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1396–1405, 2023.
  52. Brian Liu, Rahul Mazumder, and Peter Radchenko. Extracting interpretable models from tree ensembles: Computational and statistical perspectives. Journal of the American Statistical Association, 0:1–25, 2026.
  53. Stephan S. Lorenzen, Christian Igel, and Yevgeny Seldin. On PAC-Bayesian bounds for random forests. Machine Learning, 108(8):1503–1522, 2019.
  54. Nicolai Meinshausen. Quantile regression forests. J. Mach. Learn. Res., 7:983–999, 2006.
  55. Nicolai Meinshausen. Node harvest. The Annals of Applied Statistics, 4(4):2049–2072, 2010.
  56. Jaouad Mourtada, Stéphane Gaïffas, and Erwan Scornet. Universal consistency and minimax rates for online Mondrian forests. In Advances in Neural Information Processing Systems, volume 30, 2017.
  57. Jaouad Mourtada, Stéphane Gaïffas, and Erwan Scornet. Minimax optimal rates for Mondrian trees and forests. The Annals of Statistics, 48(4):2253–2276, 2020.
  58. Eliza O'Reilly and Ngoc Mai Tran. Minimax rates for high-dimensional random tessellation forests. Journal of Machine Learning Research, 25(89):1–32, 2024.
  59. David Pfau, Stig Petersen, Ashish Agarwal, David GT Barrett, and Kimberly L Stachenfeld. Spectral inference networks: Unifying deep and spectral learning. arXiv preprint arXiv:1806.02215, 2018.
  60. Abhishake Rastogi and Peter Mathé. Inverse learning in Hilbert scales. Machine Learning, 112(7):2469–2499, 2023.
  61. Jorma Rissanen. Stochastic complexity in statistical inquiry. World Scientific, Singapore, 1989.
  62. J Jon Ryu, Xiangxiang Xu, HS Erol, Yuheng Bu, Lizhong Zheng, and Gregory W Wornell. Operator SVD with neural networks via nested low-rank approximation. arXiv preprint arXiv:2402.03655, 2024.
  63. Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. The MIT Press, Cambridge, MA, 2012.
  64. Erwan Scornet. On the asymptotics of random forests. J. Multivar. Anal., 146:72–83, 2016.
  65. Erwan Scornet. Random forests and kernel methods. IEEE Trans. Inf. Theory, 62(3):1485–1500, 2016.
  66. Erwan Scornet and Giles Hooker. Theory of Random Forests: A Review. Technical Report, HAL, 2025.
  67. Erwan Scornet, Gérard Biau, and Jean-Philippe Vert. Consistency of random forests. Ann. Statist., 43(4):1716–1741, 2015.
  68. Shahab Shoar, Nicholas Chileshe, and John David Edwards. Machine learning-aided engineering services' cost overruns prediction in high-rise residential building projects: Application of random forest regression. Journal of Building Engineering, 50:104102, 2022.
  69. Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need. Inf. Fusion, 81:84–90, 2022.
  70. Charles J Stone. Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 10:1040–1053, 1982.
  71. Grigorios Tsoumakas, Ioannis Partalas, and Ioannis Vlahavas. An Ensemble Pruning Primer, pages 1–13. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
  72. Alexandre Tsybakov. Introduction to Nonparametric Estimation. Springer, New York, 2008.
  73. Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15(2):49–60, 2013.
  74. Thibaut Vidal and Maximilian Schiffer. Born-again tree ensembles. In International Conference on Machine Learning, pages 9743–9753. PMLR, 2020.
  75. Binh Vu, Jan Kapar, Marvin Wright, and David Watson. Autoencoding random forests. 2025.
  76. Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc., 113(523):1228–1242, 2018.
  77. Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems (NeurIPS), 2001.
  78. Seungmin Yoo and Chanyoung Oh. Microforest: Lightweight bottleneck prediction for manufacturing processes on edge devices. Applied Sciences, 15(14):7798, 2025.
  79. Sichong Zhang, Xiong Wang, and Fei Lu. Minimax rates for learning kernels in operators. arXiv preprint, 2502.20368, 2025.
  80. Yichen Zhou and Giles Hooker. Interpreting models via single tree approximation. arXiv preprint arXiv:1610.09036, 2016.

Showing first 80 references.