pith. sign in

arxiv: 2606.25494 · v1 · pith:662QXLMZnew · submitted 2026-06-24 · 🧮 math.PR · math.ST· stat.ML· stat.TH

A functional central limit theorem for kernel gradient flow and infinitesimal gradient boosting

Pith reviewed 2026-06-25 20:21 UTC · model grok-4.3

classification 🧮 math.PR math.STstat.MLstat.TH
keywords functional central limit theoreminfinitesimal gradient boostingkernel gradient flowreproducing kernel Hilbert spaceGaussian processordinary differential equationstochastic perturbation
0
0 comments X

The pith

The rescaled deviations of infinitesimal gradient boosting from its deterministic limit converge in distribution to a Gaussian process in an associated reproducing kernel Hilbert space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves a functional central limit theorem for the fluctuations of kernel gradient flow and infinitesimal gradient boosting around their mean behavior. It models the boosting process as the solution to an autonomous ODE in a reproducing kernel Hilbert space tied to the softmax gradient tree base learner. A general result on stochastic perturbations of ODEs in Banach spaces is used to transfer convergence and central limit theorem properties from the driving vector fields to the ODE solutions. This matters for understanding variability in boosting beyond average performance, first in the simpler kernel flow case with explicit Gaussian characterization and then in the tree-based setting.

Core claim

We establish a functional central limit theorem: the rescaled deviations converge in distribution to a Gaussian process. The analysis is carried out in a reproducing kernel Hilbert space naturally associated with the softmax gradient tree base learner, in which the boosting process is characterized as the solution of an autonomous ordinary differential equation. The proof rests on a general stochastic perturbation analysis of ODEs in Banach spaces, which is of independent interest: whenever a sequence of vector fields converges and satisfies a central limit theorem, so does the associated ODE solution. We first illustrate this perturbation approach in the simpler setting of kernel gradient f

What carries the argument

The autonomous ordinary differential equation in the reproducing kernel Hilbert space associated with the softmax gradient tree base learner, which characterizes the boosting process and transfers central limit behavior via stochastic perturbation analysis.

If this is right

  • In the kernel gradient flow case the limiting Gaussian process admits an explicit characterization.
  • The general perturbation result for ODEs applies to any sequence of vector fields satisfying the stated convergence and CLT conditions.
  • The tree-based boosting fluctuations are captured in the RKHS naturally induced by the softmax gradient tree learner.
  • The functional CLT extends the prior large-sample deterministic analysis of infinitesimal gradient boosting to its stochastic fluctuations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The explicit Gaussian limit in the kernel flow case could be used to derive asymptotic confidence sets for the flow trajectory.
  • The Banach-space ODE perturbation lemma may transfer to other continuous-time machine learning dynamics such as gradient flows on neural networks.
  • Finite-sample diagnostics that check convergence of empirical covariances to the predicted Gaussian covariance operator would provide a practical test of the theorem.

Load-bearing premise

Whenever a sequence of vector fields converges and satisfies a central limit theorem, the associated ODE solution in the Banach space does as well.

What would settle it

Numerical simulations in which the rescaled boosting deviations fail to converge in distribution to the predicted Gaussian process when measured in the RKHS norm associated with the base learner.

read the original abstract

Building on the large-sample analysis of infinitesimal gradient boosting (Dombry and Duchamps, 2024b), we study the fluctuations of the process around its deterministic limit and establish a functional central limit theorem: the rescaled deviations converge in distribution to a Gaussian process. The analysis is carried out in a reproducing kernel Hilbert space (RKHS) naturally associated with the softmax gradient tree base learner, in which the boosting process is characterized as the solution of an autonomous ordinary differential equation (ODE). The proof rests on a general stochastic perturbation analysis of ODEs in Banach spaces, which is of independent interest: whenever a sequence of vector fields converges and satisfies a central limit theorem, so does the associated ODE solution. We first illustrate this perturbation approach in the simpler setting of kernel gradient flow, where the Gaussian limit admits an explicit characterization, and then consider the more complicated tree-based gradient boosting setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper establishes a functional central limit theorem showing that rescaled deviations of the infinitesimal gradient boosting process from its deterministic limit converge in distribution to a Gaussian process. The analysis is performed in the RKHS associated with the softmax gradient tree base learner, where the boosting dynamics are characterized as the solution to an autonomous ODE in a Banach space. The proof relies on a general stochastic perturbation result for ODEs (of independent interest) that transfers a CLT on the driving vector fields to the ODE flows; this is first illustrated explicitly for kernel gradient flow and then applied to the tree-based setting.

Significance. If the required verifications hold, the result supplies the first rigorous fluctuation theory for gradient boosting around its mean-field limit, which is significant for understanding statistical variability in tree ensembles. The general perturbation theorem for ODEs in Banach spaces is a strength of independent interest. The explicit characterization of the Gaussian limit in the kernel gradient flow case is also a clear positive.

major comments (1)
  1. [tree-based gradient boosting setting] Application to the tree-based gradient boosting setting: the general perturbation theorem requires that the vector field induced by the softmax operator on trees satisfies the theorem's hypotheses (e.g., uniform Lipschitz continuity or local Lipschitz plus linear growth in the RKHS norm, together with tightness of the driving noise). These conditions are load-bearing for the tree claim but their verification is not visible from the abstract and must be checked explicitly in the manuscript.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive assessment of the significance of our results and for the constructive comment. We respond to the major comment below.

read point-by-point responses
  1. Referee: [tree-based gradient boosting setting] Application to the tree-based gradient boosting setting: the general perturbation theorem requires that the vector field induced by the softmax operator on trees satisfies the theorem's hypotheses (e.g., uniform Lipschitz continuity or local Lipschitz plus linear growth in the RKHS norm, together with tightness of the driving noise). These conditions are load-bearing for the tree claim but their verification is not visible from the abstract and must be checked explicitly in the manuscript.

    Authors: We agree that the hypotheses of the general perturbation theorem must be verified explicitly for the tree-based application, as these are indeed load-bearing. The manuscript carries out these verifications after introducing the RKHS associated with the softmax gradient tree: local Lipschitz continuity with linear growth in the RKHS norm is established in Section 3.2 (via the properties of the softmax operator and the finite-dimensional nature of the tree base learners), while tightness of the driving noise follows from the moment bounds and the functional CLT for the tree learners, as shown in Proposition 4.3 and Appendix C. To address the concern that these steps may not be immediately visible, we will revise the manuscript by inserting a short dedicated paragraph (new Subsection 3.4) immediately after the statement of the main tree-based theorem. This paragraph will list each hypothesis of the general theorem and cite the precise location where it is verified, thereby making the argument fully self-contained without altering the proofs. revision: yes

Circularity Check

0 steps flagged

Self-citation to prior deterministic limit; general perturbation theorem proved independently

full rationale

The derivation applies a general stochastic perturbation theorem for ODEs in Banach spaces (presented as new and of independent interest) to the boosting process characterized as an autonomous ODE in the associated RKHS. This builds on the authors' 2024b paper only for the deterministic large-sample limit; the functional CLT for rescaled deviations is obtained by transferring the CLT on vector fields to the ODE flows without reducing the target Gaussian process to any fitted parameter or self-referential definition within the paper's own equations. No self-definitional, fitted-input, or uniqueness-imported circularity is present.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard results from the theory of ODEs in Banach spaces and functional central limit theorems; no free parameters are fitted and no new entities are postulated.

axioms (2)
  • standard math Standard results on existence, uniqueness, and continuous dependence for ODEs in Banach spaces
    Invoked to justify that the boosting process is the solution of an autonomous ODE and to apply the perturbation analysis.
  • domain assumption Central limit theorem for the driving vector fields in the RKHS
    Required for the general perturbation result to transfer the CLT from the vector fields to the ODE solutions.

pith-pipeline@v0.9.1-grok · 5692 in / 1432 out tokens · 41059 ms · 2026-06-25T20:21:30.115856+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

20 extracted references · 18 canonical work pages

  1. [1]

    Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society , 68(3):337--404. doi:10.1090/S0002-9947-1950-0051437-7

  2. [2]

    Blanchard, G., Lugosi, G., and Vayatis, N. (2004). On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res. , 4(5):861--894. doi:10.1162/1532443041424319

  3. [3]

    Breiman, L. (2004). Population theory for boosting ensembles. Ann. Statist. , 32(1):1--11. doi:10.1214/aos/1079120126

  4. [4]

    and Olshen, Richard A

    Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees . Chapman & Hall . doi:10.1201/9781315139470

  5. [5]

    and Yu, B

    B\" u hlmann, P. and Yu, B. (2003). Boosting with the L_2 loss: regression and classification. J. Amer. Statist. Assoc. , 98(462):324--339. doi:10.1198/016214503000125

  6. [6]

    XGBoost: A scalable tree boosting system

    Chen, T. and Guestrin, C. (2016). XGBoost : A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages 785--794, San Francisco California USA . ACM . doi:10.1145/2939672.2939785

  7. [7]

    Dagdoug, M., Dombry, C., and Duchamps, J.-J. (2025). An RKHS perspective on tree ensembles. arXiv preprint . doi:10.48550/arXiv.2512.00397

  8. [8]

    and Duchamps, J.-J

    Dombry, C. and Duchamps, J.-J. (2024a). Infinitesimal gradient boosting. Stochastic Processes and their Applications , 170. doi:10.1016/j.spa.2024.104310

  9. [9]

    and Duchamps, J.-J

    Dombry, C. and Duchamps, J.-J. (2024b). A large-sample theory for infinitesimal gradient boosting. Bernoulli , 30(3):1894--1920. doi:10.3150/23-BEJ1657

  10. [10]

    and Esstafa, Y

    Dombry, C. and Esstafa, Y. (2024). The vanishing learning rate asymptotic for linear l^2 -boosting. ESAIM: Probability and Statistics , 28:227--257. doi:10.1051/ps/2024006

  11. [11]

    Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Statist. , 29(5):1189--1232. doi:10.1214/aos/1013203451

  12. [12]

    and Eubank, R

    Hsing, T. and Eubank, R. L. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators . Wiley Series in Probability and Statistics. John Wiley and Sons, Inc , Chichester, West Sussex, UK

  13. [13]

    Jiang, W. (2004). Process consistency for A da B oost. Ann. Statist. , 32(1):13--29. doi:10.1214/aos/1079120128

  14. [14]

    Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). LightGBM : a highly efficient gradient boosting decision tree. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30 , pages 3146--3154. Curran Associates, Inc

  15. [15]

    Probability in Banach Spaces , subtitle=

    Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces . Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-20212-4

  16. [16]

    Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DaSilva, Eli Elhage, et al

    Lugosi, G. and Vayatis, N. (2004). On the B ayes-risk consistency of regularized boosting methods. Ann. Statist. , 32(1):30--55. doi:10.1214/aos/1079120129

  17. [17]

    Marcus, D. J. (1985). Relationships between D onsker classes and S obolev spaces. Zeitschrift f \"u r Wahrscheinlichkeitstheorie und Verwandte Gebiete , 69:323--330. doi:10.1007/BF00532737

  18. [18]

    van der Vaart, A. W. (1998). Asymptotic statistics , volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics . Cambridge University Press, Cambridge. doi:10.1017/CBO9780511802256

  19. [19]

    van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes . Springer Series in Statistics. Springer-Verlag, New York. doi:10.1007/978-1-4757-2545-2. With applications to statistics

  20. [20]

    and Yu, B

    Zhang, T. and Yu, B. (2005). Boosting with early stopping: convergence and consistency. Ann. Statist. , 33(4):1538--1579. doi:10.1214/009053605000000255