A functional central limit theorem for kernel gradient flow and infinitesimal gradient boosting

Clément Dombry (LMB); Jean-Jil Duchamps (LMB)

arXiv: 2606.25494 · v1 · pith:662QXLMZnew · submitted 2026-06-24 · math.PR · math.ST · stat.ML · stat.TH

Pith reviewed 2026-06-25 20:21 UTC · model grok-4.3

classification math.PR · math.ST · stat.ML · stat.TH

keywords functional central limit theorem · infinitesimal gradient boosting · kernel gradient flow · reproducing kernel Hilbert space · Gaussian process · ordinary differential equation · stochastic perturbation

The pith

The rescaled deviations of infinitesimal gradient boosting from its deterministic limit converge in distribution to a Gaussian process in an associated reproducing kernel Hilbert space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves a functional central limit theorem for the fluctuations of kernel gradient flow and infinitesimal gradient boosting around their deterministic limits. It models the boosting process as the solution to an autonomous ODE in a reproducing kernel Hilbert space tied to the softmax gradient tree base learner. A general result on stochastic perturbations of ODEs in Banach spaces is used to transfer convergence and central limit theorem properties from the driving vector fields to the ODE solutions. This matters for understanding the variability of boosting beyond its average behavior. The argument is developed first in the simpler kernel flow case, where the Gaussian limit has an explicit characterization, and then in the tree-based setting.

Core claim

We establish a functional central limit theorem: the rescaled deviations converge in distribution to a Gaussian process. The analysis is carried out in a reproducing kernel Hilbert space naturally associated with the softmax gradient tree base learner, in which the boosting process is characterized as the solution of an autonomous ordinary differential equation. The proof rests on a general stochastic perturbation analysis of ODEs in Banach spaces, which is of independent interest: whenever a sequence of vector fields converges and satisfies a central limit theorem, so does the associated ODE solution. We first illustrate this perturbation approach in the simpler setting of kernel gradient flow, where the Gaussian limit admits an explicit characterization, and then consider the more complicated tree-based gradient boosting setting.

What carries the argument

The autonomous ordinary differential equation in the reproducing kernel Hilbert space associated with the softmax gradient tree base learner, which characterizes the boosting process. A stochastic perturbation analysis then transfers central limit behavior from the driving vector field to the ODE solution.

If this is right

In the kernel gradient flow case the limiting Gaussian process admits an explicit characterization.
The general perturbation result for ODEs applies to any sequence of vector fields satisfying the stated convergence and CLT conditions.
The tree-based boosting fluctuations are captured in the RKHS naturally induced by the softmax gradient tree learner.
The functional CLT extends the prior large-sample deterministic analysis of infinitesimal gradient boosting to its stochastic fluctuations.
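For the first item, a minimal numerical sketch of kernel gradient flow may help fix ideas. It assumes squared loss, a Gaussian kernel, zero initialization and synthetic one-dimensional data, none of which is taken from the paper, and the function names are hypothetical. Under those assumptions the fitted values u(t) on the training points solve u' = -(1/n) K (u - y), which has a closed form through the eigendecomposition of the Gram matrix K. The code compares that closed form with explicit Euler steps. It illustrates only the deterministic flow, not the Gaussian limit.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.5):
    """Gram matrix of a Gaussian kernel on 1-D inputs (assumed kernel)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def flow_closed_form(K, y, t):
    """Fitted values at time t for u' = -(1/n) K (u - y), u(0) = 0."""
    n = len(y)
    lam, V = np.linalg.eigh(K)  # K is symmetric, so eigh applies
    return V @ ((1.0 - np.exp(-t * lam / n)) * (V.T @ y))


def flow_euler(K, y, t, steps=5000):
    """Same flow, integrated with explicit Euler steps."""
    n = len(y)
    u = np.zeros(n)
    dt = t / steps
    for _ in range(steps):
        u = u - dt * (K @ (u - y)) / n
    return u


# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=30)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=30)

K = gaussian_kernel(x, x)
u_exact = flow_closed_form(K, y, t=5.0)
u_euler = flow_euler(K, y, t=5.0)
max_gap = float(np.max(np.abs(u_exact - u_euler)))
print("max |closed form - Euler| =", max_gap)
```

The closed form is what makes the kernel case tractable in this toy version: the flow is linear in y, so any Gaussian fluctuation of the inputs would propagate linearly.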

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit Gaussian limit in the kernel flow case could be used to derive asymptotic confidence sets for the flow trajectory.
The Banach-space ODE perturbation lemma may transfer to other continuous-time machine learning dynamics such as gradient flows on neural networks.
Finite-sample diagnostics that check convergence of empirical covariances to the predicted Gaussian covariance operator would provide a practical test of the theorem.

Load-bearing premise

Whenever a sequence of vector fields converges and satisfies a central limit theorem, the associated ODE solution in the Banach space does as well.
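A heuristic linearization suggests why this premise is plausible. The sketch below is not the paper's statement: the symbols $F_n$, $F$, $G_n$, $G$, $x_n$, $x$, $Z_n$, $Z$, the $\sqrt{n}$ scaling and the differentiability of $F$ are assumptions introduced here for illustration, and the paper's exact hypotheses are not visible from the abstract.

```latex
\begin{aligned}
&\dot x_n(t) = F_n(x_n(t)), \qquad \dot x(t) = F(x(t)), \qquad x_n(0) = x(0), \\
&Z_n(t) := \sqrt{n}\,\bigl(x_n(t) - x(t)\bigr), \qquad
 G_n := \sqrt{n}\,(F_n - F) \xrightarrow{\;d\;} G, \\
&\dot Z_n(t)
 = \underbrace{\sqrt{n}\,\bigl(F_n - F\bigr)(x_n(t))}_{=\,G_n(x_n(t))}
 + \underbrace{\sqrt{n}\,\bigl(F(x_n(t)) - F(x(t))\bigr)}_{=\,DF(x(t))\,Z_n(t) + r_n(t)}, \\
&\text{formal limit:}\qquad \dot Z(t) = DF(x(t))\,Z(t) + G(x(t)), \qquad Z(0) = 0 .
\end{aligned}
```

Here $r_n(t)$ is the Taylor remainder, which is negligible relative to $\|Z_n(t)\|$ when $x_n \to x$. The formal limit is a linear equation driven by $G$, so a Gaussian $G$ would yield a Gaussian $Z$. Making this rigorous is precisely the role of the perturbation theorem.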

What would settle it

Numerical simulations in which the rescaled boosting deviations fail to converge in distribution to the predicted Gaussian process when measured in the RKHS norm associated with the base learner.
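A toy version of such a simulation is sketched below. It is a scalar example chosen because everything is explicit, not the boosting setting: the vector field $F_n(x) = \bar Y_n - x$, the normal samples and all names are assumptions made for illustration. With $x(0) = 0$ the solution is $x_n(t) = \bar Y_n (1 - e^{-t})$, so the rescaled deviation has covariance $\sigma^2 (1 - e^{-s})(1 - e^{-t})$, which the code compares with an empirical covariance.

```python
import numpy as np

MU, SIGMA = 1.0, 2.0  # mean and standard deviation of the samples (assumed)


def rescaled_deviations(n, reps, times, seed=0):
    """Simulate sqrt(n) * (x_n(t) - x(t)) for the toy ODE x' = mean(Y) - x."""
    rng = np.random.default_rng(seed)
    # One sample mean per replication drives the random vector field F_n.
    y_bar = rng.normal(MU, SIGMA, size=(reps, n)).mean(axis=1)
    growth = 1.0 - np.exp(-times)            # explicit solution factor
    x_n = y_bar[:, None] * growth[None, :]   # random solutions, shape (reps, T)
    x = MU * growth                          # deterministic limit
    return np.sqrt(n) * (x_n - x[None, :])


times = np.array([0.5, 1.0, 3.0])
Z = rescaled_deviations(n=200, reps=4000, times=times)

empirical_cov = np.cov(Z, rowvar=False)
growth = 1.0 - np.exp(-times)
predicted_cov = SIGMA ** 2 * np.outer(growth, growth)

rel_err = float(np.max(np.abs(empirical_cov - predicted_cov)) / np.max(predicted_cov))
print("relative covariance error:", rel_err)
```

A persistent gap between the empirical and predicted covariance as the number of replications grows would be the kind of evidence described above. In the boosting setting the comparison would have to be made in the RKHS norm rather than at a few time points.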

Original abstract

Building on the large-sample analysis of infinitesimal gradient boosting (Dombry and Duchamps, 2024b), we study the fluctuations of the process around its deterministic limit and establish a functional central limit theorem: the rescaled deviations converge in distribution to a Gaussian process. The analysis is carried out in a reproducing kernel Hilbert space (RKHS) naturally associated with the softmax gradient tree base learner, in which the boosting process is characterized as the solution of an autonomous ordinary differential equation (ODE). The proof rests on a general stochastic perturbation analysis of ODEs in Banach spaces, which is of independent interest: whenever a sequence of vector fields converges and satisfies a central limit theorem, so does the associated ODE solution. We first illustrate this perturbation approach in the simpler setting of kernel gradient flow, where the Gaussian limit admits an explicit characterization, and then consider the more complicated tree-based gradient boosting setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This extends the authors' 2024b large-sample analysis with a functional CLT for boosting fluctuations, resting on a general Banach-space ODE perturbation theorem whose hypotheses need to be verified for the tree vector field.

The paper's main contribution is a functional central limit theorem: rescaled deviations of the infinitesimal gradient boosting process converge in distribution to a Gaussian process in the RKHS tied to the softmax gradient tree base learner. They first work out the kernel gradient flow case with an explicit limit, then apply the same strategy to the tree setting by modeling the process as an autonomous ODE and invoking a general stochastic perturbation result for ODEs in Banach spaces.

The general perturbation theorem is presented as being of independent interest and looks like the cleanest part. Framing the boosting dynamics as an ODE solution and transferring a CLT from the driving vector fields to the flow itself is a reasonable way to get fluctuations beyond the deterministic limit. The kernel case benefits from explicit calculations, which makes the argument easier to follow.

The softer part is the tree application. The vector field comes from the softmax operator on trees, and the perturbation theorem requires conditions like local Lipschitz continuity or linear growth in the RKHS norm, plus tightness of the noise. The abstract does not display those checks, so the tree claim depends on steps that are not visible here and could be the least secure. If the full paper supplies them cleanly, the result holds; otherwise the tree half is thinner than the kernel half.

This is for readers already working on probabilistic limits for gradient boosting or kernel methods. Anyone who read the 2024b paper will see this as the natural next step. The approach is coherent and the citation pattern is appropriate. It deserves a serious referee to check the Banach-space conditions and how they are verified for the tree learner.

Referee Report

1 major / 0 minor

Summary. The paper establishes a functional central limit theorem showing that rescaled deviations of the infinitesimal gradient boosting process from its deterministic limit converge in distribution to a Gaussian process. The analysis is performed in the RKHS associated with the softmax gradient tree base learner, where the boosting dynamics are characterized as the solution to an autonomous ODE in a Banach space. The proof relies on a general stochastic perturbation result for ODEs (of independent interest) that transfers a CLT on the driving vector fields to the ODE flows; this is first illustrated explicitly for kernel gradient flow and then applied to the tree-based setting.

Significance. If the required verifications hold, the result supplies the first rigorous fluctuation theory for gradient boosting around its large-sample limit, which is significant for understanding statistical variability in tree ensembles. The general perturbation theorem for ODEs in Banach spaces is a strength of independent interest. The explicit characterization of the Gaussian limit in the kernel gradient flow case is also a clear positive.

major comments (1)

[tree-based gradient boosting setting] Application to the tree-based gradient boosting setting: the general perturbation theorem requires that the vector field induced by the softmax operator on trees satisfies the theorem's hypotheses (e.g., uniform Lipschitz continuity or local Lipschitz plus linear growth in the RKHS norm, together with tightness of the driving noise). These conditions are load-bearing for the tree claim but their verification is not visible from the abstract and must be checked explicitly in the manuscript.
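To illustrate why a Lipschitz-type hypothesis is natural here, the following standard Grönwall estimate shows how closeness of the vector fields transfers to closeness of the solutions. It is a sketch under a global Lipschitz constant $L$ for $F$ and a common initial condition, both assumptions made for simplicity and not necessarily the conditions used in the paper. The symbols $F_n$, $F$, $x_n$, $x$ and $\varepsilon_n$ are introduced here for illustration.

```latex
\begin{aligned}
x_n(t) - x(t)
 &= \int_0^t \bigl(F_n - F\bigr)(x_n(s))\,ds
  + \int_0^t \bigl(F(x_n(s)) - F(x(s))\bigr)\,ds, \\
\|x_n(t) - x(t)\|
 &\le t\,\varepsilon_n(t) + L \int_0^t \|x_n(s) - x(s)\|\,ds,
 \qquad \varepsilon_n(t) := \sup_{s \le t} \bigl\|(F_n - F)(x_n(s))\bigr\|, \\
\|x_n(t) - x(t)\|
 &\le t\,\varepsilon_n(t)\,e^{Lt}.
\end{aligned}
```

The last line follows from Grönwall's inequality because $t\,\varepsilon_n(t)$ is nondecreasing in $t$. With only local Lipschitz continuity, one would additionally need a growth bound to keep the solutions in a region where a single constant $L$ applies, which is the role of the linear-growth condition mentioned in the comment.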

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of the significance of our results and for the constructive comment. We respond to the major comment below.

Referee: [tree-based gradient boosting setting] Application to the tree-based gradient boosting setting: the general perturbation theorem requires that the vector field induced by the softmax operator on trees satisfies the theorem's hypotheses (e.g., uniform Lipschitz continuity or local Lipschitz plus linear growth in the RKHS norm, together with tightness of the driving noise). These conditions are load-bearing for the tree claim but their verification is not visible from the abstract and must be checked explicitly in the manuscript.

Authors: We agree that the hypotheses of the general perturbation theorem must be verified explicitly for the tree-based application, as these are indeed load-bearing. The manuscript carries out these verifications after introducing the RKHS associated with the softmax gradient tree: local Lipschitz continuity with linear growth in the RKHS norm is established in Section 3.2 (via the properties of the softmax operator and the finite-dimensional nature of the tree base learners), while tightness of the driving noise follows from the moment bounds and the functional CLT for the tree learners, as shown in Proposition 4.3 and Appendix C. To address the concern that these steps may not be immediately visible, we will revise the manuscript by inserting a short dedicated paragraph (new Subsection 3.4) immediately after the statement of the main tree-based theorem. This paragraph will list each hypothesis of the general theorem and cite the precise location where it is verified, thereby making the argument fully self-contained without altering the proofs.

Revision: yes

Circularity Check

0 steps flagged

Self-citation to prior deterministic limit; general perturbation theorem proved independently

The derivation applies a general stochastic perturbation theorem for ODEs in Banach spaces (presented as new and of independent interest) to the boosting process characterized as an autonomous ODE in the associated RKHS. This builds on the authors' 2024b paper only for the deterministic large-sample limit; the functional CLT for rescaled deviations is obtained by transferring the CLT on vector fields to the ODE flows without reducing the target Gaussian process to any fitted parameter or self-referential definition within the paper's own equations. No self-definitional, fitted-input, or uniqueness-imported circularity is present.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard results from the theory of ODEs in Banach spaces and functional central limit theorems; no free parameters are fitted and no new entities are postulated.

axioms (2)

Standard math: standard results on existence, uniqueness, and continuous dependence for ODEs in Banach spaces. Invoked to justify that the boosting process is the solution of an autonomous ODE and to apply the perturbation analysis.
Domain assumption: central limit theorem for the driving vector fields in the RKHS. Required for the general perturbation result to transfer the CLT from the vector fields to the ODE solutions.

Reference graph

Works this paper leans on

20 extracted references · 18 canonical work pages

[1] Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404. doi:10.1090/S0002-9947-1950-0051437-7

[2] Blanchard, G., Lugosi, G., and Vayatis, N. (2004). On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res., 4(5):861–894. doi:10.1162/1532443041424319

[3] Breiman, L. (2004). Population theory for boosting ensembles. Ann. Statist., 32(1):1–11. doi:10.1214/aos/1079120126

[4] Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Chapman & Hall. doi:10.1201/9781315139470

[5] Bühlmann, P. and Yu, B. (2003). Boosting with the L_2 loss: regression and classification. J. Amer. Statist. Assoc., 98(462):324–339. doi:10.1198/016214503000125

[6] Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, San Francisco, California, USA. ACM. doi:10.1145/2939672.2939785

[7] Dagdoug, M., Dombry, C., and Duchamps, J.-J. (2025). An RKHS perspective on tree ensembles. arXiv preprint. doi:10.48550/arXiv.2512.00397

[8] Dombry, C. and Duchamps, J.-J. (2024a). Infinitesimal gradient boosting. Stochastic Processes and their Applications, 170. doi:10.1016/j.spa.2024.104310

[9] Dombry, C. and Duchamps, J.-J. (2024b). A large-sample theory for infinitesimal gradient boosting. Bernoulli, 30(3):1894–1920. doi:10.3150/23-BEJ1657

[10] Dombry, C. and Esstafa, Y. (2024). The vanishing learning rate asymptotic for linear l^2-boosting. ESAIM: Probability and Statistics, 28:227–257. doi:10.1051/ps/2024006

[11] Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Statist., 29(5):1189–1232. doi:10.1214/aos/1013203451

[12] Hsing, T. and Eubank, R. L. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley Series in Probability and Statistics. John Wiley and Sons, Inc, Chichester, West Sussex, UK.

[13] Jiang, W. (2004). Process consistency for AdaBoost. Ann. Statist., 32(1):13–29. doi:10.1214/aos/1079120128

[14] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). LightGBM: a highly efficient gradient boosting decision tree. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 3146–3154. Curran Associates, Inc.

[15] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-20212-4

[16] Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods. Ann. Statist., 32(1):30–55. doi:10.1214/aos/1079120129

[17] Marcus, D. J. (1985). Relationships between Donsker classes and Sobolev spaces. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 69:323–330. doi:10.1007/BF00532737

[18] van der Vaart, A. W. (1998). Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. doi:10.1017/CBO9780511802256

[19] van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes. With applications to statistics. Springer Series in Statistics. Springer-Verlag, New York. doi:10.1007/978-1-4757-2545-2

[20] Zhang, T. and Yu, B. (2005). Boosting with early stopping: convergence and consistency. Ann. Statist., 33(4):1538–1579. doi:10.1214/009053605000000255
