Stochastic Particle Gradient Descent for Infinite Ensembles

Atsushi Nitanda, Taiji Suzuki

Authors on Pith no claims yet

classification 📊 stat.ML cs.LGmath.OC

keywords methodmethodsoptimizationconstraintensembleinfiniteproblemsregularization

read the original abstract

The superior performance of ensemble methods with infinite models are well known. Most of these methods are based on optimization problems in infinite-dimensional spaces with some regularization, for instance, boosting methods and convex neural networks use $L^1$-regularization with the non-negative constraint. However, due to the difficulty of handling $L^1$-regularization, these problems require early stopping or a rough approximation to solve it inexactly. In this paper, we propose a new ensemble learning method that performs in a space of probability measures, that is, our method can handle the $L^1$-constraint and the non-negative constraint in a rigorous way. Such an optimization is realized by proposing a general purpose stochastic optimization method for learning probability measures via parameterization using transport maps on base models. As a result of running the method, a transport map to output an infinite ensemble is obtained, which forms a residual-type network. From the perspective of functional gradient methods, we give a convergence rate as fast as that of a stochastic optimization method for finite dimensional nonconvex problems. Moreover, we show an interior optimality property of a local optimality condition used in our analysis.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On the global convergence of gradient descent for wide shallow models with bounded nonlinearities
math.OC 2026-05 unverdicted novelty 6.0

Gradient descent on wide shallow models with bounded nonlinearities converges globally in the mean-field limit as non-global critical points are unstable under the dynamics.