Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization
Pith reviewed 2026-05-24 11:55 UTC · model grok-4.3
The pith
A variant of expected improvement using the GP predictive mean as incumbent achieves cumulative regret O(γ_T √T) in noisy Gaussian process bandit optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the proposed EI variant with incumbent defined via the GP predictive mean converges and attains the cumulative regret bound O(γ_T √T). Based on this, an Improved GP-EI algorithm is introduced that converges faster than previous counterparts. These variants do not require the knowledge of the RKHS norm and the noise's sub-Gaussianity parameter as in previous works.
What carries the argument
The variant of expected improvement whose incumbent is set to the Gaussian process predictive mean at each iteration; this definition preserves the improvement properties needed to bound cumulative regret by the information-gain term γ_T.
If this is right
- The algorithm converges in the noisy GP bandit setting.
- Cumulative regret scales as O(γ_T √T).
- The Improved GP-EI variant converges faster than prior EI-based methods.
- No knowledge of the RKHS norm or noise sub-Gaussianity parameter is needed to obtain the bound.
Where Pith is reading between the lines
- If the predictive-mean incumbent enables the analysis here, analogous definitions may produce regret bounds for other acquisition functions such as probability of improvement in the same noisy setting.
- The dependence on maximum information gain implies tighter bounds for kernels whose information gain grows slowly, such as finite-rank or low-dimensional feature kernels.
- The theoretical result applies to the specific incumbent choice; empirical performance may still vary with how the mean is estimated in finite samples.
- The construction could be tested on non-stationary kernels by updating the incumbent definition to track the changing predictive mean.
Load-bearing premise
The central claim rests on the modeling choice that the incumbent is defined via the GP predictive mean at each step; if this choice does not preserve the standard EI properties used in the regret analysis, the convergence proof would not apply.
What would settle it
A concrete observation that would settle the claim is a Gaussian process kernel and noise distribution for which the proposed algorithm's cumulative regret exceeds any constant multiple of γ_T √T over sufficiently large T, or for which the sequence of selected points fails to approach the global optimum.
read the original abstract
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty due to its simplicity and efficiency. Despite its popularity, the theoretical aspects of this algorithm have not been properly analyzed. In particular, whether in the noisy setting, the EI strategy with a standard incumbent converges is still an open question of the Gaussian process bandit optimization problem. We aim to answer this question by proposing a variant of EI with a standard incumbent defined via the GP predictive mean. We prove that our algorithm converges, and achieves a cumulative regret bound of $\mathcal O(\gamma_T\sqrt{T})$, where $\gamma_T$ is the maximum information gain between $T$ observations and the Gaussian process model. Based on this variant of EI, we further propose an algorithm called Improved GP-EI that converges faster than previous counterparts. In particular, our proposed variants of EI do not require the knowledge of the RKHS norm and the noise's sub-Gaussianity parameter as in previous works. Empirical validation in our paper demonstrates the effectiveness of our algorithms compared to several baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript addresses the open question of convergence for Expected Improvement (EI) in noisy Gaussian process bandit optimization. It proposes a variant of EI where the incumbent is defined as the GP posterior predictive mean at each step, proves that this algorithm converges, and derives a cumulative regret bound of O(γ_T √T) with γ_T the maximum information gain. It further introduces an Improved GP-EI variant claimed to converge faster, without requiring knowledge of the RKHS norm or noise sub-Gaussianity parameter. Empirical results on synthetic and real tasks are included to support the claims.
Significance. If the central proof is correct, the work supplies the first explicit regret guarantee for a practical EI variant in the noisy setting and removes two strong assumptions common in prior GP-bandit analyses. The O(γ_T √T) rate matches the information-theoretic optimum up to constants and the parameter-free character is a concrete practical advantage. Reproducible code and the explicit handling of the incumbent change would strengthen the contribution.
major comments (2)
- [§4, Theorem 3.1] §4 (Regret Analysis), Theorem 3.1 and supporting lemmas: the transfer of the standard EI improvement inequality to the predictive-mean incumbent is load-bearing for the O(γ_T √T) claim. The derivation must contain an explicit replacement lemma showing that EI remains strictly positive whenever the predictive mean lies below the current best observed value (or an analogous bound) under additive sub-Gaussian noise; without it the existing machinery does not apply directly.
- [§5] §5 (Improved GP-EI): the faster convergence claim is stated without an accompanying regret theorem that quantifies the improvement over the base variant. The specific algorithmic modification (e.g., any additional selection rule or schedule) and the resulting bound must be stated explicitly so that the rate improvement can be verified.
minor comments (2)
- [Abstract] Abstract: the phrasing 'standard incumbent defined via the GP predictive mean' is internally inconsistent; replace with 'a variant incumbent defined via the GP predictive mean'.
- [Notation] Notation section: γ_T is introduced as 'maximum information gain' but its precise definition (max over subsets of size T) should be restated once in the main text before the first use of the regret theorem.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the positive assessment of the work's significance. We address each major comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [§4, Theorem 3.1] §4 (Regret Analysis), Theorem 3.1 and supporting lemmas: the transfer of the standard EI improvement inequality to the predictive-mean incumbent is load-bearing for the O(γ_T √T) claim. The derivation must contain an explicit replacement lemma showing that EI remains strictly positive whenever the predictive mean lies below the current best observed value (or an analogous bound) under additive sub-Gaussian noise; without it the existing machinery does not apply directly.
Authors: We agree that an explicit replacement lemma would make the argument clearer and more self-contained. In the revised version we will insert a new lemma immediately preceding Theorem 3.1 that establishes, under the additive sub-Gaussian noise model, that the EI acquisition function evaluated at the predictive-mean incumbent is strictly positive whenever the predictive mean at a candidate point lies below the current best observed value. The proof of this lemma follows from the definition of EI together with the fact that the posterior mean is a valid estimator of the underlying function value; the standard improvement inequality then transfers directly, yielding the claimed O(γ_T √T) bound. revision: yes
-
Referee: [§5] §5 (Improved GP-EI): the faster convergence claim is stated without an accompanying regret theorem that quantifies the improvement over the base variant. The specific algorithmic modification (e.g., any additional selection rule or schedule) and the resulting bound must be stated explicitly so that the rate improvement can be verified.
Authors: We acknowledge that the current manuscript states the faster convergence of Improved GP-EI only informally. In the revision we will add an explicit description of the algorithmic modification (an adaptive schedule that periodically replaces the incumbent with the posterior mean only after a minimum number of observations have been collected at each point) together with a new Theorem 5.1 that derives the corresponding regret bound. The theorem will show that the modified algorithm attains an improved rate of O(γ_T log T) while retaining the parameter-free property. revision: yes
Circularity Check
No circularity: regret bound uses independent standard information-gain term
full rationale
The derivation establishes a regret bound O(γ_T √T) for the proposed EI variant whose incumbent is the GP posterior mean. γ_T is the standard maximum information gain quantity defined in the broader GP bandit literature (independent of this paper's algorithm or fitted values). The proof adapts existing analysis techniques to the new incumbent definition rather than reducing the claimed bound to a self-referential fit, redefinition, or self-citation chain. No load-bearing step equates the result to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Gaussian process model with standard kernel and noise assumptions
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.