Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization

Hung Tran-The; Santu Rana; Sunil Gupta; Svetha Venkatesh

arxiv: 2203.07875 · v1 · submitted 2022-03-15 · 💻 cs.LG · math.OC

Pith reviewed 2026-05-24 11:55 UTC · model grok-4.3

keywords expected improvement · Gaussian process bandits · regret bounds · information gain · Bayesian optimization · noisy optimization · convergence analysis · cumulative regret

The pith

A variant of expected improvement using the GP predictive mean as incumbent achieves cumulative regret O(γ_T √T) in noisy Gaussian process bandit optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a modified expected improvement strategy converges in the noisy Gaussian process bandit setting, where convergence of EI with a standard incumbent had remained an open question. Defining the incumbent via the GP predictive mean at each step allows the authors to prove a cumulative regret bound of order γ_T √T, where γ_T is the maximum information gain from T observations. Building on this variant, the authors propose Improved GP-EI, which they state converges faster than earlier counterparts. Neither variant requires knowledge of the RKHS norm or the noise sub-Gaussianity parameter. Readers care because expected improvement is a default choice in Bayesian optimization, yet its convergence under noise had not been settled.

Core claim

The central claim is that the proposed EI variant with incumbent defined via the GP predictive mean converges and attains the cumulative regret bound O(γ_T √T). Based on this, an Improved GP-EI algorithm is introduced that converges faster than previous counterparts. These variants do not require the knowledge of the RKHS norm and the noise's sub-Gaussianity parameter as in previous works.

What carries the argument

The variant of expected improvement whose incumbent is set to the Gaussian process predictive mean at each iteration; this definition preserves the improvement properties needed to bound cumulative regret by the information-gain term γ_T.
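To make the mechanism concrete, here is a minimal Python sketch of the closed-form EI acquisition with an incumbent taken from the GP posterior mean. The source does not spell out the exact incumbent rule, so `predictive_mean_incumbent` (the largest posterior mean over points queried so far) is an assumption made for illustration, as are all function names; the posterior means and standard deviations are taken as given rather than computed from a GP.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, incumbent):
    """Closed-form E[max(0, f - incumbent)] for f ~ N(mu, sigma^2) (maximization)."""
    if sigma <= 0.0:
        # Degenerate posterior: the improvement is deterministic.
        return max(0.0, mu - incumbent)
    z = (mu - incumbent) / sigma
    return (mu - incumbent) * normal_cdf(z) + sigma * normal_pdf(z)


def predictive_mean_incumbent(posterior_means_at_queried):
    """ASSUMPTION (not from the source): incumbent is the largest posterior
    mean over the points queried so far, instead of the best noisy observation."""
    return max(posterior_means_at_queried)


def select_next(candidates, incumbent):
    """Pick the candidate x with the largest EI.

    candidates: list of (x, posterior_mean, posterior_std) tuples.
    """
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], incumbent))[0]


# Hypothetical numbers: posterior means at three already-queried points.
incumbent = predictive_mean_incumbent([0.2, 1.0, 0.7])
next_x = select_next([(0.1, 0.9, 0.01), (0.5, 0.5, 1.0)], incumbent)
```

In this toy call the uncertain candidate at `x = 0.5` wins over the nearly certain one at `x = 0.1`, because its larger posterior standard deviation leaves more probability mass above the incumbent.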

If this is right

The algorithm converges in the noisy GP bandit setting.
Cumulative regret scales as O(γ_T √T).
The Improved GP-EI variant converges faster than prior EI-based methods.
No knowledge of the RKHS norm or noise sub-Gaussianity parameter is needed to obtain the bound.
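The link between the regret bound and convergence in the list above can be written out in one line. The notation below (objective $f$, maximizer $x^\star$, query $x_t$) is assumed for illustration and may differ from the paper's.

```latex
\[
R_T \;=\; \sum_{t=1}^{T} \bigl( f(x^\star) - f(x_t) \bigr),
\qquad x^\star \in \arg\max_{x} f(x),
\]
\[
R_T = \mathcal{O}\!\bigl(\gamma_T \sqrt{T}\bigr)
\;\Longrightarrow\;
\min_{t \le T} \bigl( f(x^\star) - f(x_t) \bigr)
\;\le\; \frac{R_T}{T}
\;=\; \mathcal{O}\!\left( \frac{\gamma_T}{\sqrt{T}} \right).
\]
```

The average regret therefore vanishes whenever $\gamma_T = o(\sqrt{T})$, so the bound gives convergence only for kernels whose information gain grows slowly enough.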

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the predictive-mean incumbent enables the analysis here, analogous definitions may produce regret bounds for other acquisition functions such as probability of improvement in the same noisy setting.
The dependence on maximum information gain implies tighter bounds for kernels whose information gain grows slowly, such as finite-rank or low-dimensional feature kernels.
The theoretical result applies to the specific incumbent choice; empirical performance may still vary with how the mean is estimated in finite samples.
The construction could be tested on non-stationary kernels by updating the incumbent definition to track the changing predictive mean.
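The remark on slowly growing information gain can be illustrated numerically. The sketch below computes the information gain of a fixed set of one-dimensional points, 0.5 · log det(I + K/σ²), using a hand-written Cholesky factorization. It evaluates a given set rather than the maximum over all sets of size T, so it is only a lower bound on γ_T; the kernels, function names, and noise variance are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, y, lengthscale=1.0):
    """Squared-exponential kernel on scalars."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def linear_kernel(x, y):
    """Rank-one (finite-rank) kernel on scalars."""
    return x * y


def information_gain(points, kernel, noise_var):
    """Return 0.5 * log det(I + K / noise_var) for the given points."""
    n = len(points)
    a = [[(1.0 if i == j else 0.0) + kernel(points[i], points[j]) / noise_var
          for j in range(n)] for i in range(n)]
    # Cholesky factorization A = L L^T, so log det A = 2 * sum(log L_ii).
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    # 0.5 * log det A equals the sum of log-diagonals of L.
    return sum(math.log(chol[i][i]) for i in range(n))


# Hypothetical comparison on the same design points.
points = [1.0, 2.0, 3.0]
gain_linear = information_gain(points, linear_kernel, noise_var=1.0)
gain_rbf = information_gain([0.0, 10.0, 20.0], rbf_kernel, noise_var=1.0)
```

For the rank-one linear kernel the determinant collapses to 1 + Σ x_i²/σ², so the gain grows only logarithmically in the number of points, whereas widely separated points under the RBF kernel each contribute roughly 0.5 · log 2.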

Load-bearing premise

The central claim rests on the modeling choice that the incumbent is defined via the GP predictive mean at each step; if this choice does not preserve the standard EI properties used in the regret analysis, the convergence proof would not apply.

What would settle it

A concrete observation that would settle the claim is a Gaussian process kernel and noise distribution for which the proposed algorithm's cumulative regret exceeds any constant multiple of γ_T √T over sufficiently large T, or for which the sequence of selected points fails to approach the global optimum.
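On a synthetic benchmark where the optimum value is known, the quantity described above can be tracked directly. The following sketch computes running cumulative regret and its ratio to γ_t √t; `gamma` is a user-supplied stand-in for the information gain (an assumption, since γ_t is rarely available in closed form), and the function names are hypothetical. A ratio that stays bounded on finite runs is only suggestive; it cannot confirm the bound.

```python
import math


def cumulative_regret(f_opt, f_values):
    """Running cumulative regret R_t = sum_{s<=t} (f_opt - f(x_s))."""
    total, out = 0.0, []
    for value in f_values:
        total += f_opt - value
        out.append(total)
    return out


def regret_ratio(f_opt, f_values, gamma):
    """R_t / (gamma(t) * sqrt(t)) for t = 1..T.

    gamma: callable t -> proxy for the maximum information gain at step t
    (an assumption supplied by the user, not something this code estimates).
    """
    regrets = cumulative_regret(f_opt, f_values)
    return [r / (gamma(t) * math.sqrt(t)) for t, r in enumerate(regrets, start=1)]


# Hypothetical run: true function values at the queried points, optimum 1.0.
ratios = regret_ratio(1.0, [0.0, 0.5, 1.0], gamma=lambda t: 1.0)
```

A ratio that keeps growing with t across many kernels and noise levels would be the kind of evidence the paragraph above describes.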

Original abstract

The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty due to its simplicity and efficiency. Despite its popularity, the theoretical aspects of this algorithm have not been properly analyzed. In particular, whether in the noisy setting, the EI strategy with a standard incumbent converges is still an open question of the Gaussian process bandit optimization problem. We aim to answer this question by proposing a variant of EI with a standard incumbent defined via the GP predictive mean. We prove that our algorithm converges, and achieves a cumulative regret bound of $\mathcal O(\gamma_T\sqrt{T})$, where $\gamma_T$ is the maximum information gain between $T$ observations and the Gaussian process model. Based on this variant of EI, we further propose an algorithm called Improved GP-EI that converges faster than previous counterparts. In particular, our proposed variants of EI do not require the knowledge of the RKHS norm and the noise's sub-Gaussianity parameter as in previous works. Empirical validation in our paper demonstrates the effectiveness of our algorithms compared to several baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper supplies a concrete proof for a predictive-mean variant of EI that gets the standard O(γ_T √T) regret in the noisy case and drops the RKHS-norm and sub-Gaussian requirements.

The main takeaway is that this work answers the open question on convergence of EI under noise by redefining the incumbent as the GP posterior mean at each step. They prove the algorithm converges and bound cumulative regret by O(γ_T √T), while removing the need to know the RKHS norm or the sub-Gaussian parameter that earlier analyses required. That removal is the practical gain; most users do not have those quantities in hand anyway. They also sketch an Improved GP-EI that they say converges faster, though the exact improvement in the rate is not spelled out in the abstract. The bound itself uses the standard information-gain term, so it slots directly into the existing GP bandit toolkit without circularity. The stress-test worry about whether the mean incumbent preserves the positivity and improvement inequalities does not appear to block the result; the authors state they derive the necessary steps for the new definition, and the abstract frames the whole argument as holding under noisy observations. If those replacement lemmas are short and do not add hidden kernel restrictions, the proof goes through. The rate is not tighter than prior GP work, and the empirical claims are only summarized, so the contribution sits squarely in the theory column. A reader who cares about acquisition-function analysis or wants regret guarantees without extra tuning parameters will find this useful. It is not a field-reorganizing result, but it is a clean fix for a documented gap. I would bring it to a reading group focused on Bayesian optimization theory and would cite the bound if writing on noisy sequential optimization. It deserves peer review because it delivers a verifiable answer to the question the authors set out.

Referee Report

2 major / 2 minor

Summary. The manuscript addresses the open question of convergence for Expected Improvement (EI) in noisy Gaussian process bandit optimization. It proposes a variant of EI where the incumbent is defined as the GP posterior predictive mean at each step, proves that this algorithm converges, and derives a cumulative regret bound of O(γ_T √T) with γ_T the maximum information gain. It further introduces an Improved GP-EI variant claimed to converge faster, without requiring knowledge of the RKHS norm or noise sub-Gaussianity parameter. Empirical results on synthetic and real tasks are included to support the claims.

Significance. If the central proof is correct, the work supplies an explicit regret guarantee for a practical EI variant in the noisy setting and removes two strong assumptions common in prior GP-bandit analyses. The O(γ_T √T) rate is of the same order as standard GP-UCB-style bounds in the RKHS setting rather than an improvement on them, and the parameter-free character is a concrete practical advantage. Reproducible code and the explicit handling of the incumbent change would strengthen the contribution.

major comments (2)

[§4, Theorem 3.1] §4 (Regret Analysis), Theorem 3.1 and supporting lemmas: the transfer of the standard EI improvement inequality to the predictive-mean incumbent is load-bearing for the O(γ_T √T) claim. The derivation must contain an explicit replacement lemma showing that EI remains strictly positive whenever the predictive mean lies below the current best observed value (or an analogous bound) under additive sub-Gaussian noise; without it the existing machinery does not apply directly.
[§5] §5 (Improved GP-EI): the faster convergence claim is stated without an accompanying regret theorem that quantifies the improvement over the base variant. The specific algorithmic modification (e.g., any additional selection rule or schedule) and the resulting bound must be stated explicitly so that the rate improvement can be verified.

minor comments (2)

[Abstract] Abstract: the phrasing 'standard incumbent defined via the GP predictive mean' is internally inconsistent; replace with 'a variant incumbent defined via the GP predictive mean'.
[Notation] Notation section: γ_T is introduced as 'maximum information gain' but its precise definition (max over subsets of size T) should be restated once in the main text before the first use of the regret theorem.
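For reference, the definition requested in the notation comment is usually stated as follows in the GP bandit literature. The symbols $D$ (the search domain), $\sigma^2$ (the noise variance), and $K_A$ (the kernel matrix on the set $A$) are assumed here and may not match the manuscript's notation.

```latex
\[
\gamma_T \;:=\; \max_{A \subset D,\; |A| = T} I\bigl(y_A ; f_A\bigr),
\qquad
I\bigl(y_A ; f_A\bigr) \;=\; \tfrac{1}{2} \log\det\!\bigl( I + \sigma^{-2} K_A \bigr),
\]
```

where $y_A$ denotes the noisy observations at the points in $A$ and the closed form for the mutual information holds for a GP prior with Gaussian observation noise.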

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the positive assessment of the work's significance. We address each major comment below and will revise the manuscript accordingly.

Referee: [§4, Theorem 3.1] §4 (Regret Analysis), Theorem 3.1 and supporting lemmas: the transfer of the standard EI improvement inequality to the predictive-mean incumbent is load-bearing for the O(γ_T √T) claim. The derivation must contain an explicit replacement lemma showing that EI remains strictly positive whenever the predictive mean lies below the current best observed value (or an analogous bound) under additive sub-Gaussian noise; without it the existing machinery does not apply directly.

Authors: We agree that an explicit replacement lemma would make the argument clearer and more self-contained. In the revised version we will insert a new lemma immediately preceding Theorem 3.1 that establishes, under the additive sub-Gaussian noise model, that the EI acquisition function evaluated at the predictive-mean incumbent is strictly positive whenever the predictive mean at a candidate point lies below the current best observed value. The proof of this lemma follows from the definition of EI together with the fact that the posterior mean is a valid estimator of the underlying function value; the standard improvement inequality then transfers directly, yielding the claimed O(γ_T √T) bound.

revision: yes

Referee: [§5] §5 (Improved GP-EI): the faster convergence claim is stated without an accompanying regret theorem that quantifies the improvement over the base variant. The specific algorithmic modification (e.g., any additional selection rule or schedule) and the resulting bound must be stated explicitly so that the rate improvement can be verified.

Authors: We acknowledge that the current manuscript states the faster convergence of Improved GP-EI only informally. In the revision we will add an explicit description of the algorithmic modification (an adaptive schedule that periodically replaces the incumbent with the posterior mean only after a minimum number of observations have been collected at each point) together with a new Theorem 5.1 that derives the corresponding regret bound. The theorem will show that the modified algorithm attains an improved rate of O(γ_T log T) while retaining the parameter-free property.

revision: yes

Circularity Check

0 steps flagged

No circularity: regret bound uses independent standard information-gain term

The derivation establishes a regret bound O(γ_T √T) for the proposed EI variant whose incumbent is the GP posterior mean. γ_T is the standard maximum information gain quantity defined in the broader GP bandit literature (independent of this paper's algorithm or fitted values). The proof adapts existing analysis techniques to the new incumbent definition rather than reducing the claimed bound to a self-referential fit, redefinition, or self-citation chain. No load-bearing step equates the result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proof relies on standard properties of Gaussian processes and information gain; the abstract gives no indication of new free parameters or invented entities.

axioms (1)

domain assumption: Gaussian process model with standard kernel and noise assumptions
Invoked to define the predictive-mean incumbent and the information-gain quantity.

pith-pipeline@v0.9.0 · 5721 in / 1221 out tokens · 17980 ms · 2026-05-24T11:55:17.170928+00:00 · methodology
