Pith · machine review for the scientific record

arxiv: 2605.10572 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: no theorem link

Online Sharp-Calibrated Bayesian Optimization

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 05:07 UTC · model grok-4.3

classification 💻 cs.LG
keywords Bayesian optimization · Gaussian processes · online learning · regret bounds · uncertainty calibration · hyperparameter selection

The pith

Bayesian optimization can adaptively balance the sharpness and calibration of its Gaussian process uncertainty estimates by casting hyperparameter selection as a constrained online learning problem while preserving sublinear regret.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bayesian optimization relies on Gaussian process models whose hyperparameters are updated from sequential non-i.i.d. data. This can produce uncertainty estimates that are either too conservative or poorly calibrated. The paper shows how to treat the choice of hyperparameters as a constrained online learning task that trades off sharpness against calibration. If the underlying online learner has good regret properties, the overall Bayesian optimization procedure retains sublinear regret bounds. Experiments indicate that this approach competes with or outperforms existing methods on both synthetic and real tasks.
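The loop this summary describes can be sketched end to end: a GP posterior, a UCB query, and a hyperparameter update between rounds. Everything below is illustrative, not the paper's method: the toy 1-D objective, the RBF kernel, and in particular the band-miss lengthscale rule, which is only a crude stand-in for OSCBO's constrained online update.

```python
import numpy as np

def rbf(X, Z, ls):
    # squared-exponential kernel with unit amplitude
    d = ((X[:, None, :] - Z[None, :, :]) / ls) ** 2
    return np.exp(-0.5 * d.sum(-1))

def gp_posterior(Xtr, ytr, Xte, ls, noise=1e-2):
    # standard GP regression equations via a Cholesky factorization
    K = rbf(Xtr, Xtr, ls) + noise * np.eye(len(Xtr))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    Ks = rbf(Xte, Xtr, ls)
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    sd = np.sqrt(np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None))
    return mu, sd

rng = np.random.default_rng(0)

def f(x):
    """Toy noisy black-box objective (hypothetical stand-in)."""
    return np.sin(3.0 * x[..., 0]) + 0.1 * rng.standard_normal(x.shape[:-1])

grid = np.linspace(0.0, 2.0, 201)[:, None]
X = rng.uniform(0.0, 2.0, size=(3, 1))
y = f(X)
ls, beta = 0.5, 2.0  # lengthscale adapted online; fixed UCB width

for t in range(20):
    mu, sd = gp_posterior(X, y, grid, ls)
    x_next = grid[np.argmax(mu + beta * sd)]          # GP-UCB query
    # score the new point under the pre-update posterior, then observe it
    mu_n, sd_n = gp_posterior(X, y, x_next[None], ls)
    y_next = f(x_next[None])[0]
    miss = abs(y_next - mu_n[0]) > beta * sd_n[0]
    # crude stand-in for the constrained online step: widen the lengthscale
    # after a calibration miss, sharpen (shrink) it otherwise
    ls *= 1.1 if miss else 0.95
    X = np.vstack([X, x_next])
    y = np.append(y, y_next)
```

The point of the sketch is only the shape of the loop: the hyperparameter is revised from sequentially collected, non-i.i.d. data between queries, which is exactly the regime where fixed-kernel regret theory stops applying.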

Core claim

The central discovery is that by formulating hyperparameter selection in Bayesian optimization as a constrained online learning problem, one can achieve an adaptive balance between the sharpness and calibration of the Gaussian process posterior while still inheriting sublinear regret guarantees from the online learning algorithm used to solve the constrained problem.

What carries the argument

The constrained online-learning formulation for selecting Gaussian process kernel hyperparameters, which is solved online along the Bayesian optimization trajectory.
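A minimal form of such a constrained online update can be written as a primal-dual step on a scalar hyperparameter θ (e.g. a log-lengthscale): descend a sharpness loss while a multiplier enforces a calibration constraint. The loss shapes, miss-rate model, and step sizes below are invented for illustration; only the primal-dual structure corresponds to the formulation named above.

```python
import numpy as np

def primal_dual_step(theta, lam, sharp_grad, cstr_val, cstr_grad,
                     eta_p=0.05, eta_d=0.5):
    """One online gradient step on the Lagrangian
    sharpness(theta) + lam * (miss_rate(theta) - alpha)."""
    theta = theta - eta_p * (sharp_grad + lam * cstr_grad)  # primal descent
    lam = max(0.0, lam + eta_d * cstr_val)                  # dual ascent
    return theta, lam

theta, lam, alpha = 0.0, 0.0, 0.1
for t in range(50):
    # stand-ins for quantities that would be read off the GP posterior:
    sharp_grad = 1.0                              # sharpness improves as theta falls
    miss_rate = 0.3 if theta < -0.5 else 0.05     # too sharp -> frequent band misses
    cstr_grad = -1.0 if theta < -0.5 else 0.0     # loosening theta reduces misses
    theta, lam = primal_dual_step(theta, lam, sharp_grad,
                                  miss_rate - alpha, cstr_grad)
```

Run forward, θ overshoots into the miscalibrated region, the multiplier λ grows until the calibration constraint pushes back, and the iterate settles near the constraint boundary, which is the sharpness-versus-calibration balance the formulation is meant to strike.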

If this is right

  • The resulting algorithm maintains both sharp and well-calibrated uncertainty along the optimization path.
  • Sublinear regret bounds hold under the guarantees of the chosen online learning solver.
  • Empirical performance ranks among the strongest methods in final simple regret with robust cumulative regret behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The constrained formulation could extend to other surrogate models that require online parameter adaptation.
  • Similar ideas might improve uncertainty handling in sequential decision tasks such as active learning with Gaussian processes.
  • The method suggests a template for adding calibration constraints to other online optimization procedures.

Load-bearing premise

The constrained online learning problem for hyperparameter selection admits an efficient online solver whose regret bounds transfer directly to the Bayesian optimization regret without further assumptions on the data or model.

What would settle it

Observing linear regret growth for the proposed algorithm (OSCBO) on a benchmark function where standard methods achieve sublinear regret would falsify the transfer of regret guarantees.
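That test can be operationalized by estimating the growth exponent p in R_T ≈ c·T^p from a cumulative-regret trace via least squares on log R_T against log T; a fitted p near 1 would indicate linear regret. The traces below are synthetic stand-ins, not OSCBO output.

```python
import numpy as np

def regret_exponent(cum_regret):
    """Fit p in cum_regret[T] ~ c * T**p by log-log least squares."""
    T = np.arange(1, len(cum_regret) + 1)
    # skip the first few rounds to reduce transient effects
    p, _ = np.polyfit(np.log(T[10:]), np.log(cum_regret[10:]), 1)
    return p

T = np.arange(1, 2001)
sublinear = 5.0 * np.sqrt(T)   # an O(sqrt(T)) trace -> exponent ~0.5
linear = 0.3 * T               # a linear trace      -> exponent ~1.0
```

On these exact power-law traces the fit recovers 0.5 and 1.0 respectively; on real runs one would average over seeds and inspect confidence intervals on p before declaring the guarantee falsified.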

Figures

Figures reproduced from arXiv: 2605.10572 by Julien Martinelli, Marshal Arijona Sinaga, Samuel Kaski, Teemu Turpeinen.

Figure 1. Overview of OSCBO and main theoretical contributions. (A) OSCBO updates GP hyperparameters online through a sharpness–calibration primal–dual loop coupled to GP-UCB. (B) Posterior uncertainty and UCB query selection under different baselines, characterized by specific hyperparameter-adaptation rules or UCB bands. (C) Different methods explore different regions of the sharpness–calibration tradeoff, measure…

Figure 2. Synthetic and real-world benchmarks. Each plot displays simple regret across tasks and the final simple-regret distribution for the top three methods, with mean ± standard error over 20 seeds. Lower-right: final simple-regret rank distribution across tasks (lower is better). OSCBO consistently attains competitive final simple regret, ranking among the strongest methods overall.

Figure 3. Sensitivity analysis and diagnostics. Top: mean rank ± standard error across tasks. Regret is first averaged over 20 seeds for each task and baseline; baselines are then ranked within each task, and ranks are averaged across tasks. For the last panel, mean normalized regrets are shown. OSCBO demonstrates consistent performance across these different regimes, highlighting its robustness. Bottom: Levy 5D dia…
Original abstract

Bayesian optimization (BO) is a widely used framework for optimizing expensive black-box functions, commonly based on Gaussian process (GP) surrogate models. Its effectiveness relies on uncertainty quantification that is both sharp (informative) and well-calibrated along the BO trajectory. In practice, GP kernel hyperparameters are unknown and are refit online from sequentially collected (non-i.i.d.) data, which can yield miscalibrated or overly conservative uncertainty and lies outside the fixed-kernel assumptions of standard BO regret theory. We propose Online Sharp-Calibrated Bayesian Optimization (OSCBO), a BO algorithm that adaptively balances GP sharpness and calibration by casting hyperparameter selection as a constrained online-learning problem. We also show that OSCBO preserves sublinear regret bounds by leveraging the theoretical guarantees of the underlying online learning algorithm. Empirically, OSCBO performs competitively across synthetic and real-world benchmarks, ranking among the strongest methods in final simple regret while maintaining robust cumulative-regret behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Online Sharp-Calibrated Bayesian Optimization (OSCBO), a BO algorithm that formulates GP kernel hyperparameter selection as a constrained online-learning problem to adaptively balance sharpness and calibration of uncertainty estimates along the optimization trajectory. It claims that OSCBO preserves sublinear regret bounds by leveraging the theoretical guarantees of the underlying online learning algorithm, and reports competitive performance on synthetic and real-world benchmarks.

Significance. If the regret-transfer argument is made rigorous for the non-i.i.d. setting induced by adaptive querying, the work would address a practical gap in GP-based BO where hyperparameters are refit online, potentially allowing reliable uncertainty quantification without sacrificing the sublinear-regret guarantees that standard fixed-kernel analyses provide.

major comments (1)
  1. [Abstract and theoretical analysis section] The central claim that OSCBO preserves sublinear regret by leveraging the online learner's guarantees (stated in the abstract) is load-bearing but requires explicit conditions for the composition. The loss sequence supplied to the online learner is generated from a GP posterior that is itself updated after each adaptively chosen (non-i.i.d.) observation; standard online convex optimization or online learning regret theorems typically assume losses that are i.i.d., of bounded variation, or adversarial yet generated independently of the learner's past decisions. Without additional Lipschitz or stability arguments that bound any extra linear term arising from the dependence on past decisions, the transfer may fail to remain sublinear. This issue is not resolved by the abstract's high-level statement.
minor comments (2)
  1. [Abstract] The abstract states that OSCBO 'ranks among the strongest methods in final simple regret' but does not reference the specific tables or figures that support this ranking; adding such pointers would improve readability.
  2. [Notation and method section] Notation for the constrained online-learning formulation (e.g., the precise definition of the calibration constraint and the loss function) should be introduced once and used consistently when the regret composition is discussed.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and the substantive comment on the regret-transfer argument. We address it directly below and will revise the manuscript to strengthen the theoretical justification.

Point-by-point responses
  1. Referee: [Abstract and theoretical analysis section] The central claim that OSCBO preserves sublinear regret by leveraging the online learner's guarantees (stated in the abstract) is load-bearing but requires explicit conditions for the composition. The loss sequence supplied to the online learner is generated from a GP posterior that is itself updated after each adaptively chosen (non-i.i.d.) observation; standard online convex optimization or online learning regret theorems typically assume losses that are i.i.d., of bounded variation, or adversarial yet generated independently of the learner's past decisions. Without additional Lipschitz or stability arguments that bound any extra linear term arising from the dependence on past decisions, the transfer may fail to remain sublinear. This issue is not resolved by the abstract's high-level statement.

    Authors: We agree that the abstract statement is high-level and that a fully rigorous transfer of the online learner's regret bound to the non-i.i.d. loss sequence induced by adaptive querying requires additional arguments. The current analysis applies the online convex optimization regret guarantee to the sequence of sharpness-calibration losses evaluated on the evolving GP posterior. To close the gap, we will revise the theoretical analysis section and add an appendix lemma establishing stability of the loss sequence. Under standard assumptions (compact domain, continuous kernel with bounded RKHS norm), the posterior mean and variance vary Lipschitz-continuously with respect to the data; this bounds the perturbation to each loss value by a term whose cumulative effect remains sublinear (O(T^{3/4}) when composed with typical online regret rates). The revised manuscript will explicitly state these conditions and include the decomposition showing that the extra linear term does not destroy sublinearity. These changes will be incorporated in the next version. revision: yes
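The composition the rebuttal outlines can be rendered as a hedged derivation sketch; the notation (per-round losses ℓ_t, perturbed losses ℓ̃_t, stability bound ε_t) is assumed here, not taken from the paper.

```latex
% Sketch of the regret composition (notation assumed, not the paper's).
% Let \ell_t be the sharpness--calibration loss under a fixed data sequence
% and \tilde\ell_t the loss actually induced by adaptive querying. If the
% posterior mean and variance are Lipschitz in the data, each round's
% perturbation is bounded: |\tilde\ell_t(\theta) - \ell_t(\theta)| \le \epsilon_t.
\begin{align*}
\sum_{t=1}^{T} \tilde\ell_t(\theta_t) \;-\; \min_{\theta}\sum_{t=1}^{T} \tilde\ell_t(\theta)
 \;\le\; \underbrace{\mathrm{Reg}_T^{\mathrm{OCO}}}_{O(\sqrt{T})}
   \;+\; 2\sum_{t=1}^{T} \epsilon_t .
\end{align*}
% With per-round stability \epsilon_t = O(t^{-1/4}), the perturbation term
% sums to O(T^{3/4}), so the total remains sublinear at the O(T^{3/4}) rate
% the rebuttal claims; with \epsilon_t = \Omega(1) it would be linear and
% the transfer would fail, which is exactly the referee's concern.
```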

Circularity Check

0 steps flagged

No circularity: regret preservation relies on external online-learning guarantees, not self-referential reduction

Full rationale

The paper's central claim is that OSCBO preserves sublinear regret by leveraging the theoretical guarantees of an underlying online learning algorithm after casting hyperparameter selection as a constrained online problem. The abstract and reader's summary indicate this transfer uses external regret bounds rather than deriving them from the GP posterior or fitted parameters within the paper itself. No self-citations are load-bearing for the uniqueness or derivation of the regret result, no fitted inputs are renamed as predictions, and no ansatz or self-definition reduces the main result to its inputs by construction. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on standard GP surrogate assumptions plus the transferability of online-learning regret bounds; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The underlying online learning algorithm provides sublinear regret guarantees that transfer to the BO setting under the proposed constrained formulation.
    Invoked to claim preservation of sublinear regret bounds.

pith-pipeline@v0.9.0 · 5464 in / 1166 out tokens · 35823 ms · 2026-05-12T05:07:13.471785+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

  1. Anastasios Nikolas Angelopoulos, Rina Foygel Barber, and Stephen Bates. Online conformal prediction with decaying step sizes. In Proceedings of the 41st International Conference on Machine Learning, 2024.
  2. Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham, Andrew G. Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In Advances in Neural Information Processing Systems, 2020.
  3. Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. Conformal prediction beyond exchangeability. The Annals of Statistics, 2023.
  4. Felix Berkenkamp, Angela P. Schoellig, and Andreas Krause. No-regret Bayesian optimization with unknown hyperparameters. Journal of Machine Learning Research, 2019.
  5. Martino Bernasconi, Matteo Castiglioni, Andrea Celli, and Federico Fusco. Beyond primal-dual methods in bandits with stochastic and adversarial constraints. In Advances in Neural Information Processing Systems, 2024.
  6. Alexandre Capone, Sandra Hirche, and Geoff Pleiss. Sharp calibrated Gaussian processes. Advances in Neural Information Processing Systems, 2023.
  7. Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Giulia Romano, and Nicola Gatti. A unifying framework for online optimization with long-term constraints. Advances in Neural Information Processing Systems, 2022.
  8. Sathya R. Chitturi, Akash Ramdas, Yue Wu, Brian Rohr, Stefano Ermon, Jennifer Dionne, Felipe H. da Jornada, Mike Dunne, Christopher Tassone, Willie Neiswanger, et al. Targeted materials discovery using Bayesian algorithm execution. npj Computational Materials, 2024.
  9. Sayak Ray Chowdhury and Aditya Gopalan. On kernelized multi-armed bandits. In International Conference on Machine Learning, 2017.
  10. Shachi Deshpande, Charles Marx, and Volodymyr Kuleshov. Online calibrated and conformal prediction improves Bayesian optimization. In International Conference on Artificial Intelligence and Statistics, 2024.
  11. Shachi Deshpande, Charles Marx, and Volodymyr Kuleshov. Calibrated regression against an adversary without regret. In Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, 2025.
  12. David Eriksson, Michael Pearce, Jacob Gardner, Ryan D. Turner, and Matthias Poloczek. Scalable global optimization via local Bayesian optimization. Advances in Neural Information Processing Systems, 2019.
  13. Matthias Feurer, Jost Springenberg, and Frank Hutter. Initializing Bayesian hyperparameter optimization via meta-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2015.
  14. Roman Garnett. Bayesian Optimization. Cambridge University Press, 2023.
  15. Isaac Gibbs and Emmanuel Candès. Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems, 2021.
  16. Tilmann Gneiting, Fadoua Balabdaoui, and Adrian E. Raftery. Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2007.
  17. Aldair E. Gongora, Bowen Xu, Wyatt Perry, Chika Okoye, Patrick Riley, Kristofer G. Reyes, Elise F. Morgan, and Keith A. Brown. A Bayesian experimental autonomous researcher for mechanical design. Science Advances, 2020.
  18. Matthew Hoffman, Eric Brochu, Nando de Freitas, et al. Portfolio allocation for Bayesian optimization. In UAI, 2011.
  19. Carl Hvarfner, Erik Orm Hellsten, and Luigi Nardi. Vanilla Bayesian optimization performs great in high dimensions. In Proceedings of the 41st International Conference on Machine Learning, 2024.
  20. Motonobu Kanagawa, Philipp Hennig, Dino Sejdinovic, and Bharath K. Sriperumbudur. Gaussian processes and kernel methods: A review on connections and equivalences. arXiv preprint arXiv:1807.02582, 2018.
  21. Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
  22. Yucen Lily Li, Tim G. J. Rudner, and Andrew Gordon Wilson. A study of Bayesian neural network surrogates for Bayesian optimization. In The Twelfth International Conference on Learning Representations, 2024.
  23. Yusha Liu and Aarti Singh. Adaptation to misspecified kernel regularity in kernelised bandits. In International Conference on Artificial Intelligence and Statistics, 2023.
  24. Flore Mekki-Berrada, Zekun Ren, Tan Huang, Wai Kuan Wong, Fang Zheng, Jiaxun Xie, Isaac Parker Siyu Tian, Senthilnath Jayavelu, Zackaria Mahfoud, Daniil Bash, et al. Two-step machine learning enables optimized nanoparticle synthesis. npj Computational Materials, 2021.
  25. Samuel Müller, Matthias Feurer, Noah Hollmann, and Frank Hutter. PFNs4BO: In-context learning for Bayesian optimization. In International Conference on Machine Learning, 2023.
  26. Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019.
  27. Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICLv2: A better, faster, scalable, and open tabular foundation model. arXiv preprint arXiv:2602.11139, 2026.
  28. C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
  29. Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 2010.
  30. Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, and Andrew Gordon Wilson. Accelerating Bayesian optimization for biological sequence design with denoising autoencoders. In International Conference on Machine Learning, 2022.
  31. Samuel Stanton, Wesley Maddox, and Andrew Gordon Wilson. Bayesian optimization with conformal prediction sets. In International Conference on Artificial Intelligence and Statistics, 2023.
  32. Arun Sai Suggala and Praneeth Netrapalli. Online non-convex learning: Following the perturbed leader is optimal. In Algorithmic Learning Theory, 2020.
  33. Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel Candès, and Aaditya Ramdas. Conformal prediction under covariate shift. Advances in Neural Information Processing Systems, 2019.
  34. Roman Vershynin. High-dimensional probability. University of California, Irvine, 2020.
  35. Zi Wang, Beomjoon Kim, and Leslie P. Kaelbling. Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior. Advances in Neural Information Processing Systems, 2018.
  36. Rosen Ting-Ying Yu, Cyril Picard, and Faez Ahmed. GIT-BO: High-dimensional Bayesian optimization with tabular foundation models. In The Fourteenth International Conference on Learning Representations, 2026.
  37. Xinzhe Yuan, Zhuo Chen, Jianshu Zhang, Huan Xiong, Nanyang Ye, Yuqiang Li, and Qinying Gu. Unleashing LLMs in Bayesian optimization: Preference-guided framework for scientific discovery. In The Fourteenth International Conference on Learning Representations, 2026.
  38. Juliusz Ziomek, Masaki Adachi, and Michael A. Osborne. Bayesian optimisation with unknown hyperparameters: regret bounds logarithmically closer to optimal. Advances in Neural Information Processing Systems, 2024.
  39. (internal anchor, not a bibliographic reference) Appendix lemma: on the high-probability observation event E_Y (Theorem 6.1), u^P_t is Lipschitz continuous w.r.t. the ℓ1 norm, i.e. |u^P_t(θ̂) − u^P_t(θ)| ≤ K‖θ̂ − θ‖₁ for some K > 0 and all θ̂, θ ∈ Θ; the proof shows that both L^s_t and L^c_t are ℓ1-Lipschitz.