A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

Eric Brochu; Nando de Freitas; Vlad M. Cora

arxiv: 1012.2599 · v1 · pith:VWTVGWFHnew · submitted 2010-12-12 · 💻 cs.LG

Eric Brochu, Vlad M. Cora, Nando de Freitas

keywords bayesian · optimization · function · areas · cost · expensive · functions · hierarchical

Abstract

We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
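The loop the abstract describes (prior, posterior, utility-based choice of the next observation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: a zero-mean Gaussian process prior with a squared-exponential kernel, expected improvement as the utility, a one-dimensional search over a fixed grid on [0, 1], and all function names (`rbf`, `gp_fit`, `gp_predict`, `expected_improvement`, `bayes_opt`), which are hypothetical.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalars (assumed prior covariance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def back_sub_t(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-5):
    """Combine the prior with the evidence (xs, ys); noise is a small jitter."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = back_sub_t(L, forward_sub(L, ys))
    return L, alpha

def gp_predict(xs, L, alpha, x_new):
    """Posterior mean and standard deviation at x_new."""
    k_star = [rbf(x, x_new) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 0.0)  # rbf(x, x) == 1
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Utility trading off exploitation (high mean) and exploration (high std)."""
    if std < 1e-12:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def bayes_opt(f, n_iter=10, grid_size=201):
    """Maximize f on [0, 1], evaluating f only n_iter + 2 times."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    xs = [0.1, 0.9]                      # arbitrary initial design
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        L, alpha = gp_fit(xs, ys)
        best = max(ys)
        scores = [expected_improvement(*gp_predict(xs, L, alpha, x), best)
                  for x in grid]
        x_next = grid[scores.index(max(scores))]  # most useful next observation
        xs.append(x_next)
        ys.append(f(x_next))
    i = ys.index(max(ys))
    return xs[i], ys[i]
```

Calling `bayes_opt(lambda x: -(x - 0.3) ** 2)` returns the best input found and its value. The grid search over the utility is a simplification that only works in low dimensions; the kernel, its length scale, and the jitter are illustrative choices, not recommendations from the paper.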

Forward citations

Cited by 29 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Finite-Time Regret Analysis of Retry-Aware Bandits
cs.LG 2026-05 unverdicted novelty 7.0

ReMax achieves the first sublinear finite-time regret bound for Gaussian bandits with M=2 by deriving an expected-improvement balance condition for its optimal sampling distribution and separating saturation from unde...
High-Throughput Bayesian Optimization of Cement-Salt Hydrates Composites for Seasonal Thermochemical Energy Storage
cond-mat.mtrl-sci 2026-05 conditional novelty 7.0

Bayesian optimization identifies cement-salt hydrate composites achieving up to five times higher specific energy than prior cement-based TCES materials, with LiCl-based formulations reaching 458 kJ/kg.
Autonomous operation of the DIAG0 diagnostic line for 6D phase-space monitoring at LCLS-II
physics.acc-ph 2026-04 unverdicted novelty 7.0

First autonomous 6D phase-space tomography system at LCLS-II achieves real-time beam reconstructions every 5-10 minutes via ML control and generative analysis.
An Efficient Spatial Branch-and-Bound Algorithm for Global Optimization of Gaussian Process Posterior Mean Functions
math.OC 2026-04 conditional novelty 7.0

PALM-Mean combines sign-aware piecewise-linear relaxations of locally important kernel terms with closed-form analytic bounds on the rest inside a reduced-space branch-and-bound framework, yielding valid lower bounds ...
Human-in-the-Loop Pareto Optimization: Trade-off Characterization for Assist-as-Needed Training and Performance Evaluation
cs.RO 2026-03 unverdicted novelty 7.0

A human-in-the-loop Pareto optimization framework characterizes trade-offs between performance and challenge in assist-as-needed motor training, enabling protocol design and fair evaluation even when users cannot comp...
EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
cs.LG 2024-10 unverdicted novelty 7.0

EARL-BO uses RL with an Attention-DeepSets encoder and end-to-end on-policy multi-task fine-tuning to approximate near-optimal multi-step lookahead policies for high-dimensional black-box optimization.
Heuristic approaches for solving a bilevel optimistic scheduling problem on parallel machines
math.OC 2026-05 unverdicted novelty 6.0

Develops RBS and MSLS heuristics exploiting follower optimality properties for bilevel uniform parallel machine scheduling with up to 500 jobs.
Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
cs.AI 2026-05 unverdicted novelty 6.0

LGBO integrates LLM semantic preferences continuously into Bayesian optimization iterations, with a theoretical worst-case guarantee and empirical gains including 90% of best value in 6 iterations on a wet-lab battery task.
Active Learning MPC Objective Functions from Preferences
eess.SY 2026-05 unverdicted novelty 6.0

Active learning strategies for preference-based MPC objective learning achieve better closed-loop alignment with human preferences using fewer queries than random sampling in numerical tests.
ADKO: Agentic Decentralized Knowledge Optimization
cs.LG 2026-05 unverdicted novelty 6.0

ADKO is a decentralized framework where agents share compact GP-derived tokens and LM insights to achieve collaborative Bayesian optimization with a decomposed regret bound that includes compression and approximation losses.
Estimating Decision Uncertainty from Preference Uncertainty: Application to Ground Vehicle Design
stat.AP 2026-04 unverdicted novelty 6.0

Preference uncertainty is modeled as random variables that induce a distribution over Pareto-optimal designs, analyzed via Sobol' indices, Shapley values, and Fréchet variance to assess decision stability in ground ve...
Vibrotactile Preference Learning: Uncertainty-Aware Preference Learning for Personalized Vibration Feedback
cs.HC 2026-04 unverdicted novelty 6.0

VPL learns individualized vibrotactile preferences efficiently via uncertainty-aware Gaussian process models and active query selection in a 13-participant user study on an Xbox controller.
Stein Variational Black-Box Combinatorial Optimization
cs.AI 2026-04 unverdicted novelty 6.0

Integrating Stein variational gradient descent into EDAs introduces repulsion among particles to jointly explore multiple optima in discrete black-box optimization, with competitive or superior results on large-scale ...
Neural Global Optimization via Iterative Refinement from Noisy Samples
cs.LG 2026-04 unverdicted novelty 6.0

A neural model learns iterative refinement from noisy samples and spline inputs to find global minima, reporting 8.05% mean error on multi-modal tests versus 36.24% for spline initialization alone.
Optimized Fish Locomotion using Design-by-Morphing and Bayesian Optimization
physics.flu-dyn 2025-09 unverdicted novelty 6.0

A morphing-based design space explored via Bayesian optimization yields swimming profiles with 49-57% peak propulsive efficiency, 16-35% above standard anguilliform and carangiform references.
Anchor-Based Heteroscedastic Noise for Preferential Bayesian Optimization
cs.LG 2024-05 unverdicted novelty 6.0

The paper introduces an anchor-based heteroscedastic noise model for PBO that maps user uncertainty via KDE on reliable examples, incorporates it into GP surrogates, and derives risk-averse acquisition functions inclu...
Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use
cs.AI 2026-05 unverdicted novelty 5.0

Trust calibration in agentic tool use is cast as preferential Bayesian optimization over a latent human risk-tolerance function observed through binary approve/deny feedback with a probit likelihood.
Bayesian Optimization of Crossbar-Based Compute-In-Memory System Design for Efficient DNN Inference
cs.ET 2026-05 unverdicted novelty 5.0

A multi-objective Bayesian optimization framework co-optimizes CIM crossbar hardware and DNN parameters for VGG8/CIFAR-10 and VGG16/Tiny-ImageNet, achieving comparable accuracy with up to 65% smaller area and 52% lowe...
Generative Augmentation of Imbalanced Flight Records for Flight Diversion Prediction: A Multi-objective Optimisation Framework
cs.LG 2026-04 unverdicted novelty 5.0

Hyperparameter-optimized generative models augment scarce flight diversion records and substantially improve prediction accuracy over real data alone.
Adaptive Compression-based Lifelong Learning
cs.CV 2019-07 unverdicted novelty 5.0

Bayesian optimization enables adaptive network pruning rates in lifelong learning, performing heavier pruning on small/simple tasks and milder on large/complex ones.
Accelerating Experimental Design by Incorporating Experimenter Hunches
stat.ML 2019-07 unverdicted novelty 5.0

A two-stage GP approach with virtual samples and posterior adjustment factors incorporates per-variable monotonic hunches into Bayesian optimization while preserving convergence guarantees, showing faster convergence ...
Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches
econ.EM 2019-07 unverdicted novelty 5.0

Adaptive GLM with MQLE and GP regression with UCB for dynamic insurance pricing, showing parameter convergence and regret analysis under delayed claims.
Training Language Models to Use Prolog as a Tool
cs.CL 2025-12 unverdicted novelty 4.0

Fine-tuning Qwen2.5-3B with GRPO on GSM8K to use Prolog yields competitive zero-shot MMLU performance but exposes an accuracy-auditability trade-off interpreted as reward hacking.
Optimization of a cosmic muon tomography scanner for cargo border control inspection
physics.ins-det 2025-07 unverdicted novelty 4.0

Optimization study for muon tomography cargo scanner using TomOpt Bayesian optimization and GEANT4 simulations to enhance border security detection.
Bayesian Optimization with Directionally Constrained Search
cs.LG 2019-06 unverdicted novelty 4.0

Introduces directionally constrained Bayesian optimization combining local and global search to improve optimum finding within evaluation limits.
A Tutorial on Bayesian Optimization
stat.ML 2018-07 unverdicted novelty 4.0

Bayesian optimization uses Gaussian process regression to build a surrogate model and acquisition functions to guide sampling for optimizing costly objective functions, including a new formal generalization of expecte...
Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects
stat.ML 2025-04 unverdicted novelty 3.0

Synthetic simulations show noise hurts needle-in-haystack optimization far more than smooth landscapes with local optima, and prior domain knowledge of noise and structure is needed for effective BO in materials research.
Automated Machine Learning in Practice: State of the Art and Recent Results
cs.LG 2019-07 unverdicted novelty 3.0

Survey of AutoML methods with benchmarks on their performance for business applications.
Leveraging Reinforcement Learning Techniques for Effective Policy Adoption and Validation
cs.LG 2019-06 unverdicted novelty 2.0

Applies sequential analysis and probabilistic modeling to derive stopping rules and performance measures for policy adoption in mission-critical learning environments.