pith. machine review for the scientific record.

arxiv: 2605.07572 · v1 · submitted 2026-05-08 · 💻 cs.AI · stat.ML

Recognition: no theorem link

Open-Ended Task Discovery via Bayesian Optimization

Masaki Adachi, Yuta Suzuki, Juliusz Ziomek

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:42 UTC · model grok-4.3

classification 💻 cs.AI stat.ML
keywords Bayesian optimization · task discovery · open-ended optimization · regret analysis · scientific workflows · Generate-Select-Refine · LLM optimizers

The pith

A Generate-Select-Refine loop lets Bayesian optimization discover new tasks from a seed while concentrating evaluations on the best one with only logarithmic extra regret.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GSR, a framework that begins with one user-supplied seed task and repeatedly generates new candidate tasks in a coarse-to-fine sequence. A separate task-acquisition function then decides how to allocate optimization effort across the growing set of tasks. The central guarantee is that, asymptotically, the method devotes nearly all evaluations to the single best task found, adding only a logarithmic regret penalty compared with running standard Bayesian optimization on that task alone. The setup is demonstrated on new product development, chemical synthesis scaling, algorithm analysis, and patent repurposing, where it beats fixed-task and existing LLM-based baselines. The approach matters because many scientific problems involve uncertainty not only about parameter values but about which objective should be optimized in the first place.

Core claim

GSR alternates task generation with scheduled optimization across the generated tasks. Starting from a seed task, it produces new tasks in a coarse-to-fine manner; a task-acquisition function then chooses which task to optimize next. Asymptotically the procedure concentrates evaluations on the single best task discovered, incurring only logarithmic regret overhead relative to ordinary single-task Bayesian optimization.
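As a sketch, the alternation can be written as a single loop; every callable here (`generate_tasks`, `task_acquisition`, `bo_step`) is a hypothetical stand-in, since this summary does not expose the paper's actual interfaces:

```python
def gsr(seed_task, budget, generate_tasks, task_acquisition, bo_step):
    """Hedged sketch of a Generate-Select-Refine loop: maintain a pool of
    tasks grown from a seed, and let a task-acquisition function decide
    which task receives the next Bayesian-optimization evaluation.
    All callables are hypothetical stand-ins for the paper's components."""
    tasks = [seed_task]
    history = {seed_task: []}          # per-task evaluation records
    for t in range(budget):
        # Generate: occasionally propose refined (coarse-to-fine) tasks.
        for new in generate_tasks(tasks, history, step=t):
            tasks.append(new)
            history[new] = []
        # Select: the task-acquisition function schedules effort.
        task = task_acquisition(tasks, history, step=t)
        # Refine: one inner BO step on the chosen task.
        x, y = bo_step(task, history[task])
        history[task].append((x, y))
    # Report the task whose best observed value is highest.
    best = max(tasks, key=lambda k: max((y for _, y in history[k]),
                                        default=float("-inf")))
    return best, history
```

Plugging in a UCB-style `task_acquisition` and a standard BO inner step would recover the scheduling behavior the claim describes; the actual GSR additionally refines tasks coarse-to-fine and eliminates dominated ones.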

What carries the argument

The Generate-Select-Refine (GSR) loop, which generates tasks from a seed and uses a task-acquisition function to schedule Bayesian optimization across them.

If this is right

  • In new-product or chemical-synthesis settings the method can locate improved objectives without a separate manual task-design phase.
  • The regret analysis shows that discovering tasks does not force linear extra cost; only log(T) additional evaluations are needed in the limit.
  • Existing LLM-based optimizers are outperformed on the four application domains tested.
  • The framework supplies a concrete mechanism for open-ended task evolution inside the Bayesian optimization loop.
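In symbols, the guarantee these bullets describe has roughly this shape (a hedged reconstruction from the summary, not the paper's exact statement):

```latex
% R_T^{GSR}: cumulative regret of GSR after T evaluations;
% R_T^{BO}: regret of single-task BO run on the best task i^\star alone.
R_T^{\mathrm{GSR}} \;\le\; R_T^{\mathrm{BO}} \;+\; \mathcal{O}(\log T),
\qquad
\frac{\#\{\, t \le T : i_t = i^\star \,\}}{T} \;\longrightarrow\; 1,
```

i.e., the cost of discovering which task to optimize vanishes as a fraction of the total evaluation budget.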

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same scheduling idea could be tested on non-Bayesian optimizers to see whether the logarithmic overhead persists.
  • In machine-learning practice, GSR suggests a way to let models propose and compare their own evaluation metrics rather than fixing them in advance.
  • If task generation is replaced by an external oracle that occasionally supplies better tasks, the regret bound would still apply and could guide resource allocation in automated science pipelines.

Load-bearing premise

The generation procedure, begun from a user seed, must keep producing useful new tasks in a coarse-to-fine order that the acquisition function can reliably rank and allocate effort toward.

What would settle it

Run GSR on a domain where every generated task is strictly worse than the seed; if the total number of evaluations needed to identify the seed as best still exceeds the logarithmic overhead bound, the concentration claim is false.
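That proposed falsification test is cheap to prototype. The toy below is an assumption-laden stand-in, not the paper's algorithm: a UCB scheduler over a seed task plus strictly worse generated tasks, counting how many evaluations leak away from the seed. Under the concentration claim this count should grow like log T, not linearly:

```python
import math
import random

def simulate(T, n_tasks=5, seed_mean=1.0, worse_mean=0.5, seed=0):
    """Toy check of the concentration claim: a UCB scheduler over tasks
    where every generated task is strictly worse than the seed (task 0).
    Returns the number of evaluations spent on non-seed tasks."""
    rng = random.Random(seed)
    means = [seed_mean] + [worse_mean] * (n_tasks - 1)
    counts = [0] * n_tasks
    sums = [0.0] * n_tasks
    # Initialize with one noisy pull per task.
    for i in range(n_tasks):
        counts[i] = 1
        sums[i] = means[i] + rng.gauss(0, 0.1)
    for t in range(n_tasks, T):
        # Standard UCB index: empirical mean plus exploration bonus.
        ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
               for i in range(n_tasks)]
        i = max(range(n_tasks), key=lambda j: ucb[j])
        counts[i] += 1
        sums[i] += means[i] + rng.gauss(0, 0.1)
    return T - counts[0]  # evaluations diverted away from the seed
```

In runs of this toy the off-seed count grows far slower than T; the analogous experiment on the real GSR generator is what the criterion above asks for.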

Figures

Figures reproduced from arXiv: 2605.07572 by Juliusz Ziomek, Masaki Adachi, Yuta Suzuki.

Figure 1
Figure 1: Conceptual overview: starting from a seed task i_0, new tasks are iteratively generated toward the ε-optimal task i⋆, and its optimum x⋆ is identified. (a) Experimental progress: task generation and elimination over time. (b) Two-loop structure: feedback from within-task BO informs task refinement via a coarse-to-fine scheduler. Bayesian optimization (BO; [28]) is a sample-efficient black-box optimize…
Figure 3
Figure 3: LLM components in GSR: (Left) Coarse-to-fine task generation: a parent task is mutated with target mutation ratio ρ_m, producing refined tasks. (Right) Committee-based evaluation: the committee returns binary votes on whether the current task–incumbent pair (i, y_s^(i)) is preferred over an anchor (a_t, y_s^(a_t)). By averaging them, a confidence interval is estimated for the utility u^(i)(y_s^(i)).
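The committee mechanism in this caption can be made concrete in a few lines; the Hoeffding-style interval below is an illustrative stand-in, since the exact interval the paper uses is not specified here:

```python
import math

def vote_confidence_interval(votes, delta=0.05):
    """Hedged sketch: turn binary committee votes (1 = prefers the current
    task over the anchor) into a mean preference estimate with a
    Hoeffding-style confidence interval, clipped to [0, 1].
    The paper's exact interval may differ; this is an illustration."""
    n = len(votes)
    p_hat = sum(votes) / n
    half_width = math.sqrt(math.log(2 / delta) / (2 * n))
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)
```

A wider committee (larger n) shrinks the interval at rate 1/sqrt(n), which is what lets the scheduler trust the utility estimate of a well-voted task.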
Figure 2
Figure 2: Alg. 1: (a) Select chooses the next task i_t and anchor task a_t. (b) Refine advances the resolution levels ε_m^U and generates J mutations when anchor width w_t ≤ c_g ε_m^U. Task Refinement: new tasks are generated by mutating and iteratively refining a seed task. A key step is selecting an anchor task to mutate; the anchor must be both promising and sufficiently resolved.
Figure 4
Figure 4: Real-world experiments. Top row: planning tasks—(a) new product development and (b) …
Figure 5
Figure 5: GSR's task evolution tree shows coarse-to-fine refinement across levels m toward the best task (20). Colors indicate final utility, and evaluation counts vary across tasks; task (1) initially lags behind its children (4–6), improves with more BO evaluations, and is ultimately surpassed by task (20). We first apply GSR to planning, where one must decide both what to optimize and how to evaluate it…
Figure 6
Figure 6: Resolution controllability. (a) Mutation ratio ρ_m correlates with task distance. (b) Finer task generation probability δ+ is non-zero across all resolution levels m. (c) Finer mutations yield higher δ+. (d) Coarser mutations achieve larger improvement. Patent Repurposing: in industrial R&D, materials developed for a target application are often abandoned despite substantial sunk costs; repurposing aims …
Figure 7
Figure 7: Unknown search space. Resolution controllability: the LLM experiments are analyzed using the white wine task. GSR uses mutation to control task resolution, generating J = 40 child tasks for 4 anchor tasks across mutation ratios ρ_m, resolution levels m, and random seeds (Fig. 6). (a) Task distance—Euclidean distance over numerical task parameters—correlates with ρ_m, confirming that JSON mutation controls …
Figure 8
Figure 8: Offline settings: (a) fixed task selection and (b) objective selection. Offline experiments: an offline setting with no generator is considered, where all tasks are fixed a priori, reducing the problem to task selection; Algorithm 1 thus simplifies to pure task-UCB (omitting Lines 8–10). Six BoTorch benchmark functions are evaluated, defining task utilities via a Gaussian CDF, u^(i)(y_t) := Φ(y_t^(i) − µ…
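The Gaussian-CDF utility named in this caption is straightforward to write down; the normalizers `mu` and `sigma` are assumptions here, since their exact definition is truncated in the excerpt:

```python
import math

def gaussian_cdf_utility(y, mu, sigma):
    """Gaussian-CDF utility u(y) = Phi((y - mu) / sigma), mapping a raw
    objective value onto (0, 1) so utilities are comparable across tasks.
    mu and sigma are per-task normalizers (assumed; not fully shown in
    the excerpt)."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))
```

Because Φ is strictly increasing, the transform preserves each task's ranking of points while squashing different tasks' objective scales onto a common (0, 1) range.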
Figure 9
Figure 9: Log-odds chain rule: adding logits and mapping back with sigmoid offers cardinal utility.
Figure 10
Figure 10: Alg. 2: We hold multiple candidate Lipschitz bounds …
Figure 11
Figure 11: Ablation study. Replacing the mutation scheduler …
read the original abstract

When applying Bayesian optimization (BO) to scientific workflow, a major yet often overlooked source of uncertainty is the task itself -- namely, what to optimize and how to evaluate it -- which can evolve as evidence accumulates. We introduce Generate-Select-Refine (GSR), an open-ended BO framework that alternates between task generation and task optimization. Starting from a user-provided seed task, GSR generates new tasks in a coarse-to-fine manner while a task-acquisition function schedules optimization. Asymptotically, it concentrates evaluations on the best task, incurring only logarithmic regret overhead relative to single-task BO. We apply GSR to new product development, chemical synthesis scaling, algorithm analysis, and patent repurposing, where it outperforms existing LLM-based optimizers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Generate-Select-Refine (GSR) framework for open-ended Bayesian optimization. Starting from a user-provided seed task, GSR alternates task generation (coarse-to-fine, LLM-driven) with optimization scheduled by a task-acquisition function. The central claim is that asymptotically GSR concentrates evaluations on the best task while incurring only logarithmic regret overhead relative to single-task BO. Empirically, GSR is applied to new product development, chemical synthesis scaling, algorithm analysis, and patent repurposing, where it outperforms existing LLM-based optimizers.

Significance. If the asymptotic regret bound can be rigorously closed and the empirical results are shown to be robust, this would represent a meaningful extension of BO to settings with evolving task definitions. The multi-domain applications illustrate potential utility in LLM-augmented scientific workflows, and the framework's structured alternation between generation and selection is a constructive contribution even if the regret analysis requires strengthening.

major comments (2)
  1. [Abstract] The claim that GSR 'concentrates evaluations on the best task, incurring only logarithmic regret overhead relative to single-task BO' is load-bearing for the contribution but lacks a supporting formal model. No explicit task space (finite or hierarchically structured) is defined, nor is there a proof that the generation step produces non-decreasing task quality; without this, the regret analysis cannot be closed, as the generator could indefinitely introduce incomparable or inferior tasks.
  2. [Empirical evaluation] The reported outperformance over LLM-based optimizers is presented without accompanying details on the number of independent runs, variance estimates, statistical tests, or direct comparison against single-task BO (the theoretical baseline). This gap prevents assessment of whether the results support the asymptotic claim or are robust across the four listed domains.
minor comments (1)
  1. The task-acquisition function is referenced but its precise mathematical form is not stated in the abstract; adding a short equation or pseudocode reference would improve clarity for readers unfamiliar with the scheduling mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the thoughtful review. We will revise the manuscript to strengthen the formal foundations of the regret analysis and enhance the empirical evaluation with additional statistical details and comparisons.

read point-by-point responses
  1. Referee: [Abstract] The claim that GSR 'concentrates evaluations on the best task, incurring only logarithmic regret overhead relative to single-task BO' is load-bearing for the contribution but lacks a supporting formal model. No explicit task space (finite or hierarchically structured) is defined, nor is there a proof that the generation step produces non-decreasing task quality; without this, the regret analysis cannot be closed, as the generator could indefinitely introduce incomparable or inferior tasks.

    Authors: We agree that a more rigorous formal model would strengthen the paper. In the revised version, we will explicitly define the task space as a tree-structured hierarchy where each generation step refines parent tasks, and provide a lemma showing that under the assumption of the LLM generator improving task quality in expectation (based on the coarse-to-fine process), the task-acquisition function ensures logarithmic regret overhead. This addresses the potential for introducing inferior tasks by incorporating a quality threshold in the generation step. revision: yes

  2. Referee: [Empirical evaluation] The reported outperformance over LLM-based optimizers is presented without accompanying details on the number of independent runs, variance estimates, statistical tests, or direct comparison against single-task BO (the theoretical baseline). This gap prevents assessment of whether the results support the asymptotic claim or are robust across the four listed domains.

    Authors: We will expand the empirical section to include: (i) results averaged over 20 independent runs per domain with standard error bars, (ii) Wilcoxon signed-rank tests for significance, and (iii) a direct comparison to single-task BO on the seed task, demonstrating that the additional overhead from task discovery is indeed logarithmic in the number of evaluations. These additions will confirm robustness across the domains of new product development, chemical synthesis, algorithm analysis, and patent repurposing. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines the Generate-Select-Refine (GSR) framework explicitly from a user-provided seed task and positions its asymptotic concentration claim as a theoretical comparison to single-task Bayesian optimization, which serves as an external benchmark rather than an internal input. No self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citations are present in the abstract or described process. The task-acquisition scheduling and coarse-to-fine generation are introduced as independent components, and the regret overhead statement does not reduce to a tautology of the framework's own definitions. The derivation chain remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the effectiveness of the task generation and selection process, which is introduced without detailed justification in the abstract.

axioms (1)
  • domain assumption Task generation from a seed task can produce meaningful variations in a coarse-to-fine manner
    The framework relies on this to create new tasks for optimization.
invented entities (1)
  • Generate-Select-Refine (GSR) framework no independent evidence
    purpose: To enable open-ended task discovery and optimization in Bayesian optimization
    New framework introduced to alternate between task generation and optimization.

pith-pipeline@v0.9.0 · 5422 in / 1295 out tokens · 47852 ms · 2026-05-11T02:42:33.405363+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Regret Analysis of Guided Diffusion for Black-Box Optimization over Structured Inputs

    stat.ML 2026-05 unverdicted novelty 8.0

    A certificate-based regret analysis framework for guided-diffusion black-box optimization is introduced, with mass lift as the central quantity explaining convergence from pretrained generators.

Reference graph

Works this paper leans on

88 extracted references · 88 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1] Majid Abdolshah, Alistair Shilton, Santu Rana, Sunil Gupta, and Svetha Venkatesh. Multi-objective Bayesian optimisation with preferences over objectives. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
  2. [2] Milad Abolhasani and Eugenia Kumacheva. The rise of self-driving labs in chemical and materials sciences. Nature Synthesis, 2(6):483–492, 2023.
  3. [3] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
  4. [4] Masaki Adachi, Siu Lun Chau, Wenjie Xu, Anurag Singh, Michael A Osborne, and Krikamol Muandet. Bayesian optimization for building social-influence-free consensus. arXiv preprint arXiv:2502.07166, 2025.
  5. [5] Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Xingchen Wan, Vu Nguyen, Harald Oberhauser, and Michael A Osborne. Adaptive batch sizes for active learning: A probabilistic numerics approach. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 496–504. PMLR, 2024.
  6. [6] Masaki Adachi, Brady Planden, David Howey, Michael A. Osborne, Sebastian Orbell, Natalia Ares, Krikamol Muandet, and Siu Lun Chau. Looping in the human: Collaborative and explainable Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 238, pages 505–513, 2024.
  7. [7] Steven Adriaensen, Herilalaina Rakotoarison, Samuel Müller, and Frank Hutter. Efficient Bayesian learning curve extrapolation using prior-data fitted networks. Advances in Neural Information Processing Systems, 36:19858–19886, 2023.
  8. [8] Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Megha Chakravorty, Satvika Reddy Gavireddy, Aditya Parashar, Harshit Surana, Bhavana Dalvi Mishra, Andrew McCallum, Ashish Sabharwal, and Peter Clark. AutoDiscovery: Open-ended scientific discovery via Bayesian surprise. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026.
  9. [9] Henrique Assumpção, Diego Ferreira, Leandro Campos, and Fabricio Murai. CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization. arXiv preprint arXiv:2510.14150, 2025.
  10. [10] Raul Astudillo and Peter Frazier. Bayesian optimization of composite functions. In International Conference on Machine Learning (ICML), pages 354–363. PMLR, 2019.

  11. [11] Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham, Andrew G Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. Advances in Neural Information Processing Systems (NeurIPS), 33:21524–21538, 2020.
  12. [12] Felix Berkenkamp, Angela P Schoellig, and Andreas Krause. No-regret Bayesian optimization with unknown hyperparameters. Journal of Machine Learning Research (JMLR), 20(50):1–24, 2019.
  13. [13] Herbie Bradley, Andrew Dai, Hannah Benita Teufel, Jenny Zhang, Koen Oostermeijer, Marco Bellagente, Jeff Clune, Kenneth Stanley, Gregory Schott, and Joel Lehman. Quality-diversity through AI feedback. In The Twelfth International Conference on Learning Representations, 2024.
  14. [14] Ralph Allan Bradley and Milton E Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
  15. [15] Timothy CY Chan, Rafid Mahmood, and Ian Yihang Zhu. Inverse optimization: Theory and applications. Operations Research, 73(2):1046–1074, 2025.
  16. [16] Guiming Hardy Chen, Shunian Chen, Ziche Liu, Feng Jiang, and Benyou Wang. Humans or LLMs as the judge? A study on judgement bias. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8301–8327, 2024.
  17. [17] Ruth Wan Theng Chew, Quoc Phong Nguyen, and Bryan Kian Hsiang Low. BILBO: BILevel Bayesian Optimization. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 10249–10268, 2025.
  18. [18] Sayak Ray Chowdhury and Aditya Gopalan. On kernelized multi-armed bandits. In International Conference on Machine Learning, pages 844–853. PMLR, 2017.
  19. [19] Paulo Cortez, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547–553, 2009.
  20. [20] Zhongxiang Dai, Yizhou Chen, Haibin Yu, Bryan Kian Hsiang Low, and Patrick Jaillet. On provably robust meta-Bayesian optimization. In Uncertainty in Artificial Intelligence (UAI), pages 475–485. PMLR, 2022.

  21. [21] Vedat Dogan and Steven Prestwich. Bilevel optimization by conditional Bayesian optimization. In International Conference on Machine Learning, Optimization, and Data Science, pages 243–258. Springer, 2023.
  22. [22] Yuanqi Du, Botao Yu, Tianyu Liu, Tony Shen, Junwu Chen, Jan G Rittig, Kunyang Sun, Yikun Zhang, Zhangde Song, Bo Zhou, et al. Accelerating scientific discovery with autonomous goal-evolving agents. arXiv preprint arXiv:2512.21782, 2025.
  23. [23] Kobi C Felton, Jan G Rittig, and Alexei A Lapkin. SUMMIT: Benchmarking machine learning methods for reaction optimisation. Chemistry-Methods, 1(2):116–122, 2021.
  24. [24] Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, and Arber Zela. Can LLMs beat classical hyperparameter optimization algorithms? A study on autoresearch. arXiv preprint arXiv:2603.24647, 2026.
  25. [25] Shi Fu, Fengxiang He, Xinmei Tian, and Dacheng Tao. Convergence of Bayesian bilevel optimization. In The Twelfth International Conference on Learning Representations, 2024.
  26. [26] Masahiro Fujisawa, Masaki Adachi, and Michael A Osborne. Scalable valuation of human feedback through provably robust model alignment. In Advances in Neural Information Processing Systems (NeurIPS), 2025.
  27. [27] Jacob Gardner, Geoff Pleiss, Kilian Q Weinberger, David Bindel, and Andrew G Wilson. GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018.
  28. [28] Roman Garnett. Bayesian Optimization. Cambridge University Press, 2023.
  29. [29] Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018.
  30. [30] Javier González, Zhenwen Dai, Andreas Damianou, and Neil D Lawrence. Preferential Bayesian optimization. In International Conference on Machine Learning (ICML), pages 1282–1291. PMLR, 2017.

  31. [31] Rushil Gupta, Jason Hartford, and Bang Liu. LLMs for Bayesian optimization in scientific domains: Are we there yet? In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15482–15510. Association for Computational Linguistics, 2025.
  32. [32] Sunil Gupta, Santu Rana, Huong Ha, and Svetha Venkatesh. Sub-linear regret bounds for Bayesian optimisation in unknown search spaces. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 16271–16281, 2020.
  33. [33] Huong Ha, Santu Rana, Sunil Gupta, Thanh Nguyen, and Svetha Venkatesh. Bayesian optimization with unknown search space. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
  34. [34] Mariette Hellenbrandt. The inorganic crystal structure database (ICSD)—present and future. Crystallography Reviews, 10(1):17–22, 2004.
  35. [35] Kihyuk Hong, Yuhang Li, and Ambuj Tewari. An optimization-based algorithm for non-stationary kernel bandits without prior knowledge. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 3048–3085. PMLR, 2023.
  36. [36] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1), 2013.
  37. [37] Kevin Jamieson and Ameet Talwalkar. Non-stochastic best arm identification and hyperparameter optimization. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 240–248. PMLR, 2016.
  38. [38] Zhiyuan Jerry Lin, Raul Astudillo, Peter Frazier, and Eytan Bakshy. Preference exploration for efficient Bayesian optimization with multiple outcomes. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 151, pages 4235–4258, 2022.
  39. [39] Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54, pages 528–536, 2017.
  40. [40] Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin. ShinkaEvolve: Towards open-ended and sample-efficient program evolution. arXiv preprint arXiv:2509.19349, 2025.

  41. [41] Chungpa Lee, Thomas Zeng, Jongwon Jeong, Jy-yong Sohn, and Kangwook Lee. How to correctly report LLM-as-a-judge evaluations. arXiv preprint arXiv:2511.21140, 2025.
  42. [42] Madison Lee and Tara Javidi. Consequences of kernel regularity for bandit optimization. arXiv preprint arXiv:2512.05957, 2025.
  43. [43] Raphael D Levine. Molecular Reaction Dynamics. Cambridge University Press, 2009.
  44. [44] Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research (JMLR), 18(185):1–52, 2018.
  45. [45] Qiaohao Liang, Aldair E Gongora, Zekun Ren, Armi Tiihonen, Zhe Liu, Shijing Sun, James R Deneault, Daniil Bash, Flore Mekki-Berrada, Saif A Khan, et al. Benchmarking the performance of Bayesian optimization across multiple experimental materials science domains. npj Computational Materials, 7(1):188, 2021.
  46. [46] Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. Large language models to enhance Bayesian optimization. In International Conference on Learning Representations (ICLR), 2024.
  47. [47] Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, and Haitham Bou Ammar. End-to-end meta-Bayesian optimisation with transformer neural processes. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pages 11246–11260, 2023.
  48. [48] Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2(117-129):2, 1978.
  49. [49] Vu Nguyen, Sebastian Schulze, and Michael Osborne. Bayesian optimization for iterative learning. Advances in Neural Information Processing Systems, 33:9361–9371, 2020.
  50. [50] Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025.

  51. [51]

    Pierre Osselin, Masaki Adachi, Xiaowen Dong, and Michael A Osborne. Natural evolutionary search meets probabilistic numerics. arXiv preprint arXiv:2507.07288, 2025

  52. [52]

    Stuart SP Parkin, Christian Kaiser, Alex Panchula, Philip M Rice, Brian Hughes, Mahesh Samant, and See-Hun Yang. Giant tunnelling magnetoresistance at room temperature with MgO (100) tunnel barriers. Nature Materials, 3(12):862–867, 2004

  53. [53]

    Edward O Pyzer-Knapp. Bayesian optimization for accelerated drug discovery. IBM Journal of Research and Development, 62(6):2–1, 2018

  54. [54]

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pages 53728–53741, 2023

  55. [55]

    Herilalaina Rakotoarison, Steven Adriaensen, Neeratyoy Mallik, Samir Garibov, Eddie Bergman, and Frank Hutter. In-context freeze-thaw Bayesian optimization for hyperparameter optimization. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pages 41982–42008, 2024

  56. [56]

    James Requeima, John Bronskill, Dami Choi, Richard E. Turner, and David Duvenaud. LLM processes: Numerical predictive distributions conditioned on natural language. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 109609–109671, 2024

  57. [57]

    Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015

  58. [58]

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

  59. [59]

    Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In International Conference on Machine Learning (ICML), pages 1015–1022, 2010

  60. [60]

    Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias W Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012

  61. [61]

    Kenneth O Stanley and Joel Lehman. Why greatness cannot be planned: The myth of the objective. Springer, 2015

  62. [62]

    Michael L Stein. Interpolation of spatial data. Springer Science & Business Media, 1999

  63. [63]

    Richard Cornelius Suwandi, Feng Yin, Juntao Wang, Renjie Li, Tsung-Hui Chang, and Sergios Theodoridis. Adaptive kernel design for Bayesian optimization is a piece of CAKE with LLMs. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  64. [64]

    Kevin Swersky, Jasper Snoek, and Ryan P Adams. Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems (NeurIPS), volume 26, 2013

  65. [65]

    Kevin Swersky, Jasper Snoek, and Ryan Prescott Adams. Freeze-thaw Bayesian optimization. arXiv preprint arXiv:1406.3896, 2014

  66. [66]

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023

  67. [67]

    Michael Volpp, Lukas P. Fröhlich, Kirsten Fischer, Andreas Doerr, Stefan Falkner, Frank Hutter, and Christian Daniel. Meta-learning acquisition functions for transfer learning in Bayesian optimization. In International Conference on Learning Representations (ICLR), 2020

  68. [68]

    Tomoya Wakayama and Taiji Suzuki. In-context learning is provably Bayesian inference: A generalization theory for meta-learning. arXiv preprint arXiv:2510.10981, 2025

  69. [69]

    Zi Wang and Stefanie Jegelka. Max-value entropy search for efficient Bayesian optimization. In International Conference on Machine Learning (ICML), pages 3627–3635. PMLR, 2017

  70. [70]

    Zi Wang, Beomjoon Kim, and Leslie P Kaelbling. Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior. Advances in Neural Information Processing Systems (NeurIPS), 31, 2018

  71. [71]

    Christopher KI Williams and Carl Edward Rasmussen. Gaussian processes for machine learning. MIT Press, Cambridge, MA, 2006

  72. [72]

    Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, and Peter Bartlett. How many pretraining tasks are needed for in-context learning of linear regression? In The Twelfth International Conference on Learning Representations, 2024

  73. [73]

    Tian Xie and Jeffrey C Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 120(14):145301, 2018

  74. [74]

    Wenjie Xu, Masaki Adachi, Colin N. Jones, and Michael A. Osborne. Principled Bayesian optimization in collaboration with human experts. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 104091–104137, 2024

  75. [75]

    Anita Yang, Krikamol Muandet, Michele Caprio, Siu Lun Chau, and Masaki Adachi. Verbalizing LLM’s higher-order uncertainty via imprecise probabilities. arXiv preprint arXiv:2603.10396, 2026

  76. [76]

    Eric Zelikman, Eliana Lorch, Lester Mackey, and Adam Tauman Kalai. Self-taught optimizer (STOP): Recursively self-improving code generation. In Conference on Language Modeling (COLM), 2024

  77. [77]

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, and Jeff Clune. Darwin Gödel machine: Open-ended evolution of self-improving agents. arXiv preprint arXiv:2505.22954, 2025

  78. [78]

    Jenny Zhang, Joel Lehman, Kenneth Stanley, and Jeff Clune. OMNI: Open-endedness via models of human notions of interestingness. In The Twelfth International Conference on Learning Representations, 2024

  79. [79]

    Juliusz Ziomek, Masaki Adachi, and Michael A Osborne. Bayesian optimisation with unknown hyperparameters: Regret bounds logarithmically closer to optimal. Advances in Neural Information Processing Systems (NeurIPS), 37:86346–86374, 2024

  80. [80]

    Juliusz Ziomek, Masaki Adachi, and Michael A Osborne. Time-varying Gaussian process bandits with unknown prior. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 258, pages 4294–4302, 2025

A Notations

Notations are summarized in Table 2.

B Preliminary

B.1 Gaussian process regression background

Gaussian process mode...

Showing first 80 references.