pith. machine review for the scientific record.

arxiv: 2605.07572 · v1 · submitted 2026-05-08 · 💻 cs.AI · stat.ML

Recognition: no theorem link

Open-Ended Task Discovery via Bayesian Optimization

Masaki Adachi, Yuta Suzuki, Juliusz Ziomek

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:42 UTC · model grok-4.3

classification 💻 cs.AI stat.ML
keywords Bayesian optimization · task discovery · open-ended optimization · regret analysis · scientific workflows · Generate-Select-Refine · LLM optimizers

The pith

A Generate-Select-Refine loop lets Bayesian optimization discover new tasks from a seed while concentrating evaluations on the best one with only logarithmic extra regret.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GSR, a framework that begins with one user-supplied seed task and repeatedly generates new candidate tasks in a coarse-to-fine sequence. A separate task-acquisition function then decides how to allocate optimization effort across the growing set of tasks. The central guarantee is that, asymptotically, the method devotes nearly all evaluations to the single best task found, adding only a logarithmic regret penalty compared with running standard Bayesian optimization on that task alone. The setup is demonstrated on new product development, chemical synthesis scaling, algorithm analysis, and patent repurposing, where it beats fixed-task and existing LLM-based baselines. The approach matters because many scientific problems involve uncertainty not only about parameter values but about which objective should be optimized in the first place.

Core claim

GSR alternates task generation with scheduled optimization across the generated tasks. Starting from a seed task, it produces new tasks in a coarse-to-fine manner; a task-acquisition function then chooses which task to optimize next. Asymptotically the procedure concentrates evaluations on the single best task discovered, incurring only logarithmic regret overhead relative to ordinary single-task Bayesian optimization.
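As a sketch, the alternation can be written as a single loop; every callable here (`generate_tasks`, `task_acquisition`, `bo_step`) is a hypothetical stand-in, since this summary does not expose the paper's actual interfaces:

```python
def gsr(seed_task, budget, generate_tasks, task_acquisition, bo_step):
    """Hedged sketch of a Generate-Select-Refine loop: maintain a pool of
    tasks grown from a seed, and let a task-acquisition function decide
    which task receives the next Bayesian-optimization evaluation.
    All callables are hypothetical stand-ins for the paper's components."""
    tasks = [seed_task]
    history = {seed_task: []}          # per-task evaluation records
    for t in range(budget):
        # Generate: occasionally propose refined (coarse-to-fine) tasks.
        for new in generate_tasks(tasks, history, step=t):
            tasks.append(new)
            history[new] = []
        # Select: the task-acquisition function schedules effort.
        task = task_acquisition(tasks, history, step=t)
        # Refine: one inner BO step on the chosen task.
        x, y = bo_step(task, history[task])
        history[task].append((x, y))
    # Report the task whose best observed value is highest.
    best = max(tasks, key=lambda k: max((y for _, y in history[k]),
                                        default=float("-inf")))
    return best, history
```

Plugging in a UCB-style `task_acquisition` and a standard BO inner step would recover the scheduling behavior the claim describes; the actual GSR additionally refines tasks coarse-to-fine and eliminates dominated ones.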

What carries the argument

The Generate-Select-Refine (GSR) loop, which generates tasks from a seed and uses a task-acquisition function to schedule Bayesian optimization across them.

If this is right

  • In new-product or chemical-synthesis settings the method can locate improved objectives without a separate manual task-design phase.
  • The regret analysis shows that discovering tasks does not force linear extra cost; only log(T) additional evaluations are needed in the limit.
  • Existing LLM-based optimizers are outperformed on the four application domains tested.
  • The framework supplies a concrete mechanism for open-ended task evolution inside the Bayesian optimization loop.
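In symbols, the guarantee these bullets describe has roughly this shape (a hedged reconstruction from the summary, not the paper's exact statement):

```latex
% R_T^{GSR}: cumulative regret of GSR after T evaluations;
% R_T^{BO}: regret of single-task BO run on the best task i^\star alone.
R_T^{\mathrm{GSR}} \;\le\; R_T^{\mathrm{BO}} \;+\; \mathcal{O}(\log T),
\qquad
\frac{\#\{\, t \le T : i_t = i^\star \,\}}{T} \;\longrightarrow\; 1,
```

i.e., the cost of discovering which task to optimize vanishes as a fraction of the total evaluation budget.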

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same scheduling idea could be tested on non-Bayesian optimizers to see whether the logarithmic overhead persists.
  • In machine-learning practice, GSR suggests a way to let models propose and compare their own evaluation metrics rather than fixing them in advance.
  • If task generation is replaced by an external oracle that occasionally supplies better tasks, the regret bound would still apply and could guide resource allocation in automated science pipelines.

Load-bearing premise

The generation procedure, begun from a user seed, must keep producing useful new tasks in a coarse-to-fine order that the acquisition function can reliably rank and allocate effort toward.

What would settle it

Run GSR on a domain where every generated task is strictly worse than the seed; if the total number of evaluations needed to identify the seed as best still exceeds the logarithmic overhead bound, the concentration claim is false.
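That proposed falsification test is cheap to prototype. The toy below is an assumption-laden stand-in, not the paper's algorithm: a UCB scheduler over a seed task plus strictly worse generated tasks, counting how many evaluations leak away from the seed. Under the concentration claim this count should grow like log T, not linearly:

```python
import math
import random

def simulate(T, n_tasks=5, seed_mean=1.0, worse_mean=0.5, seed=0):
    """Toy check of the concentration claim: a UCB scheduler over tasks
    where every generated task is strictly worse than the seed (task 0).
    Returns the number of evaluations spent on non-seed tasks."""
    rng = random.Random(seed)
    means = [seed_mean] + [worse_mean] * (n_tasks - 1)
    counts = [0] * n_tasks
    sums = [0.0] * n_tasks
    # Initialize with one noisy pull per task.
    for i in range(n_tasks):
        counts[i] = 1
        sums[i] = means[i] + rng.gauss(0, 0.1)
    for t in range(n_tasks, T):
        # Standard UCB index: empirical mean plus exploration bonus.
        ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
               for i in range(n_tasks)]
        i = max(range(n_tasks), key=lambda j: ucb[j])
        counts[i] += 1
        sums[i] += means[i] + rng.gauss(0, 0.1)
    return T - counts[0]  # evaluations diverted away from the seed
```

In runs of this toy the off-seed count grows far slower than T; the analogous experiment on the real GSR generator is what the criterion above asks for.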

Figures

Figures reproduced from arXiv: 2605.07572 by Juliusz Ziomek, Masaki Adachi, Yuta Suzuki.

Figure 1
Figure 1: Conceptual overview: starting from a seed task i_0, new tasks are iteratively generated toward the ε-optimal task i⋆, and its optimum x⋆ is identified. (a) Experimental progress: task generation and elimination over time. (b) Two-loop structure: feedback from within-task BO informs task refinement via a coarse-to-fine scheduler. Bayesian optimization (BO; [28]) is a sample-efficient black-box optimize…
Figure 3
Figure 3: LLM components in GSR: (Left) Coarse-to-fine task generation: a parent task is mutated with target mutation ratio ρ_m, producing refined tasks. (Right) Committee-based evaluation: the committee returns binary votes on whether the current task–incumbent pair (i, y_s^(i)) is preferred over an anchor (a_t, y_s^(a_t)). By averaging them, a confidence interval is estimated for the utility u^(i)(y_s^(i)).
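The committee mechanism in this caption can be made concrete in a few lines; the Hoeffding-style interval below is an illustrative stand-in, since the exact interval the paper uses is not specified here:

```python
import math

def vote_confidence_interval(votes, delta=0.05):
    """Hedged sketch: turn binary committee votes (1 = prefers the current
    task over the anchor) into a mean preference estimate with a
    Hoeffding-style confidence interval, clipped to [0, 1].
    The paper's exact interval may differ; this is an illustration."""
    n = len(votes)
    p_hat = sum(votes) / n
    half_width = math.sqrt(math.log(2 / delta) / (2 * n))
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)
```

A wider committee (larger n) shrinks the interval at rate 1/sqrt(n), which is what lets the scheduler trust the utility estimate of a well-voted task.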
Figure 2
Figure 2: Alg. 1: (a) Select chooses the next task i_t and anchor task a_t. (b) Refine advances the resolution levels ε_m^U and generates J mutations when anchor width w_t ≤ c_g ε_m^U. Task Refinement: new tasks are generated by mutating and iteratively refining a seed task. A key step is selecting an anchor task to mutate; the anchor must be both promising and sufficiently resolved.
Figure 4
Figure 4: Real-world experiments. Top row: planning tasks—(a) new product development and (b) …
Figure 5
Figure 5: GSR's task evolution tree shows coarse-to-fine refinement across levels m toward the best task (20). Colors indicate final utility, and evaluation counts vary across tasks; task (1) initially lags behind its children (4–6), improves with more BO evaluations, and is ultimately surpassed by task (20). We first apply GSR to planning, where one must decide both what to optimize and how to evaluate it…
Figure 6
Figure 6: Resolution controllability. (a) Mutation ratio ρ_m correlates with task distance. (b) Finer task generation probability δ+ is non-zero across all resolution levels m. (c) Finer mutations yield higher δ+. (d) Coarser mutations achieve larger improvement. Patent Repurposing: in industrial R&D, materials developed for a target application are often abandoned despite substantial sunk costs; repurposing aims …
Figure 7
Figure 7: Unknown search space. Resolution controllability: the LLM experiments are analyzed using the white wine task. GSR uses mutation to control task resolution, generating J = 40 child tasks for 4 anchor tasks across mutation ratios ρ_m, resolution levels m, and random seeds (Fig. 6). (a) Task distance—Euclidean distance over numerical task parameters—correlates with ρ_m, confirming that JSON mutation controls …
Figure 8
Figure 8: Offline settings: (a) fixed task selection and (b) objective selection. Offline experiments: an offline setting with no generator is considered, where all tasks are fixed a priori, reducing the problem to task selection; Algorithm 1 thus simplifies to pure task-UCB (omitting Lines 8–10). Six BoTorch benchmark functions are evaluated, defining task utilities via a Gaussian CDF, u^(i)(y_t) := Φ(y_t^(i) − µ…
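The Gaussian-CDF utility named in this caption is straightforward to write down; the normalizers `mu` and `sigma` are assumptions here, since their exact definition is truncated in the excerpt:

```python
import math

def gaussian_cdf_utility(y, mu, sigma):
    """Gaussian-CDF utility u(y) = Phi((y - mu) / sigma), mapping a raw
    objective value onto (0, 1) so utilities are comparable across tasks.
    mu and sigma are per-task normalizers (assumed; not fully shown in
    the excerpt)."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))
```

Because Φ is strictly increasing, the transform preserves each task's ranking of points while squashing different tasks' objective scales onto a common (0, 1) range.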
Figure 9
Figure 9: Log-odds chain rule: adding logits and mapping back with sigmoid offers cardinal utility.
Figure 10
Figure 10: Alg. 2: We hold multiple candidate Lipschitz bounds …
Figure 11
Figure 11: Ablation study. Replacing the mutation scheduler …
read the original abstract

When applying Bayesian optimization (BO) to scientific workflow, a major yet often overlooked source of uncertainty is the task itself -- namely, what to optimize and how to evaluate it -- which can evolve as evidence accumulates. We introduce Generate-Select-Refine (GSR), an open-ended BO framework that alternates between task generation and task optimization. Starting from a user-provided seed task, GSR generates new tasks in a coarse-to-fine manner while a task-acquisition function schedules optimization. Asymptotically, it concentrates evaluations on the best task, incurring only logarithmic regret overhead relative to single-task BO. We apply GSR to new product development, chemical synthesis scaling, algorithm analysis, and patent repurposing, where it outperforms existing LLM-based optimizers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Generate-Select-Refine (GSR) framework for open-ended Bayesian optimization. Starting from a user-provided seed task, GSR alternates task generation (coarse-to-fine, LLM-driven) with optimization scheduled by a task-acquisition function. The central claim is that asymptotically GSR concentrates evaluations on the best task while incurring only logarithmic regret overhead relative to single-task BO. Empirically, GSR is applied to new product development, chemical synthesis scaling, algorithm analysis, and patent repurposing, where it outperforms existing LLM-based optimizers.

Significance. If the asymptotic regret bound can be rigorously closed and the empirical results are shown to be robust, this would represent a meaningful extension of BO to settings with evolving task definitions. The multi-domain applications illustrate potential utility in LLM-augmented scientific workflows, and the framework's structured alternation between generation and selection is a constructive contribution even if the regret analysis requires strengthening.

major comments (2)
  1. [Abstract] The claim that GSR 'concentrates evaluations on the best task, incurring only logarithmic regret overhead relative to single-task BO' is load-bearing for the contribution but lacks a supporting formal model. No explicit task space (finite or hierarchically structured) is defined, nor is there a proof that the generation step produces non-decreasing task quality; without this, the regret analysis cannot be closed, as the generator could indefinitely introduce incomparable or inferior tasks.
  2. [Empirical evaluation] The reported outperformance over LLM-based optimizers is presented without accompanying details on the number of independent runs, variance estimates, statistical tests, or direct comparison against single-task BO (the theoretical baseline). This gap prevents assessment of whether the results support the asymptotic claim or are robust across the four listed domains.
minor comments (1)
  1. The task-acquisition function is referenced but its precise mathematical form is not stated in the abstract; adding a short equation or pseudocode reference would improve clarity for readers unfamiliar with the scheduling mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the thoughtful review. We will revise the manuscript to strengthen the formal foundations of the regret analysis and enhance the empirical evaluation with additional statistical details and comparisons.

read point-by-point responses
  1. Referee: [Abstract] The claim that GSR 'concentrates evaluations on the best task, incurring only logarithmic regret overhead relative to single-task BO' is load-bearing for the contribution but lacks a supporting formal model. No explicit task space (finite or hierarchically structured) is defined, nor is there a proof that the generation step produces non-decreasing task quality; without this, the regret analysis cannot be closed, as the generator could indefinitely introduce incomparable or inferior tasks.

    Authors: We agree that a more rigorous formal model would strengthen the paper. In the revised version, we will explicitly define the task space as a tree-structured hierarchy where each generation step refines parent tasks, and provide a lemma showing that under the assumption of the LLM generator improving task quality in expectation (based on the coarse-to-fine process), the task-acquisition function ensures logarithmic regret overhead. This addresses the potential for introducing inferior tasks by incorporating a quality threshold in the generation step. revision: yes

  2. Referee: [Empirical evaluation] The reported outperformance over LLM-based optimizers is presented without accompanying details on the number of independent runs, variance estimates, statistical tests, or direct comparison against single-task BO (the theoretical baseline). This gap prevents assessment of whether the results support the asymptotic claim or are robust across the four listed domains.

    Authors: We will expand the empirical section to include: (i) results averaged over 20 independent runs per domain with standard error bars, (ii) Wilcoxon signed-rank tests for significance, and (iii) a direct comparison to single-task BO on the seed task, demonstrating that the additional overhead from task discovery is indeed logarithmic in the number of evaluations. These additions will confirm robustness across the domains of new product development, chemical synthesis, algorithm analysis, and patent repurposing. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines the Generate-Select-Refine (GSR) framework explicitly from a user-provided seed task and positions its asymptotic concentration claim as a theoretical comparison to single-task Bayesian optimization, which serves as an external benchmark rather than an internal input. No self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citations are present in the abstract or described process. The task-acquisition scheduling and coarse-to-fine generation are introduced as independent components, and the regret overhead statement does not reduce to a tautology of the framework's own definitions. The derivation chain remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the effectiveness of the task generation and selection process, which is introduced without detailed justification in the abstract.

axioms (1)
  • domain assumption Task generation from a seed task can produce meaningful variations in a coarse-to-fine manner
    The framework relies on this to create new tasks for optimization.
invented entities (1)
  • Generate-Select-Refine (GSR) framework no independent evidence
    purpose: To enable open-ended task discovery and optimization in Bayesian optimization
    New framework introduced to alternate between task generation and optimization.

pith-pipeline@v0.9.0 · 5422 in / 1295 out tokens · 47852 ms · 2026-05-11T02:42:33.405363+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Regret Analysis of Guided Diffusion for Black-Box Optimization over Structured Inputs

    stat.ML 2026-05 unverdicted novelty 8.0

    A certificate-based regret analysis framework for guided-diffusion black-box optimization is introduced, with mass lift as the central quantity explaining convergence from pretrained generators.

Reference graph

Works this paper leans on

88 extracted references · 88 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1] Majid Abdolshah, Alistair Shilton, Santu Rana, Sunil Gupta, and Svetha Venkatesh. Multi-objective Bayesian optimisation with preferences over objectives. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
  2. [2] Milad Abolhasani and Eugenia Kumacheva. The rise of self-driving labs in chemical and materials sciences. Nature Synthesis, 2(6):483–492, 2023.
  3. [3] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
  4. [4] Masaki Adachi, Siu Lun Chau, Wenjie Xu, Anurag Singh, Michael A Osborne, and Krikamol Muandet. Bayesian optimization for building social-influence-free consensus. arXiv preprint arXiv:2502.07166, 2025.
  5. [5] Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Xingchen Wan, Vu Nguyen, Harald Oberhauser, and Michael A Osborne. Adaptive batch sizes for active learning: A probabilistic numerics approach. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 496–504. PMLR, 2024.
  6. [6] Masaki Adachi, Brady Planden, David Howey, Michael A. Osborne, Sebastian Orbell, Natalia Ares, Krikamol Muandet, and Siu Lun Chau. Looping in the human: Collaborative and explainable Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 238, pages 505–513, 2024.
  7. [7] Steven Adriaensen, Herilalaina Rakotoarison, Samuel Müller, and Frank Hutter. Efficient Bayesian learning curve extrapolation using prior-data fitted networks. Advances in Neural Information Processing Systems, 36:19858–19886, 2023.
  8. [8] Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Megha Chakravorty, Satvika Reddy Gavireddy, Aditya Parashar, Harshit Surana, Bhavana Dalvi Mishra, Andrew McCallum, Ashish Sabharwal, and Peter Clark. AutoDiscovery: Open-ended scientific discovery via Bayesian surprise. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026.
  9. [9] Henrique Assumpção, Diego Ferreira, Leandro Campos, and Fabricio Murai. CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization. arXiv preprint arXiv:2510.14150, 2025.
  10. [10] Raul Astudillo and Peter Frazier. Bayesian optimization of composite functions. In International Conference on Machine Learning (ICML), pages 354–363. PMLR, 2019.

  11. [11] Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham, Andrew G Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. Advances in Neural Information Processing Systems (NeurIPS), 33:21524–21538, 2020.
  12. [12] Felix Berkenkamp, Angela P Schoellig, and Andreas Krause. No-regret Bayesian optimization with unknown hyperparameters. Journal of Machine Learning Research (JMLR), 20(50):1–24, 2019.
  13. [13] Herbie Bradley, Andrew Dai, Hannah Benita Teufel, Jenny Zhang, Koen Oostermeijer, Marco Bellagente, Jeff Clune, Kenneth Stanley, Gregory Schott, and Joel Lehman. Quality-diversity through AI feedback. In The Twelfth International Conference on Learning Representations, 2024.
  14. [14] Ralph Allan Bradley and Milton E Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
  15. [15] Timothy CY Chan, Rafid Mahmood, and Ian Yihang Zhu. Inverse optimization: Theory and applications. Operations Research, 73(2):1046–1074, 2025.
  16. [16] Guiming Hardy Chen, Shunian Chen, Ziche Liu, Feng Jiang, and Benyou Wang. Humans or LLMs as the judge? A study on judgement bias. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8301–8327, 2024.
  17. [17] Ruth Wan Theng Chew, Quoc Phong Nguyen, and Bryan Kian Hsiang Low. BILBO: BILevel Bayesian Optimization. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 10249–10268, 2025.
  18. [18] Sayak Ray Chowdhury and Aditya Gopalan. On kernelized multi-armed bandits. In International Conference on Machine Learning, pages 844–853. PMLR, 2017.
  19. [19] Paulo Cortez, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547–553, 2009.
  20. [20] Zhongxiang Dai, Yizhou Chen, Haibin Yu, Bryan Kian Hsiang Low, and Patrick Jaillet. On provably robust meta-Bayesian optimization. In Uncertainty in Artificial Intelligence (UAI), pages 475–485. PMLR, 2022.

  21. [21] Vedat Dogan and Steven Prestwich. Bilevel optimization by conditional Bayesian optimization. In International Conference on Machine Learning, Optimization, and Data Science, pages 243–258. Springer, 2023.
  22. [22] Yuanqi Du, Botao Yu, Tianyu Liu, Tony Shen, Junwu Chen, Jan G Rittig, Kunyang Sun, Yikun Zhang, Zhangde Song, Bo Zhou, et al. Accelerating scientific discovery with autonomous goal-evolving agents. arXiv preprint arXiv:2512.21782, 2025.
  23. [23] Kobi C Felton, Jan G Rittig, and Alexei A Lapkin. SUMMIT: Benchmarking machine learning methods for reaction optimisation. Chemistry-Methods, 1(2):116–122, 2021.
  24. [24] Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, and Arber Zela. Can LLMs beat classical hyperparameter optimization algorithms? A study on autoresearch. arXiv preprint arXiv:2603.24647, 2026.
  25. [25] Shi Fu, Fengxiang He, Xinmei Tian, and Dacheng Tao. Convergence of Bayesian bilevel optimization. In The Twelfth International Conference on Learning Representations, 2024.
  26. [26] Masahiro Fujisawa, Masaki Adachi, and Michael A Osborne. Scalable valuation of human feedback through provably robust model alignment. In Advances in Neural Information Processing Systems (NeurIPS), 2025.
  27. [27] Jacob Gardner, Geoff Pleiss, Kilian Q Weinberger, David Bindel, and Andrew G Wilson. GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018.
  28. [28] Roman Garnett. Bayesian Optimization. Cambridge University Press, 2023.
  29. [29] Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018.
  30. [30] Javier González, Zhenwen Dai, Andreas Damianou, and Neil D Lawrence. Preferential Bayesian optimization. In International Conference on Machine Learning (ICML), pages 1282–1291. PMLR, 2017.

  31. [31] Rushil Gupta, Jason Hartford, and Bang Liu. LLMs for Bayesian optimization in scientific domains: Are we there yet? In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15482–15510. Association for Computational Linguistics, 2025.
  32. [32] Sunil Gupta, Santu Rana, Huong Ha, and Svetha Venkatesh. Sub-linear regret bounds for Bayesian optimisation in unknown search spaces. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 16271–16281, 2020.
  33. [33] Huong Ha, Santu Rana, Sunil Gupta, Thanh Nguyen, and Svetha Venkatesh. Bayesian optimization with unknown search space. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
  34. [34] Mariette Hellenbrandt. The inorganic crystal structure database (ICSD)—present and future. Crystallography Reviews, 10(1):17–22, 2004.
  35. [35] Kihyuk Hong, Yuhang Li, and Ambuj Tewari. An optimization-based algorithm for non-stationary kernel bandits without prior knowledge. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 3048–3085. PMLR, 2023.
  36. [36] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1), 2013.
  37. [37] Kevin Jamieson and Ameet Talwalkar. Non-stochastic best arm identification and hyperparameter optimization. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 240–248. PMLR, 2016.
  38. [38] Zhiyuan Jerry Lin, Raul Astudillo, Peter Frazier, and Eytan Bakshy. Preference exploration for efficient Bayesian optimization with multiple outcomes. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 151, pages 4235–4258, 2022.
  39. [39] Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54, pages 528–536, 2017.
  40. [40] Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin. ShinkaEvolve: Towards open-ended and sample-efficient program evolution. arXiv preprint arXiv:2509.19349, 2025.

  41. [41] Chungpa Lee, Thomas Zeng, Jongwon Jeong, Jy-yong Sohn, and Kangwook Lee. How to correctly report LLM-as-a-judge evaluations. arXiv preprint arXiv:2511.21140, 2025.
  42. [42] Madison Lee and Tara Javidi. Consequences of kernel regularity for bandit optimization. arXiv preprint arXiv:2512.05957, 2025.
  43. [43] Raphael D Levine. Molecular Reaction Dynamics. Cambridge University Press, 2009.
  44. [44] Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research (JMLR), 18(185):1–52, 2018.
  45. [45] Qiaohao Liang, Aldair E Gongora, Zekun Ren, Armi Tiihonen, Zhe Liu, Shijing Sun, James R Deneault, Daniil Bash, Flore Mekki-Berrada, Saif A Khan, et al. Benchmarking the performance of Bayesian optimization across multiple experimental materials science domains. npj Computational Materials, 7(1):188, 2021.
  46. [46] Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. Large language models to enhance Bayesian optimization. In International Conference on Learning Representations (ICLR), 2024.
  47. [47] Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, and Haitham Bou Ammar. End-to-end meta-Bayesian optimisation with transformer neural processes. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pages 11246–11260, 2023.
  48. [48] Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2(117-129):2, 1978.
  49. [49] Vu Nguyen, Sebastian Schulze, and Michael Osborne. Bayesian optimization for iterative learning. Advances in Neural Information Processing Systems, 33:9361–9371, 2020.
  50. [50] Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025.

  51. [51]

    Pierre Osselin, Masaki Adachi, Xiaowen Dong, and Michael A Osborne. Natural evolutionary search meets probabilistic numerics. arXiv preprint arXiv:2507.07288, 2025

  52. [52]

    Stuart SP Parkin, Christian Kaiser, Alex Panchula, Philip M Rice, Brian Hughes, Mahesh Samant, and See-Hun Yang. Giant tunnelling magnetoresistance at room temperature with MgO (100) tunnel barriers. Nature Materials, 3(12):862–867, 2004

  53. [53]

    Edward O Pyzer-Knapp. Bayesian optimization for accelerated drug discovery. IBM Journal of Research and Development, 62(6):2–1, 2018

  54. [54]

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pages 53728–53741, 2023

  55. [55]

    Herilalaina Rakotoarison, Steven Adriaensen, Neeratyoy Mallik, Samir Garibov, Eddie Bergman, and Frank Hutter. In-context freeze-thaw Bayesian optimization for hyperparameter optimization. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pages 41982–42008, 2024

  56. [56]

    James Requeima, John Bronskill, Dami Choi, Richard E. Turner, and David Duvenaud. LLM processes: Numerical predictive distributions conditioned on natural language. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 109609–109671, 2024

  57. [57]

    Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015

  58. [58]

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

  59. [59]

    Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In International Conference on Machine Learning (ICML), pages 1015–1022, 2010

  60. [60]

    Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias W Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012

  61. [61]

    Kenneth O Stanley and Joel Lehman. Why greatness cannot be planned: The myth of the objective. Springer, 2015

  62. [62]

    Michael L Stein. Interpolation of spatial data. Springer Science & Business Media, 1999

  63. [63]

    Richard Cornelius Suwandi, Feng Yin, Juntao Wang, Renjie Li, Tsung-Hui Chang, and Sergios Theodoridis. Adaptive kernel design for Bayesian optimization is a piece of CAKE with LLMs. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  64. [64]

    Kevin Swersky, Jasper Snoek, and Ryan P Adams. Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems (NeurIPS), volume 26, 2013

  65. [65]

    Kevin Swersky, Jasper Snoek, and Ryan Prescott Adams. Freeze-thaw Bayesian optimization. arXiv preprint arXiv:1406.3896, 2014

  66. [66]

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023

  67. [67]

    Michael Volpp, Lukas P. Fröhlich, Kirsten Fischer, Andreas Doerr, Stefan Falkner, Frank Hutter, and Christian Daniel. Meta-learning acquisition functions for transfer learning in Bayesian optimization. In International Conference on Learning Representations (ICLR), 2020

  68. [68]

    Tomoya Wakayama and Taiji Suzuki. In-context learning is provably Bayesian inference: A generalization theory for meta-learning. arXiv preprint arXiv:2510.10981, 2025

  69. [69]

    Zi Wang and Stefanie Jegelka. Max-value entropy search for efficient Bayesian optimization. In International Conference on Machine Learning (ICML), pages 3627–3635. PMLR, 2017

  70. [70]

    Zi Wang, Beomjoon Kim, and Leslie P Kaelbling. Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior. Advances in Neural Information Processing Systems (NeurIPS), 31, 2018

  71. [71]

    Christopher KI Williams and Carl Edward Rasmussen. Gaussian processes for machine learning. MIT Press, Cambridge, MA, 2006

  72. [72]

    Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, and Peter Bartlett. How many pretraining tasks are needed for in-context learning of linear regression? In The Twelfth International Conference on Learning Representations, 2024

  73. [73]

    Tian Xie and Jeffrey C Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 120(14):145301, 2018

  74. [74]

    Wenjie Xu, Masaki Adachi, Colin N. Jones, and Michael A. Osborne. Principled Bayesian optimization in collaboration with human experts. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 104091–104137, 2024

  75. [75]

    Anita Yang, Krikamol Muandet, Michele Caprio, Siu Lun Chau, and Masaki Adachi. Verbalizing LLM’s higher-order uncertainty via imprecise probabilities. arXiv preprint arXiv:2603.10396, 2026

  76. [76]

    Eric Zelikman, Eliana Lorch, Lester Mackey, and Adam Tauman Kalai. Self-taught optimizer (STOP): Recursively self-improving code generation. In Conference on Language Modeling (COLM), 2024

  77. [77]

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, and Jeff Clune. Darwin Gödel machine: Open-ended evolution of self-improving agents. arXiv preprint arXiv:2505.22954, 2025

  78. [78]

    Jenny Zhang, Joel Lehman, Kenneth Stanley, and Jeff Clune. OMNI: Open-endedness via models of human notions of interestingness. In The Twelfth International Conference on Learning Representations, 2024

  79. [79]

    Juliusz Ziomek, Masaki Adachi, and Michael A Osborne. Bayesian optimisation with unknown hyperparameters: Regret bounds logarithmically closer to optimal. Advances in Neural Information Processing Systems (NeurIPS), 37:86346–86374, 2024

  80. [80]

    Juliusz Ziomek, Masaki Adachi, and Michael A Osborne. Time-varying Gaussian process bandits with unknown prior. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 258, pages 4294–4302, 2025

A Notations

Notations are summarized in Table 2.

B Preliminary

B.1 Gaussian process regression background

Gaussian process mode...

Showing first 80 references.