In-Context Learning for Latent Space Bayesian Optimization

Harri Lähdesmäki; Julien Martinelli; Tuan A. Vu

arxiv: 2606.09664 · v1 · pith:ZXLH3UOCnew · submitted 2026-06-08 · 💻 cs.LG · stat.ML

Pith reviewed 2026-06-27 17:04 UTC · model grok-4.3

keywords: in-context learning, latent space Bayesian optimization, molecular optimization, tabular foundation models, surrogate models, continued pretraining, VAE latent space

The pith

Continued pretraining of tabular foundation models on latent-space optimization tasks yields effective surrogates for molecular Bayesian optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies a mismatch between the latent-to-objective maps created by latent-space Bayesian optimization and the regression distributions used to pretrain current in-context models. It corrects the mismatch by supplementing pretraining with synthetic optimization tasks drawn from the latent space of a molecular VAE, while a regularizer keeps the model anchored to its original checkpoint. The resulting model is then evaluated as a surrogate inside LSBO loops on held-out molecular benchmarks. If the approach succeeds, in-context learners become usable for structured design tasks without sacrificing their general regression behavior.

Core claim

By complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE, and adding a regularizer that anchors the model to the original checkpoint, the adapted in-context learner achieves strong performance as a surrogate for latent-space Bayesian optimization on held-out molecular optimization benchmarks.

What carries the argument

The anchoring regularizer applied during continued pretraining on latent-space synthetic optimization tasks.
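
As a minimal sketch of what an anchoring term can look like: the text here does not give the regularizer's exact form, so the squared L2 distance to the checkpoint weights (in the spirit of the explicit-inductive-bias penalties used in transfer learning), the strength `lam`, and the toy linear model are all assumptions chosen for illustration, not the paper's objective.

```python
import numpy as np

def anchored_loss(theta, theta0, X, y, lam):
    """Task MSE plus an assumed squared-L2 anchor to the checkpoint weights theta0."""
    resid = X @ theta - y
    task = np.mean(resid ** 2)
    anchor = lam * np.sum((theta - theta0) ** 2)
    return task + anchor

def anchored_grad(theta, theta0, X, y, lam):
    """Gradient of anchored_loss with respect to theta."""
    resid = X @ theta - y
    return 2.0 * X.T @ resid / len(y) + 2.0 * lam * (theta - theta0)

def adapt(theta0, X, y, lam, lr=0.05, steps=500):
    """Plain gradient descent that starts from the checkpoint."""
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * anchored_grad(theta, theta0, X, y, lam)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_new_task = np.array([2.0, -1.0, 0.5])  # hypothetical adaptation-task weights
y = X @ theta_new_task
theta0 = np.zeros(3)                         # hypothetical "original checkpoint"

free = adapt(theta0, X, y, lam=0.0)      # no anchor: fits the new task exactly
anchored = adapt(theta0, X, y, lam=1.0)  # anchor: stays closer to the checkpoint
drift_free = np.linalg.norm(free - theta0)
drift_anchored = np.linalg.norm(anchored - theta0)
```

In this toy setting the anchored solution moves a shorter distance from `theta0` than the unanchored one, which is the qualitative trade-off the summary attributes to the regularizer: adapt to the new tasks while staying near the original model.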

If this is right

The adapted model functions as an effective surrogate inside latent-space Bayesian optimization loops for molecules.
The regularizer preserves the model's original regression capabilities while incorporating the new optimization tasks.
Performance gains on held-out benchmarks demonstrate that LSBO-specific adaptation is relevant for in-context surrogates.
The method maintains sample efficiency advantages of Bayesian optimization while handling structured objects.
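
For readers unfamiliar with the loop the surrogate is plugged into, here is a rough sketch of a latent-space BO iteration. Everything in it is a stand-in: `decode` and `oracle` are hypothetical toy functions rather than a molecular VAE or a chemistry benchmark, the surrogate is a small Gaussian-process posterior used only because it exposes a (context, query) to (mean, std) interface similar to an in-context model, and the upper-confidence-bound acquisition is an assumption, not necessarily the paper's choice.

```python
import numpy as np

def decode(z):
    """Hypothetical stand-in for a VAE decoder: latent code -> 'object'."""
    return np.tanh(z)

def oracle(x):
    """Hypothetical black-box objective on decoded objects (higher is better)."""
    return -np.sum((x - 0.5) ** 2, axis=-1)

def surrogate_predict(Z_ctx, y_ctx, Z_query, length=1.0, noise=1e-4):
    """Stand-in surrogate: GP posterior with an RBF kernel, conditioned on the context."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length ** 2)
    K = k(Z_ctx, Z_ctx) + noise * np.eye(len(Z_ctx))
    Ks = k(Z_query, Z_ctx)
    y_mean = y_ctx.mean()
    mean = y_mean + Ks @ np.linalg.solve(K, y_ctx - y_mean)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def lsbo(n_init=5, n_iter=20, n_cand=256, dim=2, beta=2.0, seed=0):
    """Run a toy LSBO loop and return the best-so-far objective value per step."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_init, dim))           # initial latent codes
    y = oracle(decode(Z))                        # initial oracle evaluations
    best = [y.max()]
    for _ in range(n_iter):
        cand = rng.normal(size=(n_cand, dim))    # candidate latent codes
        mean, std = surrogate_predict(Z, y, cand)
        z_next = cand[np.argmax(mean + beta * std)]   # UCB acquisition
        y_next = oracle(decode(z_next[None, :]))[0]   # one oracle call
        Z = np.vstack([Z, z_next])
        y = np.append(y, y_next)
        best.append(max(best[-1], y_next))
    return np.array(best)

trace = lsbo()
```

Swapping surrogates in such a pipeline only requires replacing `surrogate_predict`; the rest of the loop stays fixed, which is the kind of controlled comparison the summary describes.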

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same continued-pretraining recipe could be tested on protein or materials latent spaces to check transferability.
Different choices of synthetic task distribution or regularizer strength might produce further gains on specific design objectives.
If the mismatch is indeed central, similar adaptation steps may improve other in-context models applied to non-tabular optimization settings.

Load-bearing premise

The mismatch between LSBO's latent-to-objective map and standard regression pretraining is the main performance bottleneck, and the regularizer successfully prevents overspecialization while retaining the broad prior.

What would settle it

If the adapted model shows no improvement over unadapted tabular foundation models when used as surrogates inside LSBO on the same held-out molecular benchmarks, the central claim is falsified.
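
A comparison of this kind is usually summarized as a best-value-found curve averaged over seeds. The sketch below shows one way to compute that summary; the array layout (seeds by oracle calls), the helper names, and the numbers are placeholders introduced for illustration and are not results from the paper.

```python
import numpy as np

def best_so_far(values):
    """Running maximum over oracle calls; values has shape (n_seeds, n_calls)."""
    return np.maximum.accumulate(values, axis=1)

def summarize(values):
    """Mean and standard deviation, across seeds, of the best-so-far curve."""
    curves = best_so_far(np.asarray(values, dtype=float))
    return curves.mean(axis=0), curves.std(axis=0)

# Placeholder numbers only: two seeds, four oracle calls each.
runs = [[0.1, 0.3, 0.2, 0.5],
        [0.2, 0.1, 0.4, 0.4]]
mean, std = summarize(runs)
```

Computing the same summary for the adapted and the unadapted surrogate on identical benchmarks and seeds, then checking whether the curves separate beyond their spread, is one straightforward way to run the test described above.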

Figures

Figures reproduced from arXiv: 2606.09664 by Harri Lähdesmäki, Julien Martinelli, Tuan A. Vu.

**Figure 1.** Latent-space coverage of pretraining episodes. UMAP embeddings for 3 synthetic objectives and 2 held-out benchmarks. Point labels: dhop, shop, albs, mess, celr, thir, tror, iso1, iso2, satd, jnk3, gsk3, drd2, qed, rano, med2, osmb, fexo, adip, zale. Legend: synthetic, base, heldout.

**Figure 2.** Objective-space coverage of the synthetic prior. Each objective is evaluated on the same probe set P and represented by the resulting value vector (Equation 12), embedded with UMAP.

**Figure 3.** Main results. Best value found over oracle calls and distribution over the final recommendation on held-out molecular objectives. Mean ± 1 std computed across 10 seeds.

**Figure 4.** Base objective distributions. Kernel density estimates of the normalized base objectives used to construct the synthetic prior.

**Figure 5.** Synthetic objective distributions. Kernel density estimates of representative normalized synthetic objectives sampled from the pretraining prior.

Original abstract

Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN and TabICL now achieve state-of-the-art regression performance and are increasingly used as BO surrogates. Because their Bayesian behavior is induced by large synthetic pretraining collections, the composition of this pretraining distribution is crucial. LSBO creates a distinctive mismatch: the induced map from latent code to objective value differs markedly from the regression tasks used to train current in-context models. We address this mismatch by complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE. The continued-pretraining objective features a regularizer that anchors the model to the original checkpoint, preserving its broad regression prior while avoiding overspecialization to the adaptation tasks. On held-out molecular optimization benchmarks, the resulting model achieves strong performance, supporting the relevance of LSBO-specific adaptation for in-context surrogates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The LSBO adaptation via continued pretraining and anchoring regularizer is a targeted fix for a real mismatch, but the paper needs to show that the regularizer actually preserves the base model's capabilities.

The paper takes tabular in-context models pretrained on synthetic regression and adds a continued-pretraining stage on synthetic optimization tasks drawn from a molecular VAE latent space, plus a regularizer that anchors weights back to the original checkpoint.

This targets a concrete mismatch: standard pretraining distributions do not produce the same latent-to-objective maps that appear in LSBO for molecules. The method is new in its specific combination of LSBO-style tasks with the anchoring term for these foundation-model surrogates.

The approach is reasonable on its face. It tries to specialize without full retraining and without destroying the in-context behavior that makes these models attractive as BO surrogates.

The main gap is exactly the one flagged in the stress-test note. The abstract describes the regularizer but gives no measurement of retained accuracy on the original pretraining-style regression tasks after adaptation. Without that check, gains on molecular benchmarks could simply reflect generic continued training rather than successful resolution of the LSBO mismatch. The claim of "strong performance" on held-out benchmarks also appears without numbers, baselines, or error bars, so the size of any improvement is impossible to judge from the provided text.

The setup itself looks internally consistent and the citations track the relevant prior work on TabPFN-style models and LSBO.

This is for researchers working on surrogate models for structured optimization, especially molecules or proteins. A reader already following in-context learning for BO would get a clear description of one adaptation strategy.

It deserves peer review so the experiments can be examined in full and the regularizer's effect can be tested directly.

Referee Report

2 major / 0 minor

Summary. The paper proposes adapting tabular in-context learning models (e.g., TabPFN-style) for latent-space Bayesian optimization by continued pretraining on synthetic optimization tasks generated from the latent space of a molecular VAE. A regularizer anchors the model to its original checkpoint to preserve the broad regression prior while allowing LSBO-specific adaptation. The central empirical claim is that the resulting surrogate achieves strong performance on held-out molecular optimization benchmarks, demonstrating the value of addressing the LSBO-induced distribution mismatch.

Significance. If the empirical claims hold after verification, the work would provide a concrete mechanism for specializing in-context surrogates to structured design tasks without generic continued pretraining, with potential impact on sample-efficient molecular and protein optimization.

major comments (2)

[continued-pretraining objective and experiments sections] The central claim that the regularizer enables 'LSBO-specific adaptation' while preserving the original regression prior is not directly supported: no results are reported measuring retained accuracy on held-out standard tabular regression tasks drawn from the base model's pretraining distribution after adaptation (see the continued-pretraining objective and experiments sections). Without this check, gains on molecular benchmarks could be explained by generic continued pretraining rather than targeted mismatch resolution.
[Abstract and results] The abstract asserts that the model 'achieves strong performance' on held-out benchmarks, but the provided description supplies no quantitative results, baselines, error bars, or details on performance measurement; the full paper must include these to substantiate the empirical claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the empirical support for our claims. We address each major point below.

Referee: [continued-pretraining objective and experiments sections] The central claim that the regularizer enables 'LSBO-specific adaptation' while preserving the original regression prior is not directly supported: no results are reported measuring retained accuracy on held-out standard tabular regression tasks drawn from the base model's pretraining distribution after adaptation (see the continued-pretraining objective and experiments sections). Without this check, gains on molecular benchmarks could be explained by generic continued pretraining rather than targeted mismatch resolution.

Authors: We agree that an explicit measurement of retained accuracy on standard tabular regression tasks would strengthen the evidence that the regularizer preserves the original prior rather than allowing generic continued pretraining to explain the gains. In the revised manuscript we have added results on held-out standard regression tasks sampled from the base pretraining distribution, showing that the regularized model retains accuracy comparable to the original checkpoint while the unregularized adaptation degrades performance. This directly supports the targeted nature of the LSBO adaptation.

Revision: yes

Referee: [Abstract and results] The abstract asserts that the model 'achieves strong performance' on held-out benchmarks, but the provided description supplies no quantitative results, baselines, error bars, or details on performance measurement; the full paper must include these to substantiate the empirical claim.

Authors: The full manuscript already contains the requested quantitative details in the experiments section, including performance metrics on the held-out molecular benchmarks, comparisons to baselines, and error bars over multiple runs. The abstract summarizes the finding at a high level without numbers, which follows standard conventions for brevity. No revision to the results presentation was required.

Revision: no

Circularity Check

0 steps flagged

Empirical adaptation with no reduction to inputs by construction

The paper presents a continued-pretraining procedure on synthetic LSBO tasks (with an anchoring regularizer) followed by empirical evaluation on held-out molecular benchmarks. No derivation chain is claimed; performance is measured directly rather than derived from fitted parameters or self-citations. The abstract and described method contain no self-definitional equations, no 'prediction' that is the fit itself, and no load-bearing uniqueness theorems imported from prior author work. This is a standard empirical adaptation result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities used in the method.

pith-pipeline@v0.9.1-grok · 5722 in / 1063 out tokens · 20219 ms · 2026-06-27T17:04:15.391801+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 11 canonical work pages · 5 internal anchors

[1] Nicolas Samuel Blumer, Julien Martinelli, and Samuel Kaski. In-context black-box optimization with unreliable feedback. arXiv preprint arXiv:2605.06187, 2026.

[2] Nathan Brown, Marco Fiscato, Marwin H. S. Segler, and Alain C. Vaucher. GuacaMol: Benchmarking models for de novo molecular design. Journal of Chemical Information and Modeling, 2019.

[3] Jaewon Chu, Jinyoung Park, Seunghun Lee, and Hyunwoo J. Kim. Inversion-based latent Bayesian optimization. In Advances in Neural Information Processing Systems, 2024.

[4] David Eriksson, Michael Pearce, Jacob Gardner, Ryan D. Turner, and Matthias Poloczek. Scalable global optimization via local Bayesian optimization. In Advances in Neural Information Processing Systems, 2019.

[5] Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel Müller, and Frank Hutter. Real-TabPFN: Improving tabular foundation models via continued pre-training with real-world data. In 1st ICML Workshop on Foundation Models for Structured Data, 2025. URL https://openreview.net/forum?id=BtEiqKsIMw

[6] Roman Garnett. Bayesian optimization. Cambridge University Press, 2023.

[7] Rafael Gomez-Bombarelli, Jennifer N. Wei, David Duvenaud, Jose Miguel Hernandez-Lobato, Benjamin Sanchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alan Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268-276, 2018.

[8] Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Mihir Manium, Shi Bin, Magnus Bühler, Anurag Garg, et al. TabPFN-3: Technical report. arXiv preprint arXiv:2605.13986, 2026.

[9] Nate Gruver, Samuel Stanton, Nathan Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, and Andrew Gordon Wilson. Protein design with guided discrete diffusion. In Advances in Neural Information Processing Systems, 2024.

[10] Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Korfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319-326, 2025.

[11] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521-3526, 2017.

[12] Seunghun Lee, Jaewon Chu, Sihyeon Kim, Juyeon Ko, and Hyunwoo J. Kim. Advancing Bayesian optimization via learning correlated latent space. In Advances in Neural Information Processing Systems, 2023.

[13] Seunghun Lee, Jinyoung Park, Jaewon Chu, Minseo Yoon, and Hyunwoo J. Kim. Latent Bayesian optimization via autoregressive normalizing flows. In International Conference on Learning Representations, 2025.

[14] Diantong Li, Kyunghyun Cho, and Chong Liu. None to optima in few shots: Bayesian optimization with MDP priors. arXiv preprint arXiv:2511.01006, 2025.

[15] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

[16] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

[17] Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, and Haitham Bou Ammar. End-to-end Meta-Bayesian Optimisation with Transformer Neural Processes. Advances in Neural Information Processing Systems, 2023.

[18] Natalie Maus, Haydn Jones, Juston Moore, Matt J. Kusner, John Bradshaw, and Jacob Gardner. Local latent space Bayesian optimization over structured inputs. In Advances in Neural Information Processing Systems, 2022.

[19] Natalie Maus, Yimeng Zeng, Haydn Thomas Jones, Yining Huang, Gaurav Ng Goel, Alden Rose, Kyurae Kim, Hyun-Su Lee, Marcelo Der Torossian Torres, Fangping Wan, Cesar de la Fuente-Nunez, Mark Yatskar, Osbert Bastani, and Jacob R. Gardner. Purely agentic black-box optimization for biological design. arXiv preprint arXiv:2601.22382, 2026.

[20] Henry B. Moss, Sebastian W. Ober, and Tom Diethe. Return of the latent space cowboys: Re-thinking the use of VAEs for Bayesian optimisation of structured spaces. In International Conference on Machine Learning, 2025.

[21] Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do Bayesian inference. arXiv preprint arXiv:2112.10510, 2021.

[22] Samuel Müller, Matthias Feurer, Noah Hollmann, and Frank Hutter. PFNs4BO: In-context learning for Bayesian optimization. In International Conference on Machine Learning, 2023.

[23] Samuel Müller, Arik Reuter, Noah Hollmann, David Rügamer, and Frank Hutter. Position: The future of Bayesian prediction is prior-fitted. arXiv preprint arXiv:2505.23947, 2025.

[24] Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICLv2: A better, faster, scalable, and open tabular foundation model. arXiv preprint arXiv:2602.11139, 2026.

[25] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[26] Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, and Andrew Gordon Wilson. Accelerating Bayesian optimization for biological sequence design with denoising autoencoders. In International Conference on Machine Learning, 2022.

[27] Austin Tripp, Erik Daxberger, and Jose Miguel Hernandez-Lobato. Sample-efficient optimization in the latent space of deep generative models via weighted retraining. In Advances in Neural Information Processing Systems, 2020.

[28] Tom Viering, Steven Adriaensen, Herilalaina Rakotoarison, Samuel Müller, Carl Hvarfner, Frank Hutter, and Eytan Bakshy. -pfn: In-context learning entropy search. In ICLR Workshop on Frontiers in Probabilistic Inference, 2025.

[29] Tuan A. Vu, Julien Martinelli, and Harri Lähdesmäki. Time-aware latent space Bayesian optimization. arXiv preprint arXiv:2603.00935, 2026.

[30] Li Xuhong, Yves Grandvalet, and Franck Davoine. Explicit inductive bias for transfer learning with convolutional networks. In International Conference on Machine Learning, pages 2825-2834. PMLR, 2018.

[31] Rosen Ting-Ying Yu, Cyril Picard, and Faez Ahmed. GIT-BO: High-dimensional Bayesian optimization with tabular foundation models. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=9iTdKS4SRQ

[32] Qiong Zhang, Yan Shuo Tan, Qinglong Tian, and Pengfei Li. TabPFN: One model to rule them all? arXiv preprint arXiv:2505.20003, 2025a.

[33] Xinyu Zhang, Daolang Huang, Samuel Kaski, and Julien Martinelli. PABBO: Preferential Amortized Black-Box Optimization. International Conference on Learning Representations, 2025b.

[34] Xinyu Zhang, Conor Hassan, Julien Martinelli, Daolang Huang, and Samuel Kaski. In-context multi-objective optimization. In International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=odmeUlWta8
