In-Context Learning for Latent Space Bayesian Optimization
Pith reviewed 2026-06-27 17:04 UTC · model grok-4.3
The pith
Continued pretraining of tabular foundation models on latent-space optimization tasks yields effective surrogates for molecular Bayesian optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE, and adding a regularizer that anchors the model to the original checkpoint, the adapted in-context learner achieves strong performance as a surrogate for latent-space Bayesian optimization on held-out molecular optimization benchmarks.
What carries the argument
The anchoring regularizer applied during continued pretraining on latent-space synthetic optimization tasks.
If this is right
- The adapted model functions as an effective surrogate inside latent-space Bayesian optimization loops for molecules.
- The regularizer preserves the model's original regression capabilities while incorporating the new optimization tasks.
- Performance gains on held-out benchmarks demonstrate that LSBO-specific adaptation is relevant for in-context surrogates.
- The method maintains sample efficiency advantages of Bayesian optimization while handling structured objects.
Where Pith is reading between the lines
- The same continued-pretraining recipe could be tested on protein or materials latent spaces to check transferability.
- Different choices of synthetic task distribution or regularizer strength might produce further gains on specific design objectives.
- If the mismatch is indeed central, similar adaptation steps may improve other in-context models applied to non-tabular optimization settings.
Load-bearing premise
The mismatch between LSBO's latent-to-objective map and standard regression pretraining is the main performance bottleneck, and the regularizer successfully prevents overspecialization while retaining the broad prior.
What would settle it
If the adapted model shows no improvement over unadapted tabular foundation models when used as surrogates inside LSBO on the same held-out molecular benchmarks, the central claim is falsified.
Figures
read the original abstract
Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN and TabICL now achieve state-of-the-art regression performance and are increasingly used as BO surrogates. Because their Bayesian behavior is induced by large synthetic pretraining collections, the composition of this pretraining distribution is crucial. LSBO creates a distinctive mismatch: the induced map from latent code to objective value differs markedly from the regression tasks used to train current in-context models. We address this mismatch by complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE. The continued-pretraining objective features a regularizer that anchors the model to the original checkpoint, preserving its broad regression prior while avoiding overspecialization to the adaptation tasks. On held-out molecular optimization benchmarks, the resulting model achieves strong performance, supporting the relevance of LSBO-specific adaptation for in-context surrogates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes adapting tabular in-context learning models (e.g., TabPFN-style) for latent-space Bayesian optimization by continued pretraining on synthetic optimization tasks generated from the latent space of a molecular VAE. A regularizer anchors the model to its original checkpoint to preserve the broad regression prior while allowing LSBO-specific adaptation. The central empirical claim is that the resulting surrogate achieves strong performance on held-out molecular optimization benchmarks, demonstrating the value of addressing the LSBO-induced distribution mismatch.
Significance. If the empirical claims hold after verification, the work would provide a concrete mechanism for specializing in-context surrogates to structured design tasks without generic continued pretraining, with potential impact on sample-efficient molecular and protein optimization.
major comments (2)
- [continued-pretraining objective and experiments sections] The central claim that the regularizer enables 'LSBO-specific adaptation' while preserving the original regression prior is not directly supported: no results are reported measuring retained accuracy on held-out standard tabular regression tasks drawn from the base model's pretraining distribution after adaptation (see the continued-pretraining objective and experiments sections). Without this check, gains on molecular benchmarks could be explained by generic continued pretraining rather than targeted mismatch resolution.
- [Abstract and results] The abstract asserts that the model 'achieves strong performance' on held-out benchmarks, but the provided description supplies no quantitative results, baselines, error bars, or details on performance measurement; the full paper must include these to substantiate the empirical claim.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the empirical support for our claims. We address each major point below.
read point-by-point responses
-
Referee: [continued-pretraining objective and experiments sections] The central claim that the regularizer enables 'LSBO-specific adaptation' while preserving the original regression prior is not directly supported: no results are reported measuring retained accuracy on held-out standard tabular regression tasks drawn from the base model's pretraining distribution after adaptation (see the continued-pretraining objective and experiments sections). Without this check, gains on molecular benchmarks could be explained by generic continued pretraining rather than targeted mismatch resolution.
Authors: We agree that an explicit measurement of retained accuracy on standard tabular regression tasks would strengthen the evidence that the regularizer preserves the original prior rather than allowing generic continued pretraining to explain the gains. In the revised manuscript we have added results on held-out standard regression tasks sampled from the base pretraining distribution, showing that the regularized model retains accuracy comparable to the original checkpoint while the unregularized adaptation degrades performance. This directly supports the targeted nature of the LSBO adaptation. revision: yes
-
Referee: [Abstract and results] The abstract asserts that the model 'achieves strong performance' on held-out benchmarks, but the provided description supplies no quantitative results, baselines, error bars, or details on performance measurement; the full paper must include these to substantiate the empirical claim.
Authors: The full manuscript already contains the requested quantitative details in the experiments section, including performance metrics on the held-out molecular benchmarks, comparisons to baselines, and error bars over multiple runs. The abstract summarizes the finding at a high level without numbers, which follows standard conventions for brevity. No revision to the results presentation was required. revision: no
Circularity Check
Empirical adaptation with no reduction to inputs by construction
full rationale
The paper presents a continued-pretraining procedure on synthetic LSBO tasks (with an anchoring regularizer) followed by empirical evaluation on held-out molecular benchmarks. No derivation chain is claimed; performance is measured directly rather than derived from fitted parameters or self-citations. The abstract and described method contain no self-definitional equations, no 'prediction' that is the fit itself, and no load-bearing uniqueness theorems imported from prior author work. This is a standard empirical adaptation result.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
In-Context Black-Box Optimization with Unreliable Feedback
Nicolas Samuel Blumer, Julien Martinelli, and Samuel Kaski. In-context black-box optimization with unreliable feedback. arXiv preprint arXiv:2605.06187, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[2]
Nathan Brown, Marco Fiscato, Marwin H. S. Segler, and Alain C. Vaucher. Guacamol: Benchmarking models for de novo molecular design. Journal of Chemical Information and Modeling, 2019
2019
-
[3]
Jaewon Chu, Jinyoung Park, Seunghun Lee, and Hyunwoo J. Kim. Inversion-based latent bayesian optimization. In Advances in Neural Information Processing Systems, 2024
2024
-
[4]
Turner, and Matthias Poloczek
David Eriksson, Michael Pearce, Jacob Gardner, Ryan D. Turner, and Matthias Poloczek. Scalable global optimization via local bayesian optimization. In Advances in Neural Information Processing Systems, 2019
2019
-
[5]
Real-tab PFN : Improving tabular foundation models via continued pre-training with real-world data
Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel M \"u ller, and Frank Hutter. Real-tab PFN : Improving tabular foundation models via continued pre-training with real-world data. In 1st ICML Workshop on Foundation Models for Structured Data, 2025. URL https://openreview.net/forum?id=BtEiqKsIMw
2025
-
[6]
Bayesian optimization
Roman Garnett. Bayesian optimization. Cambridge University Press, 2023
2023
-
[7]
Wei, David Duvenaud, Jose Miguel Hernandez-Lobato, Benjamin Sanchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D
Rafael Gomez-Bombarelli, Jennifer N. Wei, David Duvenaud, Jose Miguel Hernandez-Lobato, Benjamin Sanchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alan Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4 0 (2): 0 268--276, 2018
2018
-
[8]
L \'e o Grinsztajn, Klemens Fl \"o ge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Mihir Manium, Shi Bin, Magnus B \"u hler, Anurag Garg, et al. Tabpfn-3: Technical report. arXiv preprint arXiv:2605.13986, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[9]
Nate Gruver, Samuel Stanton, Nathan Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, and Andrew Gordon Wilson. Protein design with guided discrete diffusion. In Advances in Neural Information Processing Systems, 2024
2024
-
[10]
Accurate predictions on small data with a tabular foundation model
Noah Hollmann, Samuel Muller, Lennart Purucker, Arjun Krishnakumar, Max Korfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637 0 (8045): 0 319--326, 2025
2025
-
[11]
Overcoming catastrophic forgetting in neural networks
James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114 0 (13): 0 3521--3526, 2017
2017
-
[12]
Seunghun Lee, Jaewon Chu, Sihyeon Kim, Juyeon Ko, and Hyunwoo J. Kim. Advancing bayesian optimization via learning correlated latent space. In Advances in Neural Information Processing Systems, 2023
2023
-
[13]
Seunghun Lee, Jinyoung Park, Jaewon Chu, Minseo Yoon, and Hyunwoo J. Kim. Latent bayesian optimization via autoregressive normalizing flows. In International Conference on Learning Representations, 2025
2025
-
[14]
None to optima in few shots: Bayesian optimization with mdp priors
Diantong Li, Kyunghyun Cho, and Chong Liu. None to optima in few shots: Bayesian optimization with mdp priors. arXiv preprint arXiv:2511.01006, 2025
-
[15]
SGDR: Stochastic Gradient Descent with Warm Restarts
Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[16]
Decoupled Weight Decay Regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[17]
End-to-end Meta-Bayesian Optimisation with Transformer Neural Processes
Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, and Haitham Bou Ammar. End-to-end Meta-Bayesian Optimisation with Transformer Neural Processes . Advances in Neural Information Processing Systems , 2023
2023
-
[18]
Kusner, John Bradshaw, and Jacob Gardner
Natalie Maus, Haydn Jones, Juston Moore, Matt J. Kusner, John Bradshaw, and Jacob Gardner. Local latent space bayesian optimization over structured inputs. In Advances in Neural Information Processing Systems, 2022
2022
-
[19]
Natalie Maus, Yimeng Zeng, Haydn Thomas Jones, Yining Huang, Gaurav Ng Goel, Alden Rose, Kyurae Kim, Hyun-Su Lee, Marcelo Der Torossian Torres, Fangping Wan, Cesar de la Fuente-Nunez, Mark Yatskar, Osbert Bastani, and Jacob R. Gardner. Purely agentic black-box optimization for biological design. arXiv preprint arXiv:2601.22382, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[20]
Moss, Sebastian W
Henry B. Moss, Sebastian W. Ober, and Tom Diethe. Return of the latent space cowboys: Re-thinking the use of vaes for bayesian optimisation of structured spaces. In International Conference on Machine Learning, 2025
2025
-
[21]
Transformers can do bayesian inference
Samuel Muller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do bayesian inference. arXiv preprint arXiv:2112.10510, 2021
-
[22]
PFN s4 BO : In-context learning for B ayesian optimization
Samuel Muller, Matthias Feurer, Noah Hollmann, and Frank Hutter. PFN s4 BO : In-context learning for B ayesian optimization. In International Conference on Machine Learning, 2023
2023
-
[23]
Position: The future of bayesian prediction is prior-fitted
Samuel Muller, Arik Reuter, Noah Hollmann, David R \"u gamer, and Frank Hutter. Position: The future of bayesian prediction is prior-fitted. arXiv preprint arXiv:2505.23947, 2025
-
[24]
Jingang Qu, David Holzm \"u ller, Ga \"e l Varoquaux, and Marine Le Morvan. Tabiclv2: A better, faster, scalable, and open tabular foundation model. arXiv preprint arXiv:2602.11139, 2026
-
[25]
Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006
2006
-
[26]
Accelerating bayesian optimization for biological sequence design with denoising autoencoders
Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, and Andrew Gordon Wilson. Accelerating bayesian optimization for biological sequence design with denoising autoencoders. In International Conference on Machine Learning, 2022
2022
-
[27]
Sample-efficient optimization in the latent space of deep generative models via weighted retraining
Austin Tripp, Erik Daxberger, and Jose Miguel Hernandez-Lobato. Sample-efficient optimization in the latent space of deep generative models via weighted retraining. In Advances in Neural Information Processing Systems, 2020
2020
-
[28]
-pfn: In-context learning entropy search
Tom Viering, Steven Adriaensen, Herilalaina Rakotoarison, Samuel Muller, Carl Hvarfner, Frank Hutter, and Eytan Bakshy. -pfn: In-context learning entropy search. In ICLR Workshop on Frontiers in Probabilistic Inference, 2025
2025
- [29]
-
[30]
Explicit inductive bias for transfer learning with convolutional networks
Li Xuhong, Yves Grandvalet, and Franck Davoine. Explicit inductive bias for transfer learning with convolutional networks. In International conference on machine learning, pages 2825--2834. PMLR, 2018
2018
-
[31]
GIT - BO : High-dimensional bayesian optimization with tabular foundation models
Rosen Ting-Ying Yu, Cyril Picard, and Faez Ahmed. GIT - BO : High-dimensional bayesian optimization with tabular foundation models. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=9iTdKS4SRQ
2026
-
[32]
Tabpfn: One model to rule them all? arXiv preprint arXiv:2505.20003, 2025 a
Qiong Zhang, Yan Shuo Tan, Qinglong Tian, and Pengfei Li. Tabpfn: One model to rule them all? arXiv preprint arXiv:2505.20003, 2025 a
-
[33]
PABBO: Preferential Amortized Black-Box Optimization
Xinyu Zhang, Daolang Huang, Samuel Kaski, and Julien Martinelli. PABBO: Preferential Amortized Black-Box Optimization . International Conference on Learning Representations , 2025 b
2025
-
[34]
In-context multi-objective optimization
Xinyu Zhang, Conor Hassan, Julien Martinelli, Daolang Huang, and Samuel Kaski. In-context multi-objective optimization. In International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=odmeUlWta8
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.