pith. sign in

arxiv: 2606.09664 · v1 · pith:ZXLH3UOCnew · submitted 2026-06-08 · 💻 cs.LG · stat.ML

In-Context Learning for Latent Space Bayesian Optimization

Pith reviewed 2026-06-27 17:04 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords in-context learninglatent space Bayesian optimizationmolecular optimizationtabular foundation modelssurrogate modelscontinued pretrainingVAE latent space
0
0 comments X

The pith

Continued pretraining of tabular foundation models on latent-space optimization tasks yields effective surrogates for molecular Bayesian optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies a mismatch between the latent-to-objective maps created by latent-space Bayesian optimization and the regression distributions used to pretrain current in-context models. It corrects the mismatch by supplementing pretraining with synthetic optimization tasks drawn from the latent space of a molecular VAE, while a regularizer keeps the model anchored to its original checkpoint. The resulting model is then evaluated as a surrogate inside LSBO loops on held-out molecular benchmarks. If the approach succeeds, in-context learners become usable for structured design tasks without sacrificing their general regression behavior.

Core claim

By complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE, and adding a regularizer that anchors the model to the original checkpoint, the adapted in-context learner achieves strong performance as a surrogate for latent-space Bayesian optimization on held-out molecular optimization benchmarks.

What carries the argument

The anchoring regularizer applied during continued pretraining on latent-space synthetic optimization tasks.

If this is right

  • The adapted model functions as an effective surrogate inside latent-space Bayesian optimization loops for molecules.
  • The regularizer preserves the model's original regression capabilities while incorporating the new optimization tasks.
  • Performance gains on held-out benchmarks demonstrate that LSBO-specific adaptation is relevant for in-context surrogates.
  • The method maintains sample efficiency advantages of Bayesian optimization while handling structured objects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same continued-pretraining recipe could be tested on protein or materials latent spaces to check transferability.
  • Different choices of synthetic task distribution or regularizer strength might produce further gains on specific design objectives.
  • If the mismatch is indeed central, similar adaptation steps may improve other in-context models applied to non-tabular optimization settings.

Load-bearing premise

The mismatch between LSBO's latent-to-objective map and standard regression pretraining is the main performance bottleneck, and the regularizer successfully prevents overspecialization while retaining the broad prior.

What would settle it

If the adapted model shows no improvement over unadapted tabular foundation models when used as surrogates inside LSBO on the same held-out molecular benchmarks, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2606.09664 by Harri L\"ahdesm\"aki, Julien Martinelli, Tuan A. Vu.

Figure 1
Figure 1. Figure 1: Latent-space coverage of pretraining episodes. UMAP embeddings for 3 synthetic objectives and 2 held-out benchmarks. 10 0 10 20 30 10 0 10 20 30 dhop shop albs mess celr thir tror iso1 iso2 satd jnk3 gsk3 drd2 qed rano med2 osmb fexo adip zale synthetic base heldout [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Objective-space coverage of the synthetic prior. Each objective is evaluated on the same probe set P and represented by the resulting value vector (Equation 12), embedded with UMAP. function of oracle evaluations, and [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Main results. Best value found over oracle calls and distribution over the final recommendation on held-out molecular objectives. Mean ± 1 std computed across 10 seeds. dation models are already competitive surrogates for LSBO when inserted into an otherwise fixed optimization pipeline, and that LSBO-aware continued pretraining yields addi￾tional gains. While LILBO does not uniformly dominate every task or… view at source ↗
Figure 4
Figure 4. Figure 4: Base objective distributions. Kernel density estimates of the normalized base objectives used to construct the synthetic prior. A.2. Context and query sampling We evaluate the synthetic objective on the GuacaMol molecular pool to obtain yn = f [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Synthetic objective distributions. Kernel density estimates of representative normalized synthetic objectives sampled from the pretraining prior. B. Implementation details B.1. Continued-pretraining configuration Following (Garg et al., 2025), we adopt a two-stage approach. In the first stage, we start from the original TabPFN-3 (Grinsztajn et al., 2026) checkpoint, which has already been pretrained on a b… view at source ↗
read the original abstract

Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN and TabICL now achieve state-of-the-art regression performance and are increasingly used as BO surrogates. Because their Bayesian behavior is induced by large synthetic pretraining collections, the composition of this pretraining distribution is crucial. LSBO creates a distinctive mismatch: the induced map from latent code to objective value differs markedly from the regression tasks used to train current in-context models. We address this mismatch by complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE. The continued-pretraining objective features a regularizer that anchors the model to the original checkpoint, preserving its broad regression prior while avoiding overspecialization to the adaptation tasks. On held-out molecular optimization benchmarks, the resulting model achieves strong performance, supporting the relevance of LSBO-specific adaptation for in-context surrogates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes adapting tabular in-context learning models (e.g., TabPFN-style) for latent-space Bayesian optimization by continued pretraining on synthetic optimization tasks generated from the latent space of a molecular VAE. A regularizer anchors the model to its original checkpoint to preserve the broad regression prior while allowing LSBO-specific adaptation. The central empirical claim is that the resulting surrogate achieves strong performance on held-out molecular optimization benchmarks, demonstrating the value of addressing the LSBO-induced distribution mismatch.

Significance. If the empirical claims hold after verification, the work would provide a concrete mechanism for specializing in-context surrogates to structured design tasks without generic continued pretraining, with potential impact on sample-efficient molecular and protein optimization.

major comments (2)
  1. [continued-pretraining objective and experiments sections] The central claim that the regularizer enables 'LSBO-specific adaptation' while preserving the original regression prior is not directly supported: no results are reported measuring retained accuracy on held-out standard tabular regression tasks drawn from the base model's pretraining distribution after adaptation (see the continued-pretraining objective and experiments sections). Without this check, gains on molecular benchmarks could be explained by generic continued pretraining rather than targeted mismatch resolution.
  2. [Abstract and results] The abstract asserts that the model 'achieves strong performance' on held-out benchmarks, but the provided description supplies no quantitative results, baselines, error bars, or details on performance measurement; the full paper must include these to substantiate the empirical claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the empirical support for our claims. We address each major point below.

read point-by-point responses
  1. Referee: [continued-pretraining objective and experiments sections] The central claim that the regularizer enables 'LSBO-specific adaptation' while preserving the original regression prior is not directly supported: no results are reported measuring retained accuracy on held-out standard tabular regression tasks drawn from the base model's pretraining distribution after adaptation (see the continued-pretraining objective and experiments sections). Without this check, gains on molecular benchmarks could be explained by generic continued pretraining rather than targeted mismatch resolution.

    Authors: We agree that an explicit measurement of retained accuracy on standard tabular regression tasks would strengthen the evidence that the regularizer preserves the original prior rather than allowing generic continued pretraining to explain the gains. In the revised manuscript we have added results on held-out standard regression tasks sampled from the base pretraining distribution, showing that the regularized model retains accuracy comparable to the original checkpoint while the unregularized adaptation degrades performance. This directly supports the targeted nature of the LSBO adaptation. revision: yes

  2. Referee: [Abstract and results] The abstract asserts that the model 'achieves strong performance' on held-out benchmarks, but the provided description supplies no quantitative results, baselines, error bars, or details on performance measurement; the full paper must include these to substantiate the empirical claim.

    Authors: The full manuscript already contains the requested quantitative details in the experiments section, including performance metrics on the held-out molecular benchmarks, comparisons to baselines, and error bars over multiple runs. The abstract summarizes the finding at a high level without numbers, which follows standard conventions for brevity. No revision to the results presentation was required. revision: no

Circularity Check

0 steps flagged

Empirical adaptation with no reduction to inputs by construction

full rationale

The paper presents a continued-pretraining procedure on synthetic LSBO tasks (with an anchoring regularizer) followed by empirical evaluation on held-out molecular benchmarks. No derivation chain is claimed; performance is measured directly rather than derived from fitted parameters or self-citations. The abstract and described method contain no self-definitional equations, no 'prediction' that is the fit itself, and no load-bearing uniqueness theorems imported from prior author work. This is a standard empirical adaptation result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities used in the method.

pith-pipeline@v0.9.1-grok · 5722 in / 1063 out tokens · 20219 ms · 2026-06-27T17:04:15.391801+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 11 canonical work pages · 5 internal anchors

  1. [1]

    In-Context Black-Box Optimization with Unreliable Feedback

    Nicolas Samuel Blumer, Julien Martinelli, and Samuel Kaski. In-context black-box optimization with unreliable feedback. arXiv preprint arXiv:2605.06187, 2026

  2. [2]

    Nathan Brown, Marco Fiscato, Marwin H. S. Segler, and Alain C. Vaucher. Guacamol: Benchmarking models for de novo molecular design. Journal of Chemical Information and Modeling, 2019

  3. [3]

    Jaewon Chu, Jinyoung Park, Seunghun Lee, and Hyunwoo J. Kim. Inversion-based latent bayesian optimization. In Advances in Neural Information Processing Systems, 2024

  4. [4]

    Turner, and Matthias Poloczek

    David Eriksson, Michael Pearce, Jacob Gardner, Ryan D. Turner, and Matthias Poloczek. Scalable global optimization via local bayesian optimization. In Advances in Neural Information Processing Systems, 2019

  5. [5]

    Real-tab PFN : Improving tabular foundation models via continued pre-training with real-world data

    Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel M \"u ller, and Frank Hutter. Real-tab PFN : Improving tabular foundation models via continued pre-training with real-world data. In 1st ICML Workshop on Foundation Models for Structured Data, 2025. URL https://openreview.net/forum?id=BtEiqKsIMw

  6. [6]

    Bayesian optimization

    Roman Garnett. Bayesian optimization. Cambridge University Press, 2023

  7. [7]

    Wei, David Duvenaud, Jose Miguel Hernandez-Lobato, Benjamin Sanchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D

    Rafael Gomez-Bombarelli, Jennifer N. Wei, David Duvenaud, Jose Miguel Hernandez-Lobato, Benjamin Sanchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alan Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4 0 (2): 0 268--276, 2018

  8. [8]

    TabPFN-3: Technical Report

    L \'e o Grinsztajn, Klemens Fl \"o ge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Mihir Manium, Shi Bin, Magnus B \"u hler, Anurag Garg, et al. Tabpfn-3: Technical report. arXiv preprint arXiv:2605.13986, 2026

  9. [9]

    Nate Gruver, Samuel Stanton, Nathan Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, and Andrew Gordon Wilson. Protein design with guided discrete diffusion. In Advances in Neural Information Processing Systems, 2024

  10. [10]

    Accurate predictions on small data with a tabular foundation model

    Noah Hollmann, Samuel Muller, Lennart Purucker, Arjun Krishnakumar, Max Korfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637 0 (8045): 0 319--326, 2025

  11. [11]

    Overcoming catastrophic forgetting in neural networks

    James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114 0 (13): 0 3521--3526, 2017

  12. [12]

    Seunghun Lee, Jaewon Chu, Sihyeon Kim, Juyeon Ko, and Hyunwoo J. Kim. Advancing bayesian optimization via learning correlated latent space. In Advances in Neural Information Processing Systems, 2023

  13. [13]

    Seunghun Lee, Jinyoung Park, Jaewon Chu, Minseo Yoon, and Hyunwoo J. Kim. Latent bayesian optimization via autoregressive normalizing flows. In International Conference on Learning Representations, 2025

  14. [14]

    None to optima in few shots: Bayesian optimization with mdp priors

    Diantong Li, Kyunghyun Cho, and Chong Liu. None to optima in few shots: Bayesian optimization with mdp priors. arXiv preprint arXiv:2511.01006, 2025

  15. [15]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016

  16. [16]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017

  17. [17]

    End-to-end Meta-Bayesian Optimisation with Transformer Neural Processes

    Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, and Haitham Bou Ammar. End-to-end Meta-Bayesian Optimisation with Transformer Neural Processes . Advances in Neural Information Processing Systems , 2023

  18. [18]

    Kusner, John Bradshaw, and Jacob Gardner

    Natalie Maus, Haydn Jones, Juston Moore, Matt J. Kusner, John Bradshaw, and Jacob Gardner. Local latent space bayesian optimization over structured inputs. In Advances in Neural Information Processing Systems, 2022

  19. [19]

    Natalie Maus, Yimeng Zeng, Haydn Thomas Jones, Yining Huang, Gaurav Ng Goel, Alden Rose, Kyurae Kim, Hyun-Su Lee, Marcelo Der Torossian Torres, Fangping Wan, Cesar de la Fuente-Nunez, Mark Yatskar, Osbert Bastani, and Jacob R. Gardner. Purely agentic black-box optimization for biological design. arXiv preprint arXiv:2601.22382, 2026

  20. [20]

    Moss, Sebastian W

    Henry B. Moss, Sebastian W. Ober, and Tom Diethe. Return of the latent space cowboys: Re-thinking the use of vaes for bayesian optimisation of structured spaces. In International Conference on Machine Learning, 2025

  21. [21]

    Transformers can do bayesian inference

    Samuel Muller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do bayesian inference. arXiv preprint arXiv:2112.10510, 2021

  22. [22]

    PFN s4 BO : In-context learning for B ayesian optimization

    Samuel Muller, Matthias Feurer, Noah Hollmann, and Frank Hutter. PFN s4 BO : In-context learning for B ayesian optimization. In International Conference on Machine Learning, 2023

  23. [23]

    Position: The future of bayesian prediction is prior-fitted

    Samuel Muller, Arik Reuter, Noah Hollmann, David R \"u gamer, and Frank Hutter. Position: The future of bayesian prediction is prior-fitted. arXiv preprint arXiv:2505.23947, 2025

  24. [24]

    Tabiclv2: A better, faster, scalable, and open tabular foundation model.arXiv preprint arXiv:2602.11139, 2026

    Jingang Qu, David Holzm \"u ller, Ga \"e l Varoquaux, and Marine Le Morvan. Tabiclv2: A better, faster, scalable, and open tabular foundation model. arXiv preprint arXiv:2602.11139, 2026

  25. [25]

    Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006

  26. [26]

    Accelerating bayesian optimization for biological sequence design with denoising autoencoders

    Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, and Andrew Gordon Wilson. Accelerating bayesian optimization for biological sequence design with denoising autoencoders. In International Conference on Machine Learning, 2022

  27. [27]

    Sample-efficient optimization in the latent space of deep generative models via weighted retraining

    Austin Tripp, Erik Daxberger, and Jose Miguel Hernandez-Lobato. Sample-efficient optimization in the latent space of deep generative models via weighted retraining. In Advances in Neural Information Processing Systems, 2020

  28. [28]

    -pfn: In-context learning entropy search

    Tom Viering, Steven Adriaensen, Herilalaina Rakotoarison, Samuel Muller, Carl Hvarfner, Frank Hutter, and Eytan Bakshy. -pfn: In-context learning entropy search. In ICLR Workshop on Frontiers in Probabilistic Inference, 2025

  29. [29]

    a hdesm \

    Tuan A. Vu, Julien Martinelli, and Harri L \"a hdesm \"a ki. Time-aware latent space bayesian optimization. arXiv preprint arXiv:2603.00935, 2026

  30. [30]

    Explicit inductive bias for transfer learning with convolutional networks

    Li Xuhong, Yves Grandvalet, and Franck Davoine. Explicit inductive bias for transfer learning with convolutional networks. In International conference on machine learning, pages 2825--2834. PMLR, 2018

  31. [31]

    GIT - BO : High-dimensional bayesian optimization with tabular foundation models

    Rosen Ting-Ying Yu, Cyril Picard, and Faez Ahmed. GIT - BO : High-dimensional bayesian optimization with tabular foundation models. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=9iTdKS4SRQ

  32. [32]

    Tabpfn: One model to rule them all? arXiv preprint arXiv:2505.20003, 2025 a

    Qiong Zhang, Yan Shuo Tan, Qinglong Tian, and Pengfei Li. Tabpfn: One model to rule them all? arXiv preprint arXiv:2505.20003, 2025 a

  33. [33]

    PABBO: Preferential Amortized Black-Box Optimization

    Xinyu Zhang, Daolang Huang, Samuel Kaski, and Julien Martinelli. PABBO: Preferential Amortized Black-Box Optimization . International Conference on Learning Representations , 2025 b

  34. [34]

    In-context multi-objective optimization

    Xinyu Zhang, Conor Hassan, Julien Martinelli, Daolang Huang, and Samuel Kaski. In-context multi-objective optimization. In International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=odmeUlWta8