arxiv: 2606.25197 · v1 · pith:GDG2BMOSnew · submitted 2026-06-23 · 💻 cs.LG · stat.ML

Efficient Adaptive Data Acquisition via Pretrained Belief Representations

Pith reviewed 2026-06-25 23:35 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords amortised policy learning · Bayesian experimental design · belief representations · pretrained models · active learning · Bayesian optimisation

The pith

POLAR learns data acquisition policies by training lightweight heads on belief states from pretrained predictive models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that adaptive data acquisition policies depend on the observation history only through a sufficient belief state, and that this belief state can be extracted directly from existing pretrained predictive foundation models. By freezing those models as encoders and training only a policy head on top, representation learning is decoupled from policy learning. This produces a single amortised framework that covers Bayesian experimental design, Bayesian optimisation and active learning, differing solely in the utility used for training. The reported policies outperform prior amortised methods while using far fewer training samples.

Core claim

POLAR uses pretrained predictive foundation models as fixed belief-state encoders and trains a task-specific policy head on their representations. Optimal data acquisition then reduces to learning this lightweight head via the appropriate utility, without needing to learn representations or approximate posteriors from scratch.
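
The encoder-plus-head split can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `frozen_encoder` is a hand-written stand-in for a pretrained model, `PolicyHead` is a one-layer linear head, and the head is fitted by random-search hill climbing purely for brevity (this page does not describe the paper's optimiser). The only point it demonstrates is that the encoder has no trainable parameters and only the head's weights change.

```python
import math
import random


def frozen_encoder(history):
    """Hypothetical stand-in for a frozen pretrained model.

    Maps a history of (design, outcome) pairs to a fixed-length belief
    vector. It has no trainable parameters.
    """
    if not history:
        return [0.0, 0.0, 0.0, 1.0]
    xs = [x for x, _ in history]
    ys = [y for _, y in history]
    best_x = xs[ys.index(max(ys))]
    n = len(history)
    return [sum(xs) / n, sum(ys) / n, best_x, 1.0]  # final entry acts as a bias


class PolicyHead:
    """Lightweight trainable head: linear map from belief vector to a design in (0, 1)."""

    def __init__(self, dim, rng):
        self.w = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    def __call__(self, belief):
        z = sum(w * b for w, b in zip(self.w, belief))
        z = max(-30.0, min(30.0, z))  # clip to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))


def rollout(head, objective, horizon):
    """Run the policy for `horizon` acquisitions and return the history."""
    history = []
    for _ in range(horizon):
        x = head(frozen_encoder(history))
        history.append((x, objective(x)))
    return history


def utility(history):
    """Illustrative optimisation-style utility: best outcome observed."""
    return max(y for _, y in history)


def train_head(head, objective, horizon, steps, rng, step_size=0.2):
    """Hill climbing on the head weights only; the encoder is never touched."""
    best = utility(rollout(head, objective, horizon))
    for _ in range(steps):
        old = list(head.w)
        head.w = [w + rng.gauss(0.0, step_size) for w in old]
        score = utility(rollout(head, objective, horizon))
        if score >= best:
            best = score
        else:
            head.w = old  # reject the perturbation
    return best


def objective(x):
    """Toy objective with its maximum (value 0) at x = 0.7."""
    return -(x - 0.7) ** 2


rng = random.Random(0)
head = PolicyHead(dim=4, rng=rng)
initial = utility(rollout(head, objective, horizon=5))
trained = train_head(head, objective, horizon=5, steps=200, rng=rng)
```

Because perturbations are accepted only when the utility does not decrease, `trained` is never below `initial`; a gradient-based optimiser would replace `train_head` in any realistic setting.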

What carries the argument

Pretrained predictive foundation models acting as belief-state encoders, with a trainable policy head placed on top that is optimised for a chosen utility function.

If this is right

  • A single training loop produces policies for Bayesian experimental design, Bayesian optimisation and active learning; only the scalar utility changes between tasks.
  • Amortised data-acquisition methods become practical with far smaller policy-training budgets than direct policy learning or surrogate-based baselines.
  • Existing large predictive models can be reused as black-box belief encoders without retraining their weights.
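
The first bullet, that only the scalar utility changes between tasks, can be sketched as one evaluation loop with a pluggable utility. The toy model below is entirely hypothetical (a three-valued latent `theta`, a quadratic response `f`, Gaussian noise, and a fixed `grid_policy` in place of a learned head); the three utilities are simple stand-ins for optimisation, experimental design and active learning objectives, not the paper's definitions.

```python
import math
import random

THETAS = [0.2, 0.5, 0.8]  # hypothetical discrete latent parameter, uniform prior
NOISE = 0.05              # hypothetical observation noise (standard deviation)


def f(x, theta):
    return -(x - theta) ** 2


def bo_utility(history, theta):
    # Negative simple regret: the optimum of f(., theta) is 0 at x = theta.
    return max(f(x, theta) for x, _ in history)


def bed_utility(history, theta):
    # Realised information gain about theta: log posterior minus log prior.
    log_lik = {
        t: sum(-0.5 * ((y - f(x, t)) / NOISE) ** 2 for x, y in history)
        for t in THETAS
    }
    m = max(log_lik.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_lik.values()))
    log_post = log_lik[theta] - log_norm  # uniform prior cancels
    return log_post - math.log(1.0 / len(THETAS))


def al_utility(history, theta):
    # Negative mean squared error of a 1-nearest-neighbour predictor on a grid.
    grid = [i / 10 for i in range(11)]
    err = 0.0
    for g in grid:
        _, y_nn = min(history, key=lambda h: abs(h[0] - g))
        err += (y_nn - f(g, theta)) ** 2
    return -err / len(grid)


UTILITIES = {"BO": bo_utility, "BED": bed_utility, "AL": al_utility}


def grid_policy(history):
    # Placeholder non-adaptive policy; a learned head would go here.
    return (len(history) % 5 + 0.5) / 5


def expected_utility(policy, utility, n_tasks=200, horizon=5, seed=0):
    """Monte Carlo estimate of expected utility; the loop is task-agnostic."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_tasks):
        theta = rng.choice(THETAS)
        history = []
        for _ in range(horizon):
            x = policy(history)
            history.append((x, f(x, theta) + rng.gauss(0.0, NOISE)))
        total += utility(history, theta)
    return total / n_tasks


scores = {name: expected_utility(grid_policy, u) for name, u in UTILITIES.items()}
```

Swapping the task means swapping the dictionary entry passed to `expected_utility`; the rollout code is untouched.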

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same encoder-plus-head pattern could be tested on other history-dependent sequential decisions such as reinforcement learning or sequential experimental design outside the Bayesian setting.
  • If the assumption holds, further scaling of general predictive foundation models would automatically improve amortised data acquisition without additional task-specific representation work.
  • One could measure how much additional gain comes from light fine-tuning of the encoder versus keeping it completely frozen.

Load-bearing premise

The internal representations produced by existing pretrained predictive foundation models already contain a sufficient statistic for the optimal data-acquisition policy.
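
One way to write the premise down, in notation introduced here for illustration only (the symbols $\phi$, $g$, $b_t$ and $U$ are assumptions of this sketch and may not match the paper's), is as the existence of a head $g$ that reproduces the optimal policy from the encoder output alone:

```latex
% Illustrative notation, not taken from the paper.
% E[U | h_t, x_{t+1} = x] denotes expected final utility when design x is
% chosen next and all later designs are chosen optimally.
\begin{align*}
  h_t &= \{(x_i, y_i)\}_{i=1}^{t}
      && \text{observation history after } t \text{ acquisitions} \\
  b_t &= \phi(h_t)
      && \text{representation from the frozen encoder } \phi \\
  \pi^\star(h_t) &\in \arg\max_{x}\; \mathbb{E}\!\left[\, U \mid h_t,\; x_{t+1} = x \,\right]
      && \text{optimal next design for utility } U \\
  \exists\, g :\quad \pi^\star(h_t) &= g(b_t) \quad \text{for all } h_t
      && \text{the premise: } b_t \text{ is sufficient}
\end{align*}
```

Read this way, the premise fails whenever two histories that require different optimal designs are mapped by $\phi$ to the same representation.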

What would settle it

A concrete counter-example would be any data-acquisition task in which a policy head trained on the frozen foundation-model representations produces acquisition behaviour that is measurably worse than a jointly trained representation-plus-policy model or an exact posterior-based method.
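
"Measurably worse" needs an operational definition. A minimal sketch of one option is a paired comparison across tasks with a standard-error threshold. The function names, the choice of a paired design and the threshold `z=2.0` are all assumptions of this sketch, not a procedure taken from the paper.

```python
import math


def paired_gap(frozen_scores, reference_scores):
    """Mean per-task gap (reference minus frozen) and its standard error.

    Scores are utilities, so higher is better; both lists must be aligned
    task by task.
    """
    if len(frozen_scores) != len(reference_scores):
        raise ValueError("score lists must have the same length")
    if len(frozen_scores) < 2:
        raise ValueError("need at least two tasks to estimate a standard error")
    diffs = [r - fz for fz, r in zip(frozen_scores, reference_scores)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # unbiased sample variance
    return mean, math.sqrt(var / n)


def measurably_worse(frozen_scores, reference_scores, z=2.0):
    """True if the frozen-encoder policy trails by more than z standard errors."""
    mean, se = paired_gap(frozen_scores, reference_scores)
    return mean - z * se > 0.0
```

Pairing by task removes between-task variance from the comparison, which matters when tasks differ widely in difficulty.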

Figures

Figures reproduced from arXiv: 2606.25197 by Conor Hassan, Daolang Huang, Luigi Acerbi, Samuel Kaski, Tom Rainforth, Zhuoyue Huang.

Figure 1: Overview of POLAR. Left: POLAR uses a pretrained tabular foundation model as a belief encoder and trains a policy head on top of it. Top right: Policy learning is driven by task-specific utilities, while backbone adaptation is supervised by an optional prediction loss. Bottom right: At deployment, the policy maps the current history to the next design in a single forward pass.
Figure 2: Location finding. (a) EIG against total training samples in the 2D setting. Error bars denote standard error across 1000 runs. (b) The same comparison in the 5D setting. (c) Example design trajectories for all methods on a shared latent source configuration.
…evolved into Transformer Neural Processes [50] and early PFNs [48]. Recent tabular foundation models such as TabPFN [26, 27, 23] and TabICL [51, 52] s…
Figure 3: Ablations in the 2D setting, isolating the impact of our core architectural choices, including gradient decoupling, backbone finetuning and initialisation.
Frozen vs. finetuned backbone. We first compare our default setting, in which the backbone is adapted by the supervised prediction loss, against a frozen-backbone variant. Even without any weight updates, the frozen variant remains competitive with th…
Figure 4: Hyperparameter optimisation on HPO-B. Average regret (left) and average rank (right) aggregated across the six search spaces. Shaded regions denote one standard error across the test tasks. POLAR achieves the lowest regret and the best average rank throughout the acquisition trajectory.
…of hyperparameter configurations and datasets. HPO-B is pre-partitioned into multiple search spaces, each corresponding t…
Figure 5: DOCKSTRING. Average regret (mean ± s.e.) across the six held-out targets.
Finally, we evaluate our method on a real-world high-dimensional molecular optimisation benchmark, DOCKSTRING [17], which provides docking scores for over 260,000 molecules against a panel of protein targets. Since the physicochemical interactions that govern binding affinity are inherently correlated across these structurally rel…
Original abstract

Learning effective policies for adaptive data acquisition remains challenging: posterior-based methods rely on surrogate models and posterior approximations that can be misspecified or biased, while direct policy-learning methods map from historical observations and fail to exploit available model representations, making learning harder. We introduce policy learning with belief representations (POLAR), based on the insight that optimal data acquisition depends on the observation history only through a sufficient belief state. Specifically, POLAR decouples representation learning from policy learning by leveraging pretrained predictive foundation models as belief-state encoders, training a policy head on top of their representations. This yields a simple, unified amortised policy learning framework for Bayesian experimental design, Bayesian optimisation, and active learning, differing only in the task-specific utility used to train the policy. Empirically, we find that POLAR outperforms state-of-the-art amortised methods across diverse tasks while requiring far fewer training samples, demonstrating a significant step in the scalability and efficiency of amortised data acquisition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces POLAR, an amortised policy-learning framework for Bayesian experimental design, Bayesian optimisation and active learning. It decouples representation learning from policy learning by using pretrained predictive foundation models as fixed belief-state encoders and training only a lightweight policy head on top of those representations; the framework is claimed to differ across tasks only in the choice of utility used to train the policy head. The central empirical claim is that POLAR outperforms existing amortised baselines across diverse tasks while requiring substantially fewer training samples.

Significance. If the sufficiency of the pretrained representations for the relevant utilities is established and the reported gains survive proper controls, the work would constitute a meaningful advance in the scalability of amortised acquisition policies by removing the need to learn task-specific representations from scratch.

major comments (2)
  1. [Introduction and §3 (Method)] The central claim rests on the assumption that representations extracted from existing pretrained predictive foundation models already constitute a sufficient statistic for the optimal data-acquisition policy (i.e., that all information in the observation history relevant to the task-specific utility is preserved). Predictive pretraining objectives do not guarantee retention of posterior uncertainty or value-of-information quantities; without a diagnostic, theorem, or controlled ablation demonstrating sufficiency for the reported tasks, the claimed benefit of decoupling representation and policy learning cannot be evaluated. This issue is load-bearing for the entire contribution.
  2. [Abstract] The abstract asserts empirical outperformance and reduced sample complexity, yet supplies no quantitative results, error bars, baseline descriptions, or details on how belief representations are extracted and validated. Even if the full experimental section contains these elements, the absence of any verification that the foundation-model encoder preserves the quantities driving the utilities undermines the cross-task claims.
minor comments (1)
  1. [§3] Notation for the belief state versus raw observation history should be made explicit and consistent throughout.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and describe the revisions we will make.

Point-by-point responses
  1. Referee: [Introduction and §3 (Method)] The central claim rests on the assumption that representations extracted from existing pretrained predictive foundation models already constitute a sufficient statistic for the optimal data-acquisition policy (i.e., that all information in the observation history relevant to the task-specific utility is preserved). Predictive pretraining objectives do not guarantee retention of posterior uncertainty or value-of-information quantities; without a diagnostic, theorem, or controlled ablation demonstrating sufficiency for the reported tasks, the claimed benefit of decoupling representation and policy learning cannot be evaluated. This issue is load-bearing for the entire contribution.

    Authors: We agree that a formal theorem establishing sufficiency is not provided and that predictive pretraining does not explicitly optimize for posterior uncertainty or value-of-information. The manuscript instead relies on empirical evidence that the fixed pretrained encoders enable strong policy performance across tasks while reducing sample complexity relative to methods that learn representations jointly. To directly address the concern, the revised manuscript will add a new subsection containing controlled ablations (pretrained vs. randomly initialized encoders, and vs. task-specific representation learning) together with simple diagnostics that measure how well the representations preserve quantities relevant to the utilities. revision: yes

  2. Referee: [Abstract] The abstract asserts empirical outperformance and reduced sample complexity, yet supplies no quantitative results, error bars, baseline descriptions, or details on how belief representations are extracted and validated. Even if the full experimental section contains these elements, the absence of any verification that the foundation-model encoder preserves the quantities driving the utilities undermines the cross-task claims.

    Authors: Abstracts are written to be concise; all quantitative results, error bars, baseline descriptions, and implementation details on representation extraction appear in the experimental sections of the full manuscript. The requested verification of the encoder is precisely the content of the new ablation and diagnostic subsection described in the response to the first comment, which will be added in revision. revision: partial
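
The responses above promise "simple diagnostics" without specifying them. One common choice, offered here only as a hypothetical example and not as the authors' planned method, is a linear probe: regress a utility-relevant quantity (for instance a reference posterior variance) on the frozen representations and report R² on held-out data. The sketch uses ridge-regularised least squares solved with plain Gaussian elimination; all function names are illustrative.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_probe(features, targets, ridge=1e-6):
    """Ridge least squares via the normal equations; a bias column is appended."""
    rows = [list(feat) + [1.0] for feat in features]
    d = len(rows[0])
    gram = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve(gram, rhs)


def probe_r2(weights, features, targets):
    """Coefficient of determination of the probe on (ideally held-out) data."""
    preds = [
        sum(w * v for w, v in zip(weights, list(feat) + [1.0])) for feat in features
    ]
    mean = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot
```

A high R² shows the quantity is linearly decodable from the representation; a low R² is weaker evidence, since the information could still be present in a nonlinear form.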

Circularity Check

0 steps flagged

No circularity: empirical amortised framework with no self-referential derivation

full rationale

The paper presents POLAR as a practical method that trains a lightweight policy head atop fixed pretrained foundation-model encoders. No equations, uniqueness theorems, or derivation steps are supplied that reduce a claimed prediction or result to a fitted quantity or self-citation by construction. The central modelling choice (belief-state sufficiency) is an empirical modelling assumption rather than a tautological redefinition, and the reported gains are obtained from task-specific training and evaluation on held-out data. This is the normal non-circular outcome for an empirical amortised framework.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that pretrained foundation-model representations already encode sufficient belief states. No free parameters, axioms, or invented entities are explicitly introduced in the abstract.

pith-pipeline@v0.9.1-grok · 5703 in / 1277 out tokens · 16023 ms · 2026-06-25T23:35:18.992542+00:00 · methodology


Reference graph

Works this paper leans on

75 extracted references · 18 canonical work pages · 6 internal anchors

  1. [1]

    Arango, S. P., Jomaa, H. S., Wistuba, M., and Grabocka, J. (2021). Hpo-b: A large-scale reproducible benchmark for black-box hpo based on openml. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.

  2. [2]

    Arrow, K. J., Chenery, H. B., Minhas, B. S., and Solow, R. M. (1961). Capital-labor substitution and economic efficiency. The review of Economics and Statistics, pages 225–250.

  3. [3]

    Berger, J. O. (1985). Statistical decision theory and bayesian analysis. Springer Series in Statistics.

  4. [4]

    Bickford Smith, F., Kossen, J., Trollope, E., Van Der Wilk, M., Foster, A., and Rainforth, T. (2025). Rethinking aleatoric and epistemic uncertainty. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 4345–4359. PMLR.

  5. [5]

    Blau, T., Bonilla, E. V., Chades, I., and Dezfouli, A. (2022). Optimizing sequential experimental design with deep reinforcement learning. In International conference on machine learning, pages 2107–2128. PMLR.

  6. [6]

    Blau, T., Chades, I., Dezfouli, A., Steinberg, D., and Bonilla, E. V. (2023). Statistically efficient bayesian sequential experiment design via reinforcement learning with cross-entropy estimators. arXiv preprint arXiv:2305.18435.

  7. [7]

    Bracher, N., Kühmichel, L., Ivanova, D. R., Intes, X., Bürkner, P.-C., and Radev, S. T. (2025). Jadai: Jointly amortizing adaptive design and bayesian inference. arXiv preprint arXiv:2512.22999.

  8. [8]

    Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical science, pages 273–304.

  9. [9]

    Chang, P. E., Loka, N. R. B. S., Huang, D., Remes, U., Kaski, S., and Acerbi, L. (2025). Amortized probabilistic conditioning for optimization, simulation and inference. In International Conference on Artificial Intelligence and Statistics, pages 703–711. PMLR.

  10. [10]

    Chen, Y., Hoffman, M. W., Colmenarejo, S. G., Denil, M., Lillicrap, T. P., Botvinick, M., and Freitas, N. (2017). Learning to learn without gradient descent by gradient descent. In International Conference on Machine Learning, pages 748–756. PMLR.

  11. [11]

    Chen, Y., Song, X., Lee, C., Wang, Z., Zhang, R., Dohan, D., Kawakami, K., Kochanski, G., Doucet, A., Ranzato, M., et al. (2022). Towards learning universal hyperparameter optimizers with transformers. Advances in Neural Information Processing Systems, 35:32053–32068.

  12. [12]

    Cowen-Rivers, A. I., Lyu, W., Tutunov, R., Wang, Z., Grosnit, A., Griffiths, R. R., Maraval, A. M., Jianye, H., Wang, J., Peters, J., et al. (2022). Hebo: Pushing the limits of sample-efficient hyper-parameter optimisation. Journal of Artificial Intelligence Research, 74:1269–1349.

  13. [13]

    Dawid, A. P. (1998). Coherent measures of discrepancy, uncertainty and dependence, with applications to bayesian predictive experimental design. Department of Statistical Science, University College London. http://www.ucl.ac.uk/Stats/research/abs94.html, Tech. Rep, 139.

  14. [14]

    Foster, A., Ivanova, D. R., Malik, I., and Rainforth, T. (2021). Deep adaptive design: Amortizing sequential bayesian experimental design. In International conference on machine learning, pages 3384–3395. PMLR.

  15. [15]

    Foster, A., Jankowiak, M., Bingham, E., Horsfall, P., Teh, Y. W., Rainforth, T., and Goodman, N. (2019). Variational bayesian optimal experimental design. In Advances in Neural Information Processing Systems, volume 32.

  16. [16]

    Foster, A., Jankowiak, M., O’Meara, M., Teh, Y. W., and Rainforth, T. (2020). A unified stochastic gradient approach to designing bayesian-optimal experiments. In International Conference on Artificial Intelligence and Statistics, pages 2959–2969. PMLR.

  17. [17]

    García-Ortegón, M., Simm, G. N., Tripp, A. J., Hernández-Lobato, J. M., Bender, A., and Bacallado, S. (2022). Dockstring: easy molecular docking yields better benchmarks for ligand design. Journal of chemical information and modeling, 62(15):3486–3502.

  18. [18]

    Garg, A., Ali, M., Hollmann, N., Purucker, L., Müller, S., and Hutter, F. (2025). Real-tabpfn: Improving tabular foundation models via continued pre-training with real-world data. arXiv preprint arXiv:2507.03971.

  19. [19]

    Garnelo, M., Rosenbaum, D., Maddison, C., Ramalho, T., Saxton, D., Shanahan, M., Teh, Y. W., Rezende, D., and Eslami, S. A. (2018). Conditional neural processes. In International conference on machine learning, pages 1704–1713. PMLR.

  20. [20]

    Garnett, R. (2023). Bayesian optimization. Cambridge University Press.

  21. [21]

    Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378.

  22. [22]

    Griffiths, R.-R., Klarner, L., Moss, H., Ravuri, A., Truong, S., Du, Y., Stanton, S., Tom, G., Rankovic, B., Jamasb, A., et al. (2023). Gauche: a library for gaussian processes in chemistry. Advances in Neural Information Processing Systems, 36:76923–76946.

  23. [23]

    Grinsztajn, L., Flöge, K., Key, O., Birkel, F., Jund, P., Roof, B., Jäger, B., Safaric, D., Alessi, S., Hayler, A., et al. (2025). Tabpfn-2.5: Advancing the state of the art in tabular foundation models. arXiv preprint arXiv:2511.08667.

  24. [24]

    Guo, Y., Huang, D., Zhang, X., Katt, S., Kaski, S., and Bharti, A. (2026). Constrained bayesian experimental design via online planning. arXiv preprint arXiv:2605.26990.

  25. [25]

    Hedman, M., Ivanova, D. R., Guan, C., and Rainforth, T. (2025). Step-dad: Semi-amortized policy-based bayesian experimental design. In International Conference on Machine Learning, pages 22904–22923. PMLR.

  26. [26]

    Hollmann, N., Müller, S., Eggensperger, K., and Hutter, F. (2023). Tabpfn: A transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations.

  27. [27]

    Hollmann, N., Müller, S., Purucker, L., Krishnakumar, A., Körfer, M., Hoo, S. B., Schirrmeister, R. T., and Hutter, F. (2025). Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326.

  28. [28]

    Hoo, S. B., Müller, S., Salinas, D., and Hutter, F. (2024). The tabular foundation model tabpfn outperforms specialized time series forecasting models based on simple features. In NeurIPS workshop on time series in the age of large models.

  29. [29]

    Houlsby, N., Huszár, F., Ghahramani, Z., and Lengyel, M. (2011). Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745.

  30. [30]

    Hu, E. J., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al. (2022). Lora: Low-rank adaptation of large language models. In International Conference on Learning Representations.

  31. [31]

    Huan, X. and Marzouk, Y. M. (2016). Sequential bayesian optimal experimental design via approximate dynamic programming. arXiv preprint arXiv:1604.08320.

  32. [32]

    Huang, D., Guo, Y., Acerbi, L., and Kaski, S. (2024). Amortized bayesian experimental design for decision-making. Advances in Neural Information Processing Systems, 37:109460–109486.

  33. [33]

    Huang, D., Wen, X., Bharti, A., Kaski, S., and Acerbi, L. (2026a). Aline: Joint amortization for bayesian inference and active data acquisition. Advances in Neural Information Processing Systems, 38:54068–54102.

  34. [34]

    Huang, Z., Smith, F. B., and Rainforth, T. (2026b). Loss-driven bayesian active learning. In The 29th International Conference on Artificial Intelligence and Statistics.

  35. [35]

    Hung, Y. H., Lin, K.-J., Lin, Y.-H., Wang, C.-Y., Sun, C., and Hsieh, P.-C. (2025). Boformer: Learning to solve multi-objective bayesian optimization via non-markovian rl. In The Thirteenth International Conference on Learning Representations.

  36. [36]

    Igoe, C. (2025). Efficient Bayesian Experimental Design with Deep Learning. PhD thesis, Carnegie Mellon University.

  37. [37]

    Iqbal, S., Corenflos, A., Särkkä, S., and Abdulsamad, H. (2024). Nesting particle filters for experimental design in dynamical systems. In International Conference on Machine Learning, pages 21047–21068. PMLR.

  38. [38]

    Ivanova, D. R., Foster, A., Kleinegesse, S., Gutmann, M. U., and Rainforth, T. (2021). Implicit deep adaptive design: Policy-based experimental design without likelihoods. Advances in neural information processing systems, 34:25785–25798.

  39. [39]

    Lacoste-Julien, S., Huszár, F., and Ghahramani, Z. (2011). Approximate inference for the loss-calibrated bayesian. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 416–424. JMLR Workshop and Conference Proceedings.

  40. [40]

    Li, C.-Y., Toussaint, M., Rakitsch, B., and Zimmer, C. (2025a). Amortized safe active learning for real-time data acquisition: Pretrained neural policies from simulated nonparametric functions. arXiv preprint arXiv:2501.15458.

  41. [41]

    Li, D., Cho, K., and Liu, C. (2025b). None to optima in few shots: Bayesian optimization with mdp priors. arXiv preprint arXiv:2511.01006.

  42. [42]

    Lim, V., Novoseller, E., Ichnowski, J., Huang, H., and Goldberg, K. (2022). Policy-based bayesian experimental design for non-differentiable implicit models. arXiv preprint arXiv:2203.04272.

  43. [43]

    Lindley, D. V. (1956). On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, 27(4):986–1005.

  44. [44]

    Lindley, D. V. (1972). Bayesian statistics: A review. SIAM.

  45. [45]

    Maraval, A., Zimmer, M., Grosnit, A., and Bou Ammar, H. (2023). End-to-end meta-bayesian optimisation with transformer neural processes. Advances in Neural Information Processing Systems, 36:11246–11260.

  46. [46]

    Maus, N., Kim, K., Pleiss, G., Eriksson, D., Cunningham, J. P., and Gardner, J. R. (2024). Approximation-aware bayesian optimization. Advances in Neural Information Processing Systems, 37:21114–21140.

  47. [47]

    Müller, S., Feurer, M., Hollmann, N., and Hutter, F. (2023). Pfns4bo: In-context learning for bayesian optimization. In International Conference on Machine Learning, pages 25444–25470. PMLR.

  48. [48]

    Müller, S., Hollmann, N., Arango, S. P., Grabocka, J., and Hutter, F. (2021). Transformers can do bayesian inference. arXiv preprint arXiv:2112.10510.

  49. [49]

    Müller, S., Reuter, A., Hollmann, N., Rügamer, D., and Hutter, F. (2025). Position: The future of bayesian prediction is prior-fitted. In International Conference on Machine Learning, pages 81861–81875. PMLR.

  50. [50]

    Nguyen, T. and Grover, A. (2022). Transformer neural processes: Uncertainty-aware meta learning via sequence modeling. In International Conference on Machine Learning, pages 16569–16594. PMLR.

  51. [51]

    Qu, J., Holzmüller, D., Varoquaux, G., and Morvan, M. L. (2025). Tabicl: A tabular foundation model for in-context learning on large data. arXiv preprint arXiv:2502.05564.

  52. [52]

    Qu, J., Holzmüller, D., Varoquaux, G., and Morvan, M. L. (2026). Tabiclv2: A better, faster, scalable, and open tabular foundation model. arXiv preprint arXiv:2602.11139.

  53. [53]

    Rainforth, T., Foster, A., Ivanova, D. R., and Bickford Smith, F. (2024). Modern bayesian experimental design. Statistical Science, 39(1):100–114.

  54. [54]

    Rasmussen, C. E. (2003). Gaussian processes in machine learning. In Summer school on machine learning, pages 63–71. Springer.

  55. [55]

    Rubachev, I., Kotelnikov, A., Kartashev, N., and Babenko, A. (2025). On finetuning tabular foundation models. arXiv preprint arXiv:2506.08982.

  56. [56]

    Ryan, E. G., Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2016). A review of modern computational algorithms for bayesian optimal design. International Statistical Review, 84(1):128–

  57. [57]

    Savage, L. J. (1951). The theory of statistical decision. Journal of the American Statistical association, 46(253):55–67.

  58. [58]

    Savage, L. J. (1971). Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, 66(336):783–801.

  59. [59]

    Settles, B. (2012). Active learning. Morgan & Claypool Publishers.

  60. [60]

    Shen, W., Dong, J., and Huan, X. (2025). Variational sequential optimal experimental design using reinforcement learning. Computer Methods in Applied Mechanics and Engineering, 444:118068.

  61. [61]

    Sheng, X. and Hu, Y.-H. (2005). Maximum likelihood multiple-source localization using acoustic energy measurements with wireless sensor networks. IEEE transactions on signal processing, 53(1):44–53.

  62. [62]

    Smith, F. B., Kirsch, A., Farquhar, S., Gal, Y., Foster, A., and Rainforth, T. (2023). Prediction-oriented bayesian active learning. In International conference on artificial intelligence and statistics, pages 7331–7348. PMLR.

  63. [63]

    Song, L., Gao, C., Xue, K., Wu, C., Li, D., Hao, J., Zhang, Z., and Qian, C. (2024). Reinforced in-context black-box optimization. arXiv preprint arXiv:2402.17423.

  64. [64]

    Svetnik, V., Liaw, A., Tong, C., Culberson, J. C., Sheridan, R. P., and Feuston, B. P. (2003). Random forest: a classification and regression tool for compound classification and qsar modeling. Journal of chemical information and computer sciences, 43(6):1947–1958.

  65. [65]

    Tanna, A., Seth, P., Bouadi, M., Avaiya, U., and Sankarapu, V. K. (2025). Tabtune: A unified library for inference and fine-tuning tabular foundation models. arXiv preprint arXiv:2511.02802.

  66. [66]

    Trott, O. and Olson, A. J. (2010). Autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2):455–461.

  67. [67]

    Viering, T. J., Adriaensen, S., Rakotoarison, H., Müller, S., Hvarfner, C., Hutter, F., and Bakshy, E. (2025). α-pfn: In-context learning entropy search. In Frontiers in Probabilistic Inference: Learning meets Sampling.

  68. [68]

    Yang, K., Swanson, K., Jin, W., Coley, C., Eiden, P., Gao, H., Guzman-Perez, A., Hopper, T., Kelley, B., Mathea, M., et al. (2019). Analyzing learned molecular representations for property prediction. Journal of chemical information and modeling, 59(8):3370–3388.

  69. [69]

    Ye, H.-J., Liu, S.-Y., and Chao, W.-L. (2025). A closer look at tabpfn v2: Understanding its strengths and extending its capabilities. arXiv preprint arXiv:2502.17361.

  70. [70]

    Zhang, Q., Tan, Y. S., Tian, Q., and Li, P. (2025a). Tabpfn: One model to rule them all? arXiv preprint arXiv:2505.20003.

  71. [71]

    Zhang, X., Hassan, C., Martinelli, J., Huang, D., and Kaski, S. (2026). In-context multi-objective optimization. In The Fourteenth International Conference on Learning Representations.

  72. [72]

    Zhang, X., Huang, D., Kaski, S., and Martinelli, J. (2025b). Pabbo: Preferential amortized black-box optimization. In The Thirteenth International Conference on Learning Representations.

Appendix (excerpt)

The appendix is organized as follows:

  • In Appendix A, we provide additional details of POLAR, including the backbone architectures, the policy head, and t...

$\ldots \log \frac{p(h_T \mid \theta_0, \pi)}{\frac{1}{L+1} \sum_{\ell=0}^{L} p(h_T \mid \theta_\ell, \pi)} \Big], \quad U_T(\pi, L) = \mathbb{E} \ldots$

  1. Drawing a kernel uniformly from {RBF, Matérn-3/2, Matérn-5/2}.

  2. Drawing a length-scale $\ell \sim \mathrm{LogUniform}(0.1, 2.0)$ and an output scale $\sigma_f \sim \mathrm{Uniform}(0.1, 1.0)$.

  3. Sampling a function $f \sim \mathcal{GP}(0, k_{\ell,\sigma_f})$.

Before each forward pass, we normalise each function using task-level statistics computed independently of the observed context: for inputs, we use the known domain bounds, and for outputs, we estimate the normalisation mean and variance from a large set of reference points sampled from that function. The horizon is...
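
The three sampling steps and the output normalisation described above can be sketched in plain Python. Several details are assumptions of this sketch rather than facts from the excerpt: inputs are one-dimensional, the output scale enters the kernel as a multiplicative factor `sigma_f ** 2`, a small diagonal `jitter` is added for numerical stability, and the function is drawn only at a finite set of points via a hand-written Cholesky factorisation.

```python
import math
import random


def rbf(r, ell):
    return math.exp(-0.5 * (r / ell) ** 2)


def matern32(r, ell):
    a = math.sqrt(3.0) * r / ell
    return (1.0 + a) * math.exp(-a)


def matern52(r, ell):
    a = math.sqrt(5.0) * r / ell
    return (1.0 + a + a * a / 3.0) * math.exp(-a)


KERNELS = {"RBF": rbf, "Matern-3/2": matern32, "Matern-5/2": matern52}


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_gp_function(xs, rng, jitter=1e-6):
    """Draw kernel, hyperparameters and function values at the points xs."""
    name = rng.choice(sorted(KERNELS))                          # step 1
    ell = math.exp(rng.uniform(math.log(0.1), math.log(2.0)))   # step 2: LogUniform
    sigma_f = rng.uniform(0.1, 1.0)                             # step 2: Uniform
    kern = KERNELS[name]
    n = len(xs)
    cov = [
        [
            sigma_f ** 2 * kern(abs(xs[i] - xs[j]), ell) + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(n)]  # step 3
    return name, ell, sigma_f, ys


def normalise_outputs(ys, reference_ys):
    """Standardise ys with mean and variance estimated from reference points."""
    mean = sum(reference_ys) / len(reference_ys)
    var = sum((y - mean) ** 2 for y in reference_ys) / len(reference_ys)
    std = math.sqrt(var) if var > 0.0 else 1.0
    return [(y - mean) / std for y in ys]
```

For example, `sample_gp_function([i / 7 for i in range(8)], random.Random(0))` returns the kernel name, the two hyperparameters and eight function values; passing those values as both arguments of `normalise_outputs` yields a zero-mean, unit-variance sequence.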