Efficient Adaptive Data Acquisition via Pretrained Belief Representations
Pith reviewed 2026-06-25 23:35 UTC · model grok-4.3
The pith
POLAR learns data acquisition policies by training lightweight heads on belief states from pretrained predictive models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
POLAR uses pretrained predictive foundation models as fixed belief-state encoders and trains a task-specific policy head on their representations. Optimal data acquisition then reduces to learning this lightweight head via the appropriate utility, without needing to learn representations or approximate posteriors from scratch.
What carries the argument
Pretrained predictive foundation models acting as belief-state encoders, with a trainable policy head placed on top that is optimised for a chosen utility function.
If this is right
- A single training loop produces policies for Bayesian experimental design, Bayesian optimisation and active learning; only the scalar utility changes between tasks.
- Amortised data-acquisition methods become practical with far smaller policy-training budgets than direct policy learning or surrogate-based baselines.
- Existing large predictive models can be reused as black-box belief encoders without retraining their weights.
Where Pith is reading between the lines
- The same encoder-plus-head pattern could be tested on other history-dependent sequential decisions such as reinforcement learning or sequential experimental design outside the Bayesian setting.
- If the assumption holds, further scaling of general predictive foundation models would automatically improve amortised data acquisition without additional task-specific representation work.
- One could measure how much additional gain comes from light fine-tuning of the encoder versus keeping it completely frozen.
Load-bearing premise
The internal representations produced by existing pretrained predictive foundation models already contain a sufficient statistic for the optimal data-acquisition policy.
What would settle it
A concrete counter-example would be any data-acquisition task in which a policy head trained on the frozen foundation-model representations produces acquisition behaviour that is measurably worse than a jointly trained representation-plus-policy model or an exact posterior-based method.
Figures
read the original abstract
Learning effective policies for adaptive data acquisition remains challenging: posterior-based methods rely on surrogate models and posterior approximations that can be misspecified or biased, while direct policy-learning methods map from historical observations and fail to exploit available model representations, making learning harder. We introduce policy learning with belief representations (POLAR), based on the insight that optimal data acquisition depends on the observation history only through a sufficient belief state. Specifically, POLAR decouples representation learning from policy learning by leveraging pretrained predictive foundation models as belief-state encoders, training a policy head on top of their representations. This yields a simple, unified amortised policy learning framework for Bayesian experimental design, Bayesian optimisation, and active learning, differing only in the task-specific utility used to train the policy. Empirically, we find that POLAR outperforms state-of-the-art amortised methods across diverse tasks while requiring far fewer training samples, demonstrating a significant step in the scalability and efficiency of amortised data acquisition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces POLAR, an amortised policy-learning framework for Bayesian experimental design, Bayesian optimisation and active learning. It decouples representation learning from policy learning by using pretrained predictive foundation models as fixed belief-state encoders and training only a lightweight policy head on top of those representations; the framework is claimed to differ across tasks only in the choice of utility used to train the policy head. The central empirical claim is that POLAR outperforms existing amortised baselines across diverse tasks while requiring substantially fewer training samples.
Significance. If the sufficiency of the pretrained representations for the relevant utilities is established and the reported gains survive proper controls, the work would constitute a meaningful advance in the scalability of amortised acquisition policies by removing the need to learn task-specific representations from scratch.
major comments (2)
- [Introduction and §3 (Method)] The central claim rests on the assumption that representations extracted from existing pretrained predictive foundation models already constitute a sufficient statistic for the optimal data-acquisition policy (i.e., that all information in the observation history relevant to the task-specific utility is preserved). Predictive pretraining objectives do not guarantee retention of posterior uncertainty or value-of-information quantities; without a diagnostic, theorem, or controlled ablation demonstrating sufficiency for the reported tasks, the claimed benefit of decoupling representation and policy learning cannot be evaluated. This issue is load-bearing for the entire contribution.
- [Abstract] The abstract asserts empirical outperformance and reduced sample complexity, yet supplies no quantitative results, error bars, baseline descriptions, or details on how belief representations are extracted and validated. Even if the full experimental section contains these elements, the absence of any verification that the foundation-model encoder preserves the quantities driving the utilities undermines the cross-task claims.
minor comments (1)
- [§3] Notation for the belief state versus raw observation history should be made explicit and consistent throughout.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below and describe the revisions we will make.
read point-by-point responses
-
Referee: [Introduction and §3 (Method)] The central claim rests on the assumption that representations extracted from existing pretrained predictive foundation models already constitute a sufficient statistic for the optimal data-acquisition policy (i.e., that all information in the observation history relevant to the task-specific utility is preserved). Predictive pretraining objectives do not guarantee retention of posterior uncertainty or value-of-information quantities; without a diagnostic, theorem, or controlled ablation demonstrating sufficiency for the reported tasks, the claimed benefit of decoupling representation and policy learning cannot be evaluated. This issue is load-bearing for the entire contribution.
Authors: We agree that a formal theorem establishing sufficiency is not provided and that predictive pretraining does not explicitly optimize for posterior uncertainty or value-of-information. The manuscript instead relies on empirical evidence that the fixed pretrained encoders enable strong policy performance across tasks while reducing sample complexity relative to methods that learn representations jointly. To directly address the concern, the revised manuscript will add a new subsection containing controlled ablations (pretrained vs. randomly initialized encoders, and vs. task-specific representation learning) together with simple diagnostics that measure how well the representations preserve quantities relevant to the utilities. revision: yes
-
Referee: [Abstract] The abstract asserts empirical outperformance and reduced sample complexity, yet supplies no quantitative results, error bars, baseline descriptions, or details on how belief representations are extracted and validated. Even if the full experimental section contains these elements, the absence of any verification that the foundation-model encoder preserves the quantities driving the utilities undermines the cross-task claims.
Authors: Abstracts are written to be concise; all quantitative results, error bars, baseline descriptions, and implementation details on representation extraction appear in the experimental sections of the full manuscript. The requested verification of the encoder is precisely the content of the new ablation and diagnostic subsection described in the response to the first comment, which will be added in revision. revision: partial
Circularity Check
No circularity: empirical amortised framework with no self-referential derivation
full rationale
The paper presents POLAR as a practical method that trains a lightweight policy head atop fixed pretrained foundation-model encoders. No equations, uniqueness theorems, or derivation steps are supplied that reduce a claimed prediction or result to a fitted quantity or self-citation by construction. The central modelling choice (belief-state sufficiency) is an empirical modelling assumption rather than a tautological redefinition, and the reported gains are obtained from task-specific training and evaluation on held-out data. This is the normal non-circular outcome for an empirical amortised framework.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
P., Jomaa, H
Arango, S. P., Jomaa, H. S., Wistuba, M., and Grabocka, J. (2021). Hpo-b: A large-scale reproducible benchmark for black-box hpo based on openml. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 8, 18, 19
2021
-
[2]
J., Chenery, H
Arrow, K. J., Chenery, H. B., Minhas, B. S., and Solow, R. M. (1961). Capital-labor substitution and economic efficiency.The review of Economics and Statistics, pages 225–250. 7, 16
1961
-
[3]
Berger, J. O. (1985). Statistical decision theory and bayesian analysis.Springer Series in Statistics. 3
1985
-
[4]
Bickford Smith, F., Kossen, J., Trollope, E., Van Der Wilk, M., Foster, A., and Rainforth, T. (2025). Rethinking aleatoric and epistemic uncertainty. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 4345–4359. PMLR. 4
2025
-
[5]
V ., Chades, I., and Dezfouli, A
Blau, T., Bonilla, E. V ., Chades, I., and Dezfouli, A. (2022). Optimizing sequential experimental design with deep reinforcement learning. InInternational conference on machine learning, pages 2107–2128. PMLR. 3, 6, 7, 17 10
2022
- [6]
-
[7]
R., Intes, X., Bürkner, P.-C., and Radev, S
Bracher, N., Kühmichel, L., Ivanova, D. R., Intes, X., Bürkner, P.-C., and Radev, S. T. (2025). Jadai: Jointly amortizing adaptive design and bayesian inference.arXiv preprint arXiv:2512.22999. 3, 6
-
[8]
and Verdinelli, I
Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: A review.Statistical science, pages 273–304. 3
1995
-
[9]
E., Loka, N
Chang, P. E., Loka, N. R. B. S., Huang, D., Remes, U., Kaski, S., and Acerbi, L. (2025). Amortized probabilistic conditioning for optimization, simulation and inference. InInternational Conference on Artificial Intelligence and Statistics, pages 703–711. PMLR. 1, 3, 6
2025
-
[10]
W., Colmenarejo, S
Chen, Y ., Hoffman, M. W., Colmenarejo, S. G., Denil, M., Lillicrap, T. P., Botvinick, M., and Freitas, N. (2017). Learning to learn without gradient descent by gradient descent. InInternational Conference on Machine Learning, pages 748–756. PMLR. 6
2017
-
[11]
Chen, Y ., Song, X., Lee, C., Wang, Z., Zhang, R., Dohan, D., Kawakami, K., Kochanski, G., Doucet, A., Ranzato, M., et al. (2022). Towards learning universal hyperparameter optimizers with transformers.Advances in Neural Information Processing Systems, 35:32053–32068. 6
2022
-
[12]
I., Lyu, W., Tutunov, R., Wang, Z., Grosnit, A., Griffiths, R
Cowen-Rivers, A. I., Lyu, W., Tutunov, R., Wang, Z., Grosnit, A., Griffiths, R. R., Maraval, A. M., Jianye, H., Wang, J., Peters, J., et al. (2022). Hebo: Pushing the limits of sample-efficient hyper-parameter optimisation.Journal of Artificial Intelligence Research, 74:1269–1349. 19
2022
-
[13]
Dawid, A. P. (1998). Coherent measures of discrepancy, uncertainty and dependence, with appli- cations to bayesian predictive experimental design.Department of Statistical Science, University College London. http://www. ucl. ac. uk/Stats/research/abs94. html, Tech. Rep, 139. 4
1998
-
[14]
R., Malik, I., and Rainforth, T
Foster, A., Ivanova, D. R., Malik, I., and Rainforth, T. (2021). Deep adaptive design: Amortizing sequential bayesian experimental design. InInternational conference on machine learning, pages 3384–3395. PMLR. 1, 3, 5, 6, 7, 16, 17, 18
2021
-
[15]
W., Rainforth, T., and Goodman, N
Foster, A., Jankowiak, M., Bingham, E., Horsfall, P., Teh, Y . W., Rainforth, T., and Goodman, N. (2019). Variational bayesian optimal experimental design. InAdvances in Neural Information Processing Systems, volume 32. 7, 16
2019
-
[16]
W., and Rainforth, T
Foster, A., Jankowiak, M., O’Meara, M., Teh, Y . W., and Rainforth, T. (2020). A unified stochas- tic gradient approach to designing bayesian-optimal experiments. InInternational Conference on Artificial Intelligence and Statistics, pages 2959–2969. PMLR. 7
2020
-
[17]
N., Tripp, A
García-Ortegón, M., Simm, G. N., Tripp, A. J., Hernández-Lobato, J. M., Bender, A., and Bacallado, S. (2022). Dockstring: easy molecular docking yields better benchmarks for ligand design.Journal of chemical information and modeling, 62(15):3486–3502. 9, 20
2022
- [18]
-
[19]
W., Rezende, D., and Eslami, S
Garnelo, M., Rosenbaum, D., Maddison, C., Ramalho, T., Saxton, D., Shanahan, M., Teh, Y . W., Rezende, D., and Eslami, S. A. (2018). Conditional neural processes. InInternational conference on machine learning, pages 1704–1713. PMLR. 6
2018
-
[20]
(2023).Bayesian optimization
Garnett, R. (2023).Bayesian optimization. Cambridge University Press. 1, 3
2023
-
[21]
and Raftery, A
Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378. 4
2007
-
[22]
Griffiths, R.-R., Klarner, L., Moss, H., Ravuri, A., Truong, S., Du, Y ., Stanton, S., Tom, G., Rankovic, B., Jamasb, A., et al. (2023). Gauche: a library for gaussian processes in chemistry. Advances in Neural Information Processing Systems, 36:76923–76946. 10, 20 11
2023
-
[23]
Grinsztajn, L., Flöge, K., Key, O., Birkel, F., Jund, P., Roof, B., Jäger, B., Safaric, D., Alessi, S., Hayler, A., et al. (2025). Tabpfn-2.5: Advancing the state of the art in tabular foundation models. arXiv preprint arXiv:2511.08667. 2, 3, 5, 7, 15
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[24]
Guo, Y ., Huang, D., Zhang, X., Katt, S., Kaski, S., and Bharti, A. (2026). Constrained bayesian experimental design via online planning.arXiv preprint arXiv:2605.26990. 6
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[25]
R., Guan, C., and Rainforth, T
Hedman, M., Ivanova, D. R., Guan, C., and Rainforth, T. (2025). Step-dad: Semi-amortized policy-based bayesian experimental design. InInternational Conference on Machine Learning, pages 22904–22923. PMLR. 6
2025
-
[26]
Hollmann, N., Müller, S., Eggensperger, K., and Hutter, F. (2023). Tabpfn: A transformer that solves small tabular classification problems in a second. InThe Eleventh International Conference on Learning Representations. 6, 7
2023
-
[27]
B., Schirrmeister, R
Hollmann, N., Müller, S., Purucker, L., Krishnakumar, A., Körfer, M., Hoo, S. B., Schirrmeister, R. T., and Hutter, F. (2025). Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326. 2, 3, 5, 7
2025
-
[28]
B., Müller, S., Salinas, D., and Hutter, F
Hoo, S. B., Müller, S., Salinas, D., and Hutter, F. (2024). The tabular foundation model tabpfn outperforms specialized time series forecasting models based on simple features. InNeurIPS workshop on time series in the age of large models. 7
2024
-
[29]
Houlsby, N., Huszár, F., Ghahramani, Z., and Lengyel, M. (2011). Bayesian active learning for classification and preference learning.arXiv preprint arXiv:1112.5745. 4, 21
work page internal anchor Pith review Pith/arXiv arXiv 2011
-
[30]
J., Wallis, P., Allen-Zhu, Z., Li, Y ., Wang, S., Wang, L., Chen, W., et al
Hu, E. J., Wallis, P., Allen-Zhu, Z., Li, Y ., Wang, S., Wang, L., Chen, W., et al. (2022). Lora: Low-rank adaptation of large language models. InInternational Conference on Learning Representations. 6
2022
-
[31]
Sequential Bayesian optimal experimental design via approximate dynamic programming
Huan, X. and Marzouk, Y . M. (2016). Sequential bayesian optimal experimental design via approximate dynamic programming.arXiv preprint arXiv:1604.08320. 1
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[32]
Huang, D., Guo, Y ., Acerbi, L., and Kaski, S. (2024). Amortized bayesian experimental design for decision-making.Advances in Neural Information Processing Systems, 37:109460–109486. 1, 3, 6
2024
-
[33]
Huang, D., Wen, X., Bharti, A., Kaski, S., and Acerbi, L. (2026a). Aline: Joint amortization for bayesian inference and active data acquisition.Advances in Neural Information Processing Systems, 38:54068–54102. 3, 6, 7, 16, 17, 21
-
[34]
B., and Rainforth, T
Huang, Z., Smith, F. B., and Rainforth, T. (2026b). Loss-driven bayesian active learning. In The 29th International Conference on Artificial Intelligence and Statistics. 3, 5, 21, 22, 23
-
[35]
H., Lin, K.-J., Lin, Y .-H., Wang, C.-Y ., Sun, C., and Hsieh, P.-C
Hung, Y . H., Lin, K.-J., Lin, Y .-H., Wang, C.-Y ., Sun, C., and Hsieh, P.-C. (2025). Boformer: Learning to solve multi-objective bayesian optimization via non-markovian rl. InThe Thirteenth International Conference on Learning Representations. 6
2025
-
[36]
(2025).Efficient Bayesian Experimental Design with Deep Learning
Igoe, C. (2025).Efficient Bayesian Experimental Design with Deep Learning. PhD thesis, Carnegie Mellon University. 1
2025
-
[37]
Iqbal, S., Corenflos, A., Särkkä, S., and Abdulsamad, H. (2024). Nesting particle filters for experimental design in dynamical systems. InInternational Conference on Machine Learning, pages 21047–21068. PMLR. 6
2024
-
[38]
R., Foster, A., Kleinegesse, S., Gutmann, M
Ivanova, D. R., Foster, A., Kleinegesse, S., Gutmann, M. U., and Rainforth, T. (2021). Implicit deep adaptive design: Policy-based experimental design without likelihoods.Advances in neural information processing systems, 34:25785–25798. 1, 3, 6, 16
2021
-
[39]
Lacoste-Julien, S., Huszár, F., and Ghahramani, Z. (2011). Approximate inference for the loss-calibrated bayesian. InProceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 416–424. JMLR Workshop and Conference Proceedings. 1, 3 12
2011
-
[40]
Li, C.-Y ., Toussaint, M., Rakitsch, B., and Zimmer, C. (2025a). Amortized safe active learning for real-time data acquisition: Pretrained neural policies from simulated nonparametric functions. arXiv preprint arXiv:2501.15458. 3
work page internal anchor Pith review Pith/arXiv arXiv
- [41]
- [42]
-
[43]
Lindley, D. V . (1956). On a measure of the information provided by an experiment.The Annals of Mathematical Statistics, 27(4):986–1005. 3, 4
1956
-
[44]
Lindley, D. V . (1972).Bayesian statistics: A review. SIAM. 3, 5
1972
-
[45]
Maraval, A., Zimmer, M., Grosnit, A., and Bou Ammar, H. (2023). End-to-end meta-bayesian optimisation with transformer neural processes.Advances in Neural Information Processing Systems, 36:11246–11260. 1, 3, 6, 9, 18, 19
2023
-
[46]
P., and Gardner, J
Maus, N., Kim, K., Pleiss, G., Eriksson, D., Cunningham, J. P., and Gardner, J. R. (2024). Approximation-aware bayesian optimization.Advances in Neural Information Processing Systems, 37:21114–21140. 1, 3
2024
-
[47]
Müller, S., Feurer, M., Hollmann, N., and Hutter, F. (2023). Pfns4bo: In-context learning for bayesian optimization. InInternational Conference on Machine Learning, pages 25444–25470. PMLR. 3, 6, 9, 19
2023
-
[48]
P., Grabocka, J., and Hutter, F
Müller, S., Hollmann, N., Arango, S. P., Grabocka, J., and Hutter, F. (2021). Transformers can do bayesian inference.arXiv preprint arXiv:2112.10510. 3, 7, 9, 19
-
[49]
Müller, S., Reuter, A., Hollmann, N., Rügamer, D., and Hutter, F. (2025). Position: The future of bayesian prediction is prior-fitted. InInternational Conference on Machine Learning, pages 81861–81875. PMLR. 2, 7
2025
-
[50]
and Grover, A
Nguyen, T. and Grover, A. (2022). Transformer neural processes: Uncertainty-aware meta learning via sequence modeling. InInternational Conference on Machine Learning, pages 16569– 16594. PMLR. 3, 7, 9, 19
2022
-
[51]
Qu, J., Holzmüller, D., Varoquaux, G., and Morvan, M. L. (2025). Tabicl: A tabular foundation model for in-context learning on large data.arXiv preprint arXiv:2502.05564. 2, 3, 5, 7, 15
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [52]
-
[53]
R., and Bickford Smith, F
Rainforth, T., Foster, A., Ivanova, D. R., and Bickford Smith, F. (2024). Modern bayesian experimental design.Statistical Science, 39(1):100–114. 1, 3, 4
2024
-
[54]
Rasmussen, C. E. (2003). Gaussian processes in machine learning. InSummer school on machine learning, pages 63–71. Springer. 9
2003
- [55]
-
[56]
G., Drovandi, C
Ryan, E. G., Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2016). A review of modern computational algorithms for bayesian optimal design.International Statistical Review, 84(1):128–
2016
-
[57]
Savage, L. J. (1951). The theory of statistical decision.Journal of the American Statistical association, 46(253):55–67. 3
1951
-
[58]
Savage, L. J. (1971). Elicitation of personal probabilities and expectations.Journal of the American Statistical Association, 66(336):783–801. 4
1971
-
[59]
(2012).Active learning
Settles, B. (2012).Active learning. Morgan & Claypool Publishers. 1 13
2012
-
[60]
Shen, W., Dong, J., and Huan, X. (2025). Variational sequential optimal experimental de- sign using reinforcement learning.Computer Methods in Applied Mechanics and Engineering, 444:118068. 6
2025
-
[61]
and Hu, Y .-H
Sheng, X. and Hu, Y .-H. (2005). Maximum likelihood multiple-source localization using acoustic energy measurements with wireless sensor networks.IEEE transactions on signal processing, 53(1):44–53. 7, 16
2005
-
[62]
B., Kirsch, A., Farquhar, S., Gal, Y ., Foster, A., and Rainforth, T
Smith, F. B., Kirsch, A., Farquhar, S., Gal, Y ., Foster, A., and Rainforth, T. (2023). Prediction- oriented bayesian active learning. InInternational conference on artificial intelligence and statistics, pages 7331–7348. PMLR. 3, 4, 5, 21
2023
- [63]
-
[64]
C., Sheridan, R
Svetnik, V ., Liaw, A., Tong, C., Culberson, J. C., Sheridan, R. P., and Feuston, B. P. (2003). Random forest: a classification and regression tool for compound classification and qsar modeling. Journal of chemical information and computer sciences, 43(6):1947–1958. 10, 20
2003
- [65]
-
[66]
and Olson, A
Trott, O. and Olson, A. J. (2010). Autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.Journal of computational chemistry, 31(2):455–461. 20
2010
-
[67]
J., Adriaensen, S., Rakotoarison, H., Müller, S., Hvarfner, C., Hutter, F., and Bakshy, E
Viering, T. J., Adriaensen, S., Rakotoarison, H., Müller, S., Hvarfner, C., Hutter, F., and Bakshy, E. (2025). α-pfn: In-context learning entropy search. InFrontiers in Probabilistic Inference: Learning meets Sampling. 6
2025
-
[68]
Yang, K., Swanson, K., Jin, W., Coley, C., Eiden, P., Gao, H., Guzman-Perez, A., Hopper, T., Kelley, B., Mathea, M., et al. (2019). Analyzing learned molecular representations for property prediction.Journal of chemical information and modeling, 59(8):3370–3388. 10, 20
2019
- [69]
-
[70]
Zhang, Q., Tan, Y . S., Tian, Q., and Li, P. (2025a). Tabpfn: One model to rule them all?arXiv preprint arXiv:2505.20003. 7
-
[71]
Zhang, X., Hassan, C., Martinelli, J., Huang, D., and Kaski, S. (2026). In-context multi-objective optimization. InThe Fourteenth International Conference on Learning Representations. 3, 6
2026
-
[72]
log p(hT |θ 0, π) 1 L+1 PL ℓ=0 p(hT |θ ℓ, π) # ,U T (π, L) =E
Zhang, X., Huang, D., Kaski, S., and Martinelli, J. (2025b). Pabbo: Preferential amortized black-box optimization. InThe Thirteenth International Conference on Learning Representations. 6 14 Appendix The appendix is organized as follows: • In Appendix A, we provide additional details of POLAR, including the backbone architec- tures, the policy head, and t...
2000
-
[73]
Drawing a kernel uniformly from {RBF, Matérn-3/2, Matérn-5/2}
-
[74]
Drawing a length-scale ℓ∼LogUniform(0.1,2.0) and an output scale σf ∼ Uniform(0.1,1.0)
-
[75]
Sampling a functionf∼ GP(0, k ℓ,σf ). Before each forward pass, we normalise each function using task-level statistics computed indepen- dently of the observed context: for inputs, we use the known domain bounds, and for outputs, we estimate the normalisation mean and variance from a large set of reference points sampled from that function. The horizon is...
2000
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.