Conditioning Gaussian Processes on Almost Anything
Pith reviewed 2026-05-21 02:02 UTC · model grok-4.3
The pith
Gaussian processes can be conditioned on almost any pointwise likelihood by equating them to linear diffusion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation, including non-linear physics and natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations.
What carries the argument
An equivalence between GPs and linear diffusion models that recasts conditioning as an ODE with closed-form Gaussian dynamics and a Monte Carlo guidance term.
If this is right
- Recovers exact standard GP posteriors in linear-Gaussian cases.
- Handles non-linear physics via pointwise likelihoods.
- Enables conditioning on natural language using LLMs.
- Eliminates the need for custom derivations per conditioning type.
- Whitening reduces numerical stiffness and Wasserstein-2 cost.
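To make the whitening bullet concrete, here is a minimal sketch. It assumes whitening means the usual change of variables z = L^{-1}(f - m) with K = L L^T from a Cholesky factorisation; this page does not show the paper's exact transformation, so the names `rbf_kernel`, `whiten`, `unwhiten` and the jitter value are illustrative only.

```python
import numpy as np


def rbf_kernel(x, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel matrix on a 1-D grid (illustrative choice)."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def whiten(f, mean, chol):
    """Map f ~ N(mean, K) to z ~ N(0, I), where K = chol @ chol.T."""
    return np.linalg.solve(chol, f - mean)


def unwhiten(z, mean, chol):
    """Inverse map: z ~ N(0, I) back to f ~ N(mean, K)."""
    return mean + chol @ z


x = np.linspace(0.0, 1.0, 20)
mean = np.zeros_like(x)
K = rbf_kernel(x) + 1e-6 * np.eye(x.size)  # jitter value is an assumption
L = np.linalg.cholesky(K)

rng = np.random.default_rng(0)
z = rng.standard_normal(x.size)
f = unwhiten(z, mean, L)
z_back = whiten(f, mean, L)

# In whitened coordinates the prior covariance L^{-1} K L^{-T} is the identity,
# so every direction has unit scale however ill-conditioned K is.
whitened_cov = np.linalg.solve(L, np.linalg.solve(L, K).T)
```

In these coordinates the Gaussian part of the problem has identity covariance, which is one plausible reading of why stiffness would drop: the very different eigenvalue scales of K no longer appear in the dynamics.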
Where Pith is reading between the lines
- Experts could supply constraints in plain language for GP models.
- Hybrid diffusion-GP models for generative tasks become straightforward.
- Testing on scientific datasets with non-linear constraints would validate stability.
- Similar ideas might apply to other kernel methods.
Load-bearing premise
The Monte Carlo approximation of the guidance term stays accurate for complex non-conjugate likelihoods.
What would settle it
Compare diffusion samples to the exact posterior on a known non-linear conditioning task; a large discrepancy would disprove the approximation's reliability.
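A check of this kind can be sketched on a one-dimensional toy problem where the exact posterior is available by quadrature. Everything below is an assumption made for illustration: the N(0, 1) prior, the cubic observation model, and the use of plain self-normalised importance sampling as a stand-in for the Monte Carlo approximation. It does not run the paper's ODE sampler.

```python
import numpy as np


def log_likelihood(f, y=0.5, noise_sd=0.3):
    """Non-conjugate pointwise likelihood: y = f**3 + Gaussian noise (toy choice)."""
    return -0.5 * ((y - f**3) / noise_sd) ** 2


def exact_posterior_mean(num_grid=20001, half_width=8.0):
    """Reference value by dense quadrature on a uniform grid, N(0, 1) prior."""
    f = np.linspace(-half_width, half_width, num_grid)
    log_post = -0.5 * f**2 + log_likelihood(f)
    w = np.exp(log_post - log_post.max())
    return float(np.sum(w * f) / np.sum(w))


def monte_carlo_posterior_mean(num_samples, seed=0):
    """Self-normalised importance sampling with the prior as the proposal."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(num_samples)
    log_w = log_likelihood(f)
    w = np.exp(log_w - log_w.max())
    return float(np.sum(w * f) / np.sum(w))


reference = exact_posterior_mean()
estimate = monte_carlo_posterior_mean(num_samples=200_000)
discrepancy = abs(estimate - reference)
```

A fuller test would compare whole sample distributions (for example with a two-sample statistic) rather than a single moment, and would repeat the comparison as the likelihood becomes sharper.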
Original abstract
Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to establish an explicit equivalence between Gaussian processes and a class of linear diffusion models. Predictive sampling is recast as an ODE possessing closed-form Gaussian dynamics whose drift contains a likelihood-dependent guidance term; this term is replaced by a simple Monte Carlo average over pointwise likelihood evaluations. The construction recovers exact GP conditioning in the linear-Gaussian case and extends, without bespoke derivations, to arbitrary conditioning statements that admit pointwise likelihood evaluation, including non-linear physics and natural-language statements supplied by large language models. Whitening is introduced to isolate irreducible non-Gaussian dynamics, thereby minimising Wasserstein-2 transport cost and removing numerical stiffness.
Significance. If the claimed equivalence and the numerical stability of the Monte Carlo guidance term can be established, the work would supply a genuinely general-purpose mechanism for conditioning GPs on complex, non-conjugate information. The ability to incorporate statements expressed in natural language or by non-linear simulators without custom derivations would constitute a substantial advance in probabilistic modelling.
major comments (2)
- [Abstract / ODE derivation] Abstract and the section deriving the ODE equivalence: the central claim that the Monte Carlo estimator for the likelihood-dependent guidance term remains accurate and stable for non-conjugate conditioning statements is load-bearing for the extension beyond the linear-Gaussian regime, yet the manuscript provides neither variance bounds nor effective-sample-size diagnostics for this estimator under the ODE flow.
- [Whitening section] Section presenting the whitening transformation: the assertion that whitening isolates the irreducible non-Gaussian dynamics and thereby eliminates numerical stiffness must be accompanied by a quantitative comparison of the resulting Wasserstein-2 cost and stiffness metrics against the unwhitened formulation; without such evidence the claimed numerical advantage remains unverified.
minor comments (2)
- [Notation / Monte Carlo estimator] Notation for the guidance term and the Monte Carlo estimator should be introduced with explicit dependence on the likelihood function and the number of samples; the current presentation leaves the scaling with conditioning complexity implicit.
- [Assumptions paragraph] The manuscript should include a brief statement of the precise regularity conditions on the likelihood that guarantee the existence of the closed-form Gaussian dynamics.
Simulated Author's Rebuttal
We thank the referee for their constructive comments and for recognizing the potential of the proposed framework. We respond point-by-point to the major comments below, indicating the revisions we will make to address the concerns raised.
- Referee: [Abstract / ODE derivation] Abstract and the section deriving the ODE equivalence: the central claim that the Monte Carlo estimator for the likelihood-dependent guidance term remains accurate and stable for non-conjugate conditioning statements is load-bearing for the extension beyond the linear-Gaussian regime, yet the manuscript provides neither variance bounds nor effective-sample-size diagnostics for this estimator under the ODE flow.
  Authors: We agree that theoretical or diagnostic support for the Monte Carlo guidance estimator is important to substantiate the non-conjugate claims. In the linear-Gaussian case the estimator is exact by construction, but for general likelihoods we currently rely on empirical performance. In the revised manuscript we will add a short derivation of variance bounds for the estimator under the linear ODE flow and report effective sample size diagnostics from the numerical experiments in the non-linear physics and language-conditioning sections. These additions will appear in the ODE derivation section.
  Revision: yes
- Referee: [Whitening section] Section presenting the whitening transformation: the assertion that whitening isolates the irreducible non-Gaussian dynamics and thereby eliminates numerical stiffness must be accompanied by a quantitative comparison of the resulting Wasserstein-2 cost and stiffness metrics against the unwhitened formulation; without such evidence the claimed numerical advantage remains unverified.
  Authors: We accept that the current text presents the theoretical motivation for whitening without direct quantitative verification. We will revise the whitening section to include explicit comparisons of Wasserstein-2 transport costs and stiffness indicators (such as maximum integration step size or condition number of the drift) between the whitened and unwhitened formulations, drawing on the simulation results already obtained in the paper.
  Revision: yes
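The effective-sample-size diagnostic requested in the first major comment can be sketched as follows. This uses the standard Kish formula, ESS = (sum w)^2 / sum w^2; which log-weights the paper would feed in is not shown on this page, so the function name and the example inputs are assumptions.

```python
import numpy as np


def effective_sample_size(log_weights):
    """Kish ESS, (sum w)^2 / sum w^2, from unnormalised log-weights."""
    log_w = np.asarray(log_weights, dtype=float)
    shifted = log_w - log_w.max()  # largest weight becomes exp(0) = 1
    w = np.exp(shifted)
    return float(w.sum() ** 2 / np.sum(w**2))


# Equal weights: every sample counts, so ESS equals the sample count.
ess_uniform = effective_sample_size(np.zeros(100))

# One dominant weight: the estimate effectively rests on a single sample.
ess_degenerate = effective_sample_size(np.array([0.0, -50.0, -50.0, -50.0]))
```

Tracking this number along the ODE trajectory would show where the Monte Carlo guidance estimate collapses onto a few samples, which is exactly the regime the referee is worried about.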
Circularity Check
No circularity: GP-diffusion equivalence and ODE recasting are derived rather than tautological
The paper derives an explicit equivalence between Gaussian processes and linear diffusion models, recasting predictive sampling as an ODE whose closed-form Gaussian dynamics are supplemented by a likelihood-dependent guidance term. In the linear-Gaussian regime this recovers standard GP conditioning exactly, while non-conjugate cases are handled by replacing the guidance term with a Monte Carlo average over pointwise likelihood evaluations. No load-bearing step reduces the claimed equivalence, the ODE formulation, or the extension to arbitrary conditioning statements to a fitted quantity defined by the target result itself, a self-referential definition, or a self-citation chain whose validity is presupposed. The whitening step and Wasserstein-2 minimisation are presented as consequences of the equivalence rather than inputs smuggled in by ansatz or prior self-work. The derivation therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: An explicit equivalence exists between Gaussian processes and a class of linear diffusion models that preserves closed-form Gaussian dynamics.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation."
- IndisputableMonolith/Foundation/BranchSelection.lean, theorem branch_selection (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Excerpts from the paper's appendix
A Derivation of our flow's marginal and joint distributions
Under the prior $f_0 \sim \mathcal{N}(m_*, K_{**})$ and the corruption model (8), the pair $(f_0, f_t)$ is jointly Gaussian. Writing $f_t = \alpha(t) f_0 + \ldots$
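- [63] the reversed-time exact drift $a$ satisfies a one-sided Lipschitz condition in $f$, uniformly in $r$: therefore there exists $\eta_\tau \in \mathbb{R}$ such that
  $$\langle x - y,\ a(r, x) - a(r, y) \rangle \le \eta_\tau \|x - y\|^2 \quad \forall x, y \in \mathbb{R}^m,\ r \in [0, 1 - \tau]; \tag{47}$$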
- [64] the realised guidance approximation error is uniformly bounded on the relevant state-space region visited by the exact and approximate trajectories:
  $$\varepsilon_\tau := \sup_{(t, f) \in \mathcal{R}_\tau} \|g(t, f) - \hat{g}(t, f)\| < \infty,$$
  where $\mathcal{R}_\tau \subseteq [\tau, 1] \times \mathbb{R}^m$ denotes any region containing both trajectories on the truncated interval. The constant $\eta_\tau$ is allowed to be negative; this contractive case will be ...
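For intuition about how these two conditions combine, the following is the standard Grönwall-type argument, not the paper's own statement (which is truncated on this page). It assumes, hypothetically, that the error in the drift is bounded by $c\,\varepsilon_\tau$ for some constant $c \ge 0$ absorbing whatever factor multiplies the guidance term, and that the exact trajectory $x$ and the approximate trajectory $\hat{x}$ start from the same point. With $\delta(r) = x(r) - \hat{x}(r)$ and $\hat{a}$ the drift built from $\hat{g}$:

```latex
\begin{aligned}
\frac{d}{dr}\,\tfrac{1}{2}\|\delta(r)\|^2
  &= \langle \delta,\ a(r, x) - a(r, \hat{x}) \rangle
   + \langle \delta,\ a(r, \hat{x}) - \hat{a}(r, \hat{x}) \rangle \\
  &\le \eta_\tau \|\delta\|^2 + c\,\varepsilon_\tau \|\delta\|, \\
\Rightarrow\quad \|\delta(r)\|
  &\le c\,\varepsilon_\tau\, \frac{e^{\eta_\tau r} - 1}{\eta_\tau}
  \qquad (\eta_\tau \neq 0,\ \delta(0) = 0).
\end{aligned}
```

When $\eta_\tau < 0$ the right-hand side stays below $c\,\varepsilon_\tau / |\eta_\tau|$ for all $r$, which is why the contractive case is the favourable one: guidance errors are damped rather than amplified along the flow.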
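- [65] Build $S$ samples $f_0^{(i)} \sim p(f_0 \mid f_t, \mathcal{D})$ via $f_0^{(i)} = \mu_{0|t} + \Sigma_{0|t}^{1/2} \epsilon^{(i)}$, with $\mu_{0|t}$ from (22) and $\Sigma_{0|t}$ from (23), using $m_{*|y}$, $K_{**|y}$ and $A_{|y}(t)$ in place of $m_*$, $K_{**}$ and $A(t)$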
- [66] Evaluate each log-likelihood $\log p(\mathcal{C} \mid f_0^{(i)})$ and its gradient $\nabla_{f_0} \log p(\mathcal{C} \mid f_0^{(i)})$ at each sample,
- [67] Compute normalised weights via the numerically stable log-sum-exp operation
  $$\log \bar{w}^{(i)} = \log p(\mathcal{C} \mid f_0^{(i)}) - \operatorname{logsumexp}_r \log p(\mathcal{C} \mid f_0^{(r)}).$$
  Here
  $$\operatorname{logsumexp}_r a^{(r)} := \max_r a^{(r)} + \log \Big[ \sum_r \exp\big( a^{(r)} - \max_r a^{(r)} \big) \Big]$$
  avoids overflow and underflow by subtracting the potentially very small maximal value.
  Algorithm 1 FLOWGP: sampling from a GP predictive distribution...
- [68] Evaluate the weighted sum (30), applying the Jacobian to each gradient term,
- [69] We clip the norm of the vector field (after scaling by $-\tfrac{1}{2}\beta(t)$ as prescribed by the probability flow ODE (16)) to limit excessively large steps and ensure stable integration. We use the smooth saturation $v \mapsto v \cdot \tau \tanh(\|v\| / \tau) / (\|v\| + 10^{-8})$, where $\tau = 1 \times 10^{2}$ is a maximum norm threshold. This transformation bounds excessively large gradients whilst preserving Li...
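The sampling, weighting and clipping steps quoted above can be sketched together. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: equation (30) and the Jacobian factor are not shown on this page, so the Jacobian is taken to be the identity, and the function names, the toy Gaussian likelihood and the sample count are all hypothetical.

```python
import numpy as np


def normalised_log_weights(log_lik):
    """log w_i = log p(C | f_i) - logsumexp_r log p(C | f_r), computed stably."""
    m = np.max(log_lik)
    return log_lik - (m + np.log(np.sum(np.exp(log_lik - m))))


def smooth_clip(v, tau=1e2, eps=1e-8):
    """Smooth saturation v -> v * tau * tanh(|v| / tau) / (|v| + eps)."""
    norm = np.linalg.norm(v)
    return v * tau * np.tanh(norm / tau) / (norm + eps)


def guidance_estimate(mu, cov_sqrt, log_lik_fn, grad_log_lik_fn, num_samples, rng):
    """Self-normalised weighted average of likelihood gradients.

    Samples are f_i = mu + cov_sqrt @ eps_i. The Jacobian mentioned in the
    text is omitted (assumed identity), so only the weighting is shown.
    """
    eps = rng.standard_normal((num_samples, mu.size))
    samples = mu + eps @ cov_sqrt.T
    log_lik = np.array([log_lik_fn(f) for f in samples])
    grads = np.array([grad_log_lik_fn(f) for f in samples])
    weights = np.exp(normalised_log_weights(log_lik))
    return weights @ grads


# Toy likelihood, chosen only for illustration: y = f[0] + Gaussian noise.
Y_OBS, NOISE_SD = 1.0, 0.5


def toy_log_lik(f):
    return -0.5 * ((Y_OBS - f[0]) / NOISE_SD) ** 2


def toy_grad_log_lik(f):
    return np.array([(Y_OBS - f[0]) / NOISE_SD**2, 0.0])


rng = np.random.default_rng(0)
g_hat = guidance_estimate(np.zeros(2), np.eye(2), toy_log_lik, toy_grad_log_lik, 2000, rng)
g_clipped = smooth_clip(g_hat)               # small vectors pass almost unchanged
big_clipped = smooth_clip(np.array([1e6, 0.0]))  # large vectors saturate at norm tau
```

For this conjugate toy case the exact value of the first component is 0.8, the gradient of the log marginal likelihood with respect to the sample mean, so the estimate can be checked directly; for non-conjugate likelihoods no such closed form exists.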
- [70] using a scaled RBF kernel
  $$\operatorname{cov}(f_0(t), f_0(t')) = \tau^2 \exp\left( -\frac{(t - t')^2}{2\kappa^2} \right), \tag{70}$$
  with hyperparameters $\tau^2$ and $\kappa$ optimised together with an affine mean function by maximising the marginal likelihood using just $\mathcal{D}$. As in the previous experiment, the GP predictive mean $m_{*|y}$ and covariance $K_{**|y}$ on the evaluation grid are used to construct the base Gaussian predicti...
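As a rough illustration of how a predictive mean and covariance on an evaluation grid follow from kernel (70), here is a sketch of standard GP regression. It assumes Gaussian observation noise and uses made-up data and fixed hyperparameters; the marginal-likelihood optimisation described in the text is not performed, and `scaled_rbf`, `gp_predictive` and `affine_mean` are illustrative names.

```python
import numpy as np


def scaled_rbf(t1, t2, tau2, kappa):
    """Equation (70): tau^2 * exp(-(t - t')^2 / (2 kappa^2))."""
    d = t1[:, None] - t2[None, :]
    return tau2 * np.exp(-(d**2) / (2.0 * kappa**2))


def gp_predictive(t_train, y_train, t_eval, tau2, kappa, noise_var, mean_fn):
    """Standard GP regression predictive mean and covariance on t_eval."""
    K = scaled_rbf(t_train, t_train, tau2, kappa) + noise_var * np.eye(t_train.size)
    K_s = scaled_rbf(t_eval, t_train, tau2, kappa)
    K_ss = scaled_rbf(t_eval, t_eval, tau2, kappa)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} (y - m), computed with two solves against the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - mean_fn(t_train)))
    V = np.linalg.solve(L, K_s.T)
    return mean_fn(t_eval) + K_s @ alpha, K_ss - V.T @ V


def affine_mean(t):
    """Affine mean with hypothetical, unfitted coefficients."""
    return 0.2 + 0.1 * t


# Hypothetical data and hyperparameters, for illustration only.
t_train = np.array([0.0, 0.5, 1.0])
y_train = np.array([0.1, 0.9, 0.2])
t_eval = np.linspace(0.0, 1.0, 5)
m_pred, K_pred = gp_predictive(t_train, y_train, t_eval, 1.0, 0.3, 1e-4, affine_mean)
```

The pair (`m_pred`, `K_pred`) plays the role of $m_{*|y}$ and $K_{**|y}$ in the excerpt: a Gaussian base distribution that any further, non-conjugate conditioning would start from.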