
arXiv: 2605.21041 · v1 · pith:AUFDNMCNnew · submitted 2026-05-20 · 📊 stat.ML · cs.LG · stat.ME

Conditioning Gaussian Processes on Almost Anything

Pith reviewed 2026-05-21 02:02 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME
keywords Gaussian processes · conditioning · diffusion models · Monte Carlo · ODE · likelihood · probabilistic inference

The pith

Gaussian processes can be conditioned on almost any pointwise likelihood by equating them to linear diffusion models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes an explicit equivalence between Gaussian processes and linear diffusion models. Predictive sampling is recast as an ODE with closed-form Gaussian dynamics plus a likelihood-dependent guidance term that can be approximated by Monte Carlo. This recovers exact GP conditioning in linear-Gaussian cases and extends to non-conjugate conditioning statements, such as non-linear physics and natural language scored by LLMs. Whitening isolates the non-Gaussian dynamics to reduce transport cost and stiffness. The result is a general-purpose inference method that needs no bespoke derivation for each new conditioning type.

Core claim

We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation including non-linear physics and natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations.

What carries the argument

Equivalence between GPs and linear diffusion models that recasts conditioning as an ODE with closed-form Gaussian dynamics and Monte Carlo guidance.

If this is right

  • Recovers exact standard GP posteriors in linear-Gaussian cases.
  • Handles non-linear physics via pointwise likelihoods.
  • Enables conditioning on natural language using LLMs.
  • Eliminates need for custom derivations per conditioning type.
  • Whitening reduces numerical stiffness and Wasserstein-2 cost.
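The linear-Gaussian case in the first bullet is the textbook closed-form GP posterior, which is the baseline the diffusion view is said to recover. Below is a minimal NumPy sketch of that baseline only. The kernel choice, the hyperparameters, and the function names `rbf_kernel` and `gp_condition` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=0.5):
    """Squared-exponential kernel matrix between two 1-D input arrays."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_condition(x_train, y_train, x_test, noise_var=1e-2):
    """Closed-form posterior mean and covariance of a zero-mean GP at x_test."""
    k_xx = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_sx = rbf_kernel(x_test, x_train)
    k_ss = rbf_kernel(x_test, x_test)
    chol = np.linalg.cholesky(k_xx)
    # Solve with the Cholesky factor instead of forming an explicit inverse.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_sx.T)
    mean = k_sx @ alpha
    cov = k_ss - v.T @ v
    return mean, cov

# Toy data (assumed for illustration): three noisy observations of sin(x).
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
x_test = np.linspace(-2.0, 2.0, 9)
post_mean, post_cov = gp_condition(x_train, y_train, x_test)
```

Any sampler claiming to recover exact conditioning should reproduce `post_mean` and `post_cov` up to Monte Carlo and integration error on a problem of this kind.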

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Experts could supply constraints in plain language for GP models.
  • Hybrid diffusion-GP models for generative tasks become straightforward.
  • Testing on scientific datasets with non-linear constraints would validate stability.
  • Similar ideas might apply to other kernel methods.

Load-bearing premise

The Monte Carlo approximation of the guidance term stays accurate for complex non-conjugate likelihoods.

What would settle it

Compare samples from the method against an exact (or numerically exact) posterior on a non-linear conditioning task where such a reference is available; a large discrepancy would show that the Monte Carlo approximation is unreliable.
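One way such a check could be set up is sketched below for a scalar latent value, where the reference posterior can be computed on a dense grid. Everything here is an assumption made for illustration: the prior N(0, 1), the `tanh` observation model, the noise level, and above all the sampler, which is plain importance resampling standing in for the paper's ODE-based sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(f, y_obs=0.5, noise_std=0.2):
    """Pointwise log-likelihood of a non-linear observation y = tanh(f) + noise."""
    return -0.5 * ((y_obs - np.tanh(f)) / noise_std) ** 2

# Reference posterior on a dense grid: prior N(0, 1) times the likelihood.
grid = np.linspace(-6.0, 6.0, 20001)
log_post = -0.5 * grid**2 + log_likelihood(grid)
post = np.exp(log_post - log_post.max())
cdf_ref = np.cumsum(post)
cdf_ref /= cdf_ref[-1]

# Stand-in sampler (NOT the paper's ODE): importance resampling from the prior.
proposals = rng.standard_normal(200_000)
log_w = log_likelihood(proposals)
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()
samples = rng.choice(proposals, size=20_000, p=weights)

# Kolmogorov-Smirnov-style discrepancy between empirical and reference CDFs.
sorted_samples = np.sort(samples)
empirical_cdf = np.arange(1, len(samples) + 1) / len(samples)
reference_at_samples = np.interp(sorted_samples, grid, cdf_ref)
ks_distance = np.max(np.abs(empirical_cdf - reference_at_samples))
print(f"KS distance to reference posterior: {ks_distance:.4f}")
```

To test a different sampler, replace the `samples` array with its output and keep the grid reference unchanged.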

Figures

Figures reproduced from arXiv: 2605.21041 by Andrew Zammit-Mangion, Christopher Nemeth, Colin Doumont, Henry Moss, Lachlan Astfalck, Philipp Hennig, Sam Willis, Thomas Cowperthwaite.

Figure 1: (left of each pair) Samples from a GP conditioned on observations (red dots) and (right of each pair) samples from FLOWGP including additional information about non-linear physics via known differential equations (a-c) and natural language descriptions via an LLM-based likelihood (d-f). In each case, the unconstrained GP produces statistically coherent but semantically uninformed samples, whilst FLOWGP pro…
Figure 2: Generating samples from the GP predictive distribution when conditioning on Gaussian …
Figure 3: Extension of Figure …
Figure 4: Predictive samples from an unconstrained GP …
Figure 5: On all six Bayesian Optimisation with Preference Exploration (BOPE) problems consid…
Figure 6: Additional experiments on the monotonic and bounded regression problem, showing the …
Original abstract

Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to establish an explicit equivalence between Gaussian processes and a class of linear diffusion models. Predictive sampling is recast as an ODE possessing closed-form Gaussian dynamics whose drift contains a likelihood-dependent guidance term; this term is replaced by a simple Monte Carlo average over pointwise likelihood evaluations. The construction recovers exact GP conditioning in the linear-Gaussian case and extends, without bespoke derivations, to arbitrary conditioning statements that admit pointwise likelihood evaluation, including non-linear physics and natural-language statements supplied by large language models. Whitening is introduced to isolate irreducible non-Gaussian dynamics, thereby minimising Wasserstein-2 transport cost and removing numerical stiffness.

Significance. If the claimed equivalence and the numerical stability of the Monte Carlo guidance term can be established, the work would supply a genuinely general-purpose mechanism for conditioning GPs on complex, non-conjugate information. The ability to incorporate statements expressed in natural language or by non-linear simulators without custom derivations would constitute a substantial advance in probabilistic modelling.

major comments (2)
  1. [Abstract / ODE derivation] Abstract and the section deriving the ODE equivalence: the central claim that the Monte Carlo estimator for the likelihood-dependent guidance term remains accurate and stable for non-conjugate conditioning statements is load-bearing for the extension beyond the linear-Gaussian regime, yet the manuscript provides neither variance bounds nor effective-sample-size diagnostics for this estimator under the ODE flow.
  2. [Whitening section] Section presenting the whitening transformation: the assertion that whitening isolates the irreducible non-Gaussian dynamics and thereby eliminates numerical stiffness must be accompanied by a quantitative comparison of the resulting Wasserstein-2 cost and stiffness metrics against the unwhitened formulation; without such evidence the claimed numerical advantage remains unverified.
minor comments (2)
  1. [Notation / Monte Carlo estimator] Notation for the guidance term and the Monte Carlo estimator should be introduced with explicit dependence on the likelihood function and the number of samples; the current presentation leaves the scaling with conditioning complexity implicit.
  2. [Assumptions paragraph] The manuscript should include a brief statement of the precise regularity conditions on the likelihood that guarantee the existence of the closed-form Gaussian dynamics.
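The effective-sample-size diagnostic requested in the first major comment can be computed directly from the per-sample log-likelihoods that a self-normalised estimator already evaluates. The sketch below uses the standard Kish formula, ESS = 1 / Σ w̄², in log space. The function name `effective_sample_size` is hypothetical, and the report does not say which diagnostic the authors would actually adopt.

```python
import numpy as np

def effective_sample_size(log_likelihoods):
    """Kish effective sample size of self-normalised importance weights."""
    log_w = np.asarray(log_likelihoods, dtype=float)
    # Normalise in log space so that the weights sum to one.
    log_w_norm = log_w - np.logaddexp.reduce(log_w)
    # ESS = 1 / sum(w^2), evaluated as exp(-log(sum(exp(2 * log w)))).
    return float(np.exp(-np.logaddexp.reduce(2.0 * log_w_norm)))

# Equal log-likelihoods give ESS equal to the number of samples.
uniform_ess = effective_sample_size(np.zeros(64))
# One dominant sample drives the ESS towards one.
degenerate_ess = effective_sample_size(np.array([0.0] + [-1e3] * 63))
```

An ESS that collapses towards one along the ODE trajectory would indicate that the guidance term is being estimated from very few effective samples.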

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and for recognizing the potential of the proposed framework. We respond point-by-point to the major comments below, indicating the revisions we will make to address the concerns raised.

Point-by-point responses
  1. Referee: [Abstract / ODE derivation] Abstract and the section deriving the ODE equivalence: the central claim that the Monte Carlo estimator for the likelihood-dependent guidance term remains accurate and stable for non-conjugate conditioning statements is load-bearing for the extension beyond the linear-Gaussian regime, yet the manuscript provides neither variance bounds nor effective-sample-size diagnostics for this estimator under the ODE flow.

    Authors: We agree that theoretical or diagnostic support for the Monte Carlo guidance estimator is important to substantiate the non-conjugate claims. In the linear-Gaussian case the estimator is exact by construction, but for general likelihoods we currently rely on empirical performance. In the revised manuscript we will add a short derivation of variance bounds for the estimator under the linear ODE flow and report effective sample size diagnostics from the numerical experiments in the non-linear physics and language-conditioning sections. These additions will appear in the ODE derivation section. revision: yes

  2. Referee: [Whitening section] Section presenting the whitening transformation: the assertion that whitening isolates the irreducible non-Gaussian dynamics and thereby eliminates numerical stiffness must be accompanied by a quantitative comparison of the resulting Wasserstein-2 cost and stiffness metrics against the unwhitened formulation; without such evidence the claimed numerical advantage remains unverified.

    Authors: We accept that the current text presents the theoretical motivation for whitening without direct quantitative verification. We will revise the whitening section to include explicit comparisons of Wasserstein-2 transport costs and stiffness indicators (such as maximum integration step size or condition number of the drift) between the whitened and unwhitened formulations, drawing on the simulation results already obtained in the paper. revision: yes
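The kind of stiffness indicator mentioned in the second response can be illustrated on a purely Gaussian toy problem. For a Gaussian with covariance K, the score is -K^{-1} f, so the Jacobian of a score-driven drift has the same eigenvalue spread as K. In whitened coordinates z = L^{-1} f, with K = L L^T, the covariance is the identity and that spread collapses to one. The sketch below checks this numerically. The kernel, grid, and jitter are assumptions, and the drift is a generic Gaussian score, not the paper's ODE.

```python
import numpy as np

def rbf_gram(x, lengthscale=0.3, jitter=1e-4):
    """RBF Gram matrix on 1-D inputs, with diagonal jitter for stability."""
    sq_dist = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2) + jitter * np.eye(len(x))

def stiffness_ratio(jacobian):
    """Largest over smallest eigenvalue magnitude of a symmetric Jacobian."""
    magnitudes = np.abs(np.linalg.eigvalsh(jacobian))
    return magnitudes.max() / magnitudes.min()

x = np.linspace(0.0, 1.0, 20)
K = rbf_gram(x)

# Unwhitened: the Gaussian score -K^{-1} f has Jacobian -K^{-1}.
raw_ratio = stiffness_ratio(-np.linalg.inv(K))

# Whitened: z = L^{-1} f has covariance L^{-1} K L^{-T}, the identity matrix.
L = np.linalg.cholesky(K)
L_inv = np.linalg.solve(L, np.eye(len(x)))
whitened_cov = L_inv @ K @ L_inv.T
whitened_ratio = stiffness_ratio(-np.linalg.inv(whitened_cov))

print(f"raw stiffness ratio: {raw_ratio:.3e}")
print(f"whitened stiffness ratio: {whitened_ratio:.6f}")
```

This covers only the Gaussian part of the dynamics; whether the likelihood-dependent guidance term stays well conditioned after whitening is the empirical question the referee raises.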

Circularity Check

0 steps flagged

No circularity: GP-diffusion equivalence and ODE recasting are derived rather than tautological

Full rationale

The paper derives an explicit equivalence between Gaussian processes and linear diffusion models, recasting predictive sampling as an ODE whose closed-form Gaussian dynamics are supplemented by a likelihood-dependent guidance term. In the linear-Gaussian regime this recovers standard GP conditioning exactly, while non-conjugate cases are handled by replacing the guidance term with a Monte Carlo average over pointwise likelihood evaluations. No load-bearing step reduces the claimed equivalence, the ODE formulation, or the extension to arbitrary conditioning statements to a fitted quantity defined by the target result itself, a self-referential definition, or a self-citation chain whose validity is presupposed. The whitening step and Wasserstein-2 minimisation are presented as consequences of the equivalence rather than inputs smuggled in by ansatz or prior self-work. The derivation therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on an unproven equivalence between GPs and linear diffusion models plus the adequacy of Monte Carlo for the guidance term; no free parameters or new entities are declared in the abstract.

axioms (1)
  • domain assumption: An explicit equivalence exists between Gaussian processes and a class of linear diffusion models that preserves closed-form Gaussian dynamics.
    Invoked to recast predictive sampling as an ODE.

pith-pipeline@v0.9.0 · 5715 in / 1148 out tokens · 36761 ms · 2026-05-21T02:02:31.709567+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 6 internal anchors

  1. [1]

    Christopher KI Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA, 2006

  2. [2]

    Håvard Rue and Sara Martino. Approximate Bayesian inference for hierarchical Gaussian Markov random field models. Journal of Statistical Planning and Inference, 137(10):3177–3192, 2007

  3. [3]

    Botond Cseke and Tom Heskes. Approximate marginals in latent Gaussian models. Journal of Machine Learning Research, 12:417–454, 2011

  4. [4]

    James Hensman, Nicolo Fusi, and Neil D Lawrence. Gaussian processes for big data. arXiv preprint arXiv:1309.6835, 2013

  5. [5]

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  6. [6]

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021

  7. [7]

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023

  8. [8]

    Michael Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In International Conference on Learning Representations, 2023

  9. [9]

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations (ICLR), 2023

  10. [10]

    Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, and Stefano Ermon. The principles of diffusion models. arXiv preprint arXiv:2510.21890, 2025

  11. [11]

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022

  12. [12]

    Marvin Pförtner, Ingo Steinwart, Philipp Hennig, and Jonathan Wenger. Physics-informed Gaussian process regression generalizes linear PDE solvers. arXiv preprint arXiv:2212.12474, 2022

  13. [13]

    Thomas Mejer Hansen, Andre G Journel, Albert Tarantola, and Klaus Mosegaard. Linear inverse Gaussian theory and geostatistics. Geophysics, 71(6):R101–R111, 2006

  14. [14]

    Ercan Solak, Roderick Murray-Smith, WE Leithead, Douglas Leith, and Carl Rasmussen. Derivative observations in Gaussian process models of dynamic systems. Advances in Neural Information Processing Systems, 15, 2002

  15. [15]

    Aurelien Luigi Serge Ponte, Lachlan C Astfalck, Matthew D Rayson, Andrew P Zulberti, and Nicole L Jones. Inferring flow energy, space scales, and timescales: freely drifting vs. fixed-point observations. Nonlinear Processes in Geophysics, 31(4):571–586, 2024

  16. [16]

    Anthony O’Hagan. Bayes–Hermite quadrature. Journal of Statistical Planning and Inference, 29(3):245–260, 1991

  17. [17]

    François-Xavier Briol, Chris J Oates, Mark Girolami, Michael A Osborne, and Dino Sejdinovic. Probabilistic integration. Statistical Science, 34(1):1–22, 2019

  18. [18]

    Carl Jidling, Niklas Wahlström, Adrian Wills, and Thomas B Schön. Linearly constrained Gaussian processes. Advances in Neural Information Processing Systems, 30, 2017

  19. [19]

    Peter Whittle. On stationary processes in the plane. Biometrika, pages 434–449, 1954

  20. [20]

    Finn Lindgren, Håvard Rue, and Johan Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society Series B: Statistical Methodology, 73(4):423–498, 2011

  21. [21]

    Arno Solin and Simo Särkkä. Hilbert space methods for reduced-rank gaussian process regression. Statistics and Computing, 30(2):419–446, 2020

  22. [22]

    James Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. Efficiently sampling functions from gaussian process posteriors. In International conference on machine learning, pages 10292–10302. PMLR, 2020

  23. [23]

    Joaquin Quinonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6(Dec):1939–1959, 2005

  24. [24]

    Miguel Lázaro-Gredilla, Joaquin Quinonero-Candela, Carl Edward Rasmussen, and Aníbal R Figueiras-Vidal. Sparse spectrum Gaussian process regression. Journal of Machine Learning Research, 11:1865–1881, 2010

  25. [25]

    Andrew Wilson and Hannes Nickisch. Kernel interpolation for scalable structured Gaussian processes (KISS-GP). In International Conference on Machine Learning, pages 1775–1784. PMLR, 2015

  26. [26]

    Kurt Cutajar, Michael Osborne, John Cunningham, and Maurizio Filippone. Preconditioning kernel matrices. In International Conference on Machine Learning, pages 2529–2538. PMLR, 2016

  27. [27]

    Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R Gardner, Geoff Pleiss, and John P Cunningham. Computation-aware Gaussian processes: model selection and linear-time inference. Advances in Neural Information Processing Systems, 37:31316–31349, 2024

  28. [28]

    Mauricio A Alvarez, David Luengo, and Neil D Lawrence. Linear latent force models using Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2693–2705, 2013

  29. [29]

    Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348:683–693, 2017

  30. [30]

    Andreas Besginow and Markus Lange-Hegermann. Constraining Gaussian processes to systems of linear ordinary differential equations. Advances in Neural Information Processing Systems, 35:29386–29399, 2022

  31. [31]

    Oliver Hamelijnck, Arno Solin, and Theodoros Damoulas. Physics-informed variational state-space Gaussian processes. Advances in Neural Information Processing Systems, 37:98505–98536, 2024

  32. [32]

    Da Long, Zheng Wang, Aditi Krishnapriyan, Robert Kirby, Shandian Zhe, and Michael Mahoney. AutoIP: A united framework to integrate physics into Gaussian processes. In International Conference on Machine Learning, pages 14210–14222. PMLR, 2022

  33. [33]

    Yifan Chen, Bamdad Hosseini, Houman Owhadi, and Andrew M Stuart. Solving and learning nonlinear PDEs with Gaussian processes. Journal of Computational Physics, 447:110668, 2021

  34. [34]

    Michael Schober, Simo Särkkä, and Philipp Hennig. A probabilistic model for the numerical solution of initial value problems. Statistics and Computing, 2019

  35. [35]

    Filip Tronarp, Hans Kersting, Simo Särkkä, and Philipp Hennig. Probabilistic solutions to ordinary differential equations as nonlinear bayesian filtering: a new perspective. Statistics and Computing, 2019

  36. [36]

    Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982

  37. [37]

    Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems. arXiv preprint arXiv:2410.00083, 2024

  38. [38]

    Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. In The Eleventh International Conference on Learning Representations, ICLR 2023. The International Conference on Learning Representations, 2023

  39. [39]

    Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, et al. Manifold preserving guided diffusion. In The Twelfth International Conference on Learning Representations, ICLR 2024. The International Conference on Learning Representations, 2024

  40. [40]

    Severi Rissanen, Markus Heinonen, and Arno Solin. Free hunch: Denoiser covariance estimation for diffusion models without extra costs. In The Thirteenth International Conference on Learning Representations, 2025

  41. [41]

    Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021

  42. [42]

    Christian Agrell. Gaussian processes with linear operator inequality constraints. Journal of Machine Learning Research, 20(135):1–36, 2019

  43. [43]

    Lizhen Lin and David B Dunson. Bayesian monotone regression using Gaussian process projection. Biometrika, 101(2):303–317, 2014

  44. [44]

    Lachlan Astfalck, Deborshee Sen, Sayan Patra, Edward Cripps, and David Dunson. Posterior projection for inference in constrained spaces. arXiv e-prints, pages arXiv–1812, 2025

  45. [45]

    Fangpo Wang and Alan E Gelfand. Modeling space and space-time directional data using projected Gaussian processes. Journal of the American Statistical Association, 109(508):1565–1580, 2014

  46. [46]

    Jaakko Riihimäki and Aki Vehtari. Gaussian processes with monotonicity information. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 645–652. JMLR Workshop and Conference Proceedings, 2010

  47. [47]

    Sébastien Da Veiga and Amandine Marrel. Gaussian process modeling with inequality constraints. In Annales de la Faculté des Sciences de Toulouse: Mathématiques, pages 529–555, 2012

  48. [48]

    Qwen Team. Qwen3.5-omni technical report. arXiv preprint arXiv:2604.15804, 2026

  49. [49]

    Felix Biggs and Samuel Willis. LLM flow processes for text-conditioned regression. arXiv preprint arXiv:2601.06147, 2026

  50. [50]

    Sinho Chewi, Jonathan Niles-Weed, and Philippe Rigollet. Statistical Optimal Transport. Springer, Cham, Switzerland, 2025

  51. [51]

    Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000

  52. [52]

    Ernst Hairer, Gerhard Wanner, and Syvert P Nørsett. Solving Ordinary Differential Equations I: Nonstiff Problems. Springer, Berlin, Germany, 1993

  53. [53]

    András Prékopa. On logarithmic concave measures and functions. Acta Sci. Math., 34:335, 1973

  54. [54]

    Adrien Saumard and Jon A Wellner. Log-concavity and strong log-concavity: a review. Statistics Surveys, 8:45, 2014

  55. [55]

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022

  56. [56]

    Simo Särkkä and Arno Solin. Applied Stochastic Differential Equations, volume 10. Cambridge University Press, Cambridge, UK, 2019

  57. [57]

    Hanyang Wang, Juergen Branke, and Matthias Poloczek. Bayesian optimization with preference exploration using a monotonic neural network ensemble. Advances in Neural Information Processing Systems, 2025

  58. [58]

    Wei Chu and Zoubin Ghahramani. Preference learning with Gaussian processes. In International Conference on Machine Learning, 2005

  59. [59]

    Andreu Mas-Colell, Michael Dennis Whinston, Jerry R Green, et al. Microeconomic Theory. Oxford University Press, Oxford, UK, 1995

  60. [60]

    Zhiyuan Jerry Lin, Raul Astudillo, Peter Frazier, and Eytan Bakshy. Preference exploration for efficient Bayesian optimization with multiple outcomes. In International Conference on Artificial Intelligence and Statistics, 2022

  61. [61]

    Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham, Andrew G Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. Advances in Neural Information Processing Systems, 2020

  62. [62]

    $\left[\dfrac{\partial f_0^{(i)}}{\partial f_t}\right]^{\top} \nabla_{f_0}\, p(C \mid f_0^{(i)})$. (27) Rewriting using the log-derivative trick, we obtain $\nabla_{f_t}\, p(C \mid f_0^{(i)}) = p(C \mid f_0^{(i)})$

    Carl Hvarfner, Erik O Hellsten, and Luigi Nardi. Vanilla Bayesian optimization performs great in high dimensions. In International Conference on Machine Learning, 2024

    A Derivation of our flow’s marginal and joint distributions. Under the prior $f_0 \sim \mathcal{N}(m_*, K_{**})$ and the corruption model (8), the pair $(f_0, f_t)$ is jointly Gaussian. Writing $f_t = \alpha(t) f_0 + \sqrt{\cdots}$ ...

  63. [63]

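    the reversed-time exact drift $a$ satisfies a one-sided Lipschitz condition in $f$, uniformly in $r$: therefore there exists $\eta_\tau \in \mathbb{R}$ such that $\langle x - y,\, a(r, x) - a(r, y) \rangle \le \eta_\tau \lVert x - y \rVert^2 \quad \forall x, y \in \mathbb{R}^m,\ r \in [0, 1 - \tau]$; (47)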

  64. [64]

    the realised guidance approximation error is uniformly bounded on the relevant state-space region visited by the exact and approximate trajectories: $\varepsilon_\tau := \sup_{(t, f) \in R_\tau} \lVert g(t, f) - \hat{g}(t, f) \rVert < \infty$, where $R_\tau \subseteq [\tau, 1] \times \mathbb{R}^m$ denotes any region containing both trajectories on the truncated interval. The constant $\eta_\tau$ is allowed to be negative; this contractive case will be exploited in Corollary E.7 below

  65. [65]

    Build $S$ samples $f_0^{(i)} \sim p(f_0 \mid f_t, D)$ via $f_0^{(i)} = \mu_{0|t} + \Sigma_{0|t}^{1/2} \epsilon^{(i)}$, with $\mu_{0|t}$ from (22) and $\Sigma_{0|t}$ from (23) with $m_{*|y}$, $K_{**|y}$ and $A_{|y}(t)$ in place of $m_*$, $K_{**}$ and $A(t)$

  66. [66]

    Evaluate each log-likelihood $\log p(C \mid f_0^{(i)})$ and its gradient $\nabla_{f_0} \log p(C \mid f_0^{(i)})$ at each sample,

  67. [67]

    Compute normalised weights via the numerically stable log-sum-exp operation $\log \bar{w}^{(i)} = \log p(C \mid f_0^{(i)}) - \operatorname{logsumexp}_r \log p(C \mid f_0^{(r)})$. Here $\operatorname{logsumexp}_r a^{(r)} := \max_r a^{(r)} + \log\left[\sum_r \exp\left(a^{(r)} - \max_r a^{(r)}\right)\right]$ avoids overflow and underflow by subtracting the potentially very small maximal value. Algorithm 1 FLOWGP: sampling from a GP predictive distribution...

  68. [68]

    Evaluate the weighted sum (30), applying the Jacobian to each gradient term,

  69. [69]

    We clip the norm of the vector field (after scaling by $-\tfrac{1}{2}\beta(t)$ as prescribed by the probability flow ODE (16)) to limit excessively large steps to ensure stable integration. We use the smooth saturation: $v \mapsto v \cdot \tau \tanh(\lVert v \rVert / \tau) / (\lVert v \rVert + 10^{-8})$, where $\tau = 1 \times 10^{2}$ is a maximum norm threshold. This transformation bounds excessively large gradients whilst preserving Li...

  70. [70]

    Vanilla BO

    using a scaled RBF kernel $\operatorname{cov}(f_0(t), f_0(t')) = \tau^2 \exp\left(-\dfrac{(t - t')^2}{2\kappa^2}\right)$, (70) with hyperparameters $\tau^2$ and $\kappa$ optimised together with an affine mean function by maximising the marginal likelihood using just $D$. As in the previous experiment, the GP predictive mean $m_{*|y}$ and covariance $K_{**|y}$ on the evaluation grid are used to construct the base Gaussian predicti...
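Entries 65 to 69 above read as steps of a self-normalised Monte Carlo estimate of the guidance term, followed by a norm clip. The sketch below strings them together under explicit assumptions, since equations (22), (23), and (30) are not reproduced in this excerpt. The conditional mean is taken to be affine, `mu = A @ f_t + b`, so that the Jacobian of each sample with respect to `f_t` is `A`. The weighted sum is taken to be the weighted average of Jacobian-transformed log-likelihood gradients suggested by the log-derivative identity in entry 62. The names `guidance_estimate`, `A`, `b`, and `sqrt_cov` are hypothetical placeholders, not the paper's notation.

```python
import numpy as np

def logsumexp(a):
    """max(a) + log(sum(exp(a - max(a)))), the form given in entry 67."""
    a_max = np.max(a)
    return a_max + np.log(np.sum(np.exp(a - a_max)))

def smooth_saturation(v, tau=1e2):
    """Norm clip from entry 69: v * tau * tanh(|v| / tau) / (|v| + 1e-8)."""
    norm = np.linalg.norm(v)
    return v * tau * np.tanh(norm / tau) / (norm + 1e-8)

def guidance_estimate(f_t, A, b, sqrt_cov, log_lik, grad_log_lik, n_samples, rng):
    """Self-normalised Monte Carlo estimate of grad_{f_t} log p(C | f_t)."""
    mu = A @ f_t + b                                  # assumed affine stand-in for mu_{0|t}
    eps = rng.standard_normal((n_samples, len(mu)))
    f0 = mu + eps @ sqrt_cov.T                        # samples f_0^{(i)} (entry 65)
    log_p = np.array([log_lik(f) for f in f0])        # log-likelihoods (entry 66)
    grads = np.array([grad_log_lik(f) for f in f0])   # gradients with respect to f_0
    weights = np.exp(log_p - logsumexp(log_p))        # normalised weights (entry 67)
    # Chain rule: d f_0 / d f_t = A under the affine-mean assumption (entry 68).
    return weights @ (grads @ A)
```

With a Gaussian likelihood the target gradient is available in closed form, which gives a quick sanity check of the estimator. `smooth_saturation` is shown on its own because the scaling by β(t) and the full vector field of ODE (16) are not available in this excerpt.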