Conditioning Gaussian Processes on Almost Anything

Andrew Zammit-Mangion; Christopher Nemeth; Colin Doumont; Henry Moss; Lachlan Astfalck; Philipp Hennig; Sam Willis; Thomas Cowperthwaite

arXiv: 2605.21041 · v1 · pith:AUFDNMCNnew · submitted 2026-05-20 · stat.ML · cs.LG · stat.ME


Pith reviewed 2026-05-21 02:02 UTC · model grok-4.3

classification: stat.ML · cs.LG · stat.ME

keywords: Gaussian processes, conditioning, diffusion models, Monte Carlo, ODE, likelihood, probabilistic inference


The pith

Gaussian processes can be conditioned on almost any pointwise likelihood by equating them to linear diffusion models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes an explicit equivalence between Gaussian processes and linear diffusion models. Predictive sampling is recast as an ODE with closed-form Gaussian dynamics plus a Monte Carlo approximable guidance term from the likelihood. This recovers exact GP conditioning in linear-Gaussian cases and extends to non-conjugate statements such as non-linear physics and natural language via LLMs. Whitening isolates non-Gaussian dynamics to reduce transport cost and stiffness. The result is a general-purpose inference method without bespoke derivations for each new conditioning type.

Core claim

We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation, including non-linear physics and natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations.

What carries the argument

Equivalence between GPs and linear diffusion models that recasts conditioning as an ODE with closed-form Gaussian dynamics and Monte Carlo guidance.
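
As a rough illustration of what "an ODE with closed-form Gaussian dynamics" can look like, the sketch below transports standard-normal draws to a Gaussian target along a simple linear schedule. Everything here is an assumption made for illustration: the schedule, the function names `gaussian_flow_velocity` and `integrate_flow`, and the scalar target are not taken from the paper, and no guidance term is included.

```python
import numpy as np

def gaussian_flow_velocity(x, t, mean, std):
    """Velocity field for the path x_t ~ N(t*mean, (1 - t + t*std)^2).

    Hypothetical schedule chosen for illustration only.
    """
    m_t = t * mean
    s_t = (1.0 - t) + t * std
    # d/dt (m_t + s_t * x0), with x0 recovered as (x - m_t) / s_t
    return mean + (std - 1.0) / s_t * (x - m_t)

def integrate_flow(x0, mean, std, n_steps=100):
    """Forward-Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * gaussian_flow_velocity(x, k * h, mean, std)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(5)                  # draws from the N(0, 1) reference
x1 = integrate_flow(x0, mean=2.0, std=0.5)   # transported draws
```

Because the velocity is affine in `x`, the flow maps each draw `x0` to `mean + std * x0`, so the transported draws follow N(mean, std^2). In the paper's setting the target would be a GP predictive distribution and the drift would carry an additional likelihood-dependent term.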

If this is right

Recovers exact standard GP posteriors in linear-Gaussian cases.
Handles non-linear physics via pointwise likelihoods.
Enables conditioning on natural language using LLMs.
Eliminates need for custom derivations per conditioning type.
Whitening reduces numerical stiffness and Wasserstein-2 cost.
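
For reference, the linear-Gaussian baseline that the first point refers to is textbook GP regression. The sketch below is a minimal NumPy version, assuming a squared-exponential kernel and i.i.d. Gaussian observation noise; the function names and hyperparameter values are illustrative and not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Exact GP posterior mean and covariance under Gaussian noise."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                            # K_s^T K^{-1} y
    V = np.linalg.solve(L, K_s)
    cov = K_ss - V.T @ V                            # K_ss - K_s^T K^{-1} K_s
    return mean, cov

x_train = np.array([-1.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-2.0, 2.0, 9)
mean, cov = gp_posterior(x_train, y_train, x_test)
```

Any sampler that claims to recover exact conditioning in the conjugate case should reproduce this mean and covariance up to Monte Carlo and discretisation error.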

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Experts could supply constraints in plain language for GP models.
Hybrid diffusion-GP models for generative tasks become straightforward.
Testing on scientific datasets with non-linear constraints would validate stability.
Similar ideas might apply to other kernel methods.

Load-bearing premise

The Monte Carlo approximation of the guidance term stays accurate for complex non-conjugate likelihoods.
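
To make the premise concrete, here is a toy check of one possible Monte Carlo guidance estimator in a scalar model where the answer is known in closed form. The model (x0 ~ N(0, 1), x_t = a*x0 + b*eps, Gaussian likelihood), the score-function form of the estimator, and the names `mc_guidance` and `exact_guidance` are all assumptions for illustration; the paper's estimator may differ.

```python
import numpy as np

def mc_guidance(x_t, y, a, b, noise_sd, n_samples, rng):
    """Monte Carlo estimate of d/dx_t log E[l(x0) | x_t] in a toy model.

    Assumed model: x0 ~ N(0, 1), x_t = a*x0 + b*eps, l(x0) = N(y; x0, noise_sd^2).
    """
    denom = a**2 + b**2
    mu_c = a * x_t / denom            # E[x0 | x_t]
    var_c = b**2 / denom              # Var[x0 | x_t]
    x0 = mu_c + np.sqrt(var_c) * rng.standard_normal(n_samples)
    log_l = -0.5 * ((y - x0) / noise_sd) ** 2    # pointwise log-likelihood, up to a constant
    w = np.exp(log_l - log_l.max())              # stabilised likelihood weights
    w /= w.sum()
    # Uses d/dx_t log p(x0 | x_t) = (x0 - mu_c) / var_c * (a / denom)
    return np.sum(w * (x0 - mu_c)) / var_c * (a / denom)

def exact_guidance(x_t, y, a, b, noise_sd):
    """Closed-form value of the same quantity for the Gaussian likelihood."""
    denom = a**2 + b**2
    mu_c = a * x_t / denom
    var_c = b**2 / denom
    return (y - mu_c) / (noise_sd**2 + var_c) * (a / denom)

rng = np.random.default_rng(1)
estimate = mc_guidance(0.3, 1.0, a=0.8, b=0.6, noise_sd=0.5, n_samples=200_000, rng=rng)
reference = exact_guidance(0.3, 1.0, a=0.8, b=0.6, noise_sd=0.5)
```

In this conjugate toy the two values should agree closely. The open question raised above is how quickly such agreement degrades when the likelihood is sharply peaked, multimodal, or expensive to evaluate.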

What would settle it

Compare diffusion samples to exact posterior on a known non-linear conditioning task; large discrepancy would disprove the approximation's reliability.

Figures

Figures reproduced from arXiv: 2605.21041 by Andrew Zammit-Mangion, Christopher Nemeth, Colin Doumont, Henry Moss, Lachlan Astfalck, Philipp Hennig, Sam Willis, Thomas Cowperthwaite.

**Figure 1.** (left of each pair) Samples from a GP conditioned on observations (red dots) and (right of each pair) samples from FLOWGP including additional information about non-linear physics via known differential equations (a-c) and natural language descriptions via an LLM-based likelihood (d-f). In each case, the unconstrained GP produces statistically coherent but semantically uninformed samples, whilst FLOWGP pro…

**Figure 2.** Generating samples from the GP predictive distribution when conditioning on Gaussian…

**Figure 3.** Extension of Figure…

**Figure 4.** Predictive samples from an unconstrained GP…

**Figure 5.** On all six Bayesian Optimisation with Preference Exploration (BOPE) problems consid…

**Figure 6.** Additional experiments on the monotonic and bounded regression problem, showing the…

Original abstract

Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper equates GPs to linear diffusions so conditioning becomes an ODE with MC guidance, recovering exact results in conjugate cases and extending to LLMs and non-linear likelihoods, but the approximation's stability for hard cases is the main open question.


The main takeaway is that the authors have found a way to condition Gaussian processes on a much wider range of information by showing an equivalence to linear diffusion models. This turns the prediction step into an ODE that has closed-form Gaussian dynamics and adds a guidance term based on the likelihood, which they approximate with Monte Carlo samples. In the usual linear-Gaussian case this recovers the exact GP posterior. The real extension is to any conditioning that allows pointwise likelihood evaluation, such as outputs from non-linear simulators or even statements in natural language scored by large language models. The whitening they introduce helps keep the numerical integration stable by isolating the non-Gaussian parts.

This approach is new because it provides a single framework for these different types of conditioning without requiring problem-specific derivations each time. It also claims to be the first to bring language model conditioning into the GP setting this way. The practical benefit is that it opens GPs to incorporating more kinds of real-world knowledge directly.

The potential issue is with the Monte Carlo estimator for the guidance term when moving away from conjugate settings. For complex likelihoods, the variance of that estimate could grow, which might affect the accuracy of the overall sampling or make the ODE harder to solve. The abstract presents it as straightforward, but without explicit error analysis or tests on how it holds up, it is difficult to gauge the reliability for the most ambitious uses.

This paper would interest researchers who build probabilistic models and want to move GPs past their traditional limits. Readers focused on flexible inference or combining GPs with other ML components like LLMs could find useful ideas here. Given the novelty of the equivalence and the potential applications, it is worth sending for peer review so that the details can be checked thoroughly.

Referee Report

2 major / 2 minor

Summary. The paper claims to establish an explicit equivalence between Gaussian processes and a class of linear diffusion models. Predictive sampling is recast as an ODE possessing closed-form Gaussian dynamics whose drift contains a likelihood-dependent guidance term; this term is replaced by a simple Monte Carlo average over pointwise likelihood evaluations. The construction recovers exact GP conditioning in the linear-Gaussian case and extends, without bespoke derivations, to arbitrary conditioning statements that admit pointwise likelihood evaluation, including non-linear physics and natural-language statements supplied by large language models. Whitening is introduced to isolate irreducible non-Gaussian dynamics, thereby minimising Wasserstein-2 transport cost and removing numerical stiffness.
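
Since the summary refers to Wasserstein-2 transport cost, it may help to recall that this cost has a closed form between two Gaussians: W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}). The sketch below evaluates that standard formula; the helper names are illustrative, and nothing here reproduces the paper's own optimality argument.

```python
import numpy as np

def sqrtm_psd(A):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 0.0, None)        # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def w2_squared_gaussian(m1, S1, m2, S2):
    """Squared Wasserstein-2 distance between N(m1, S1) and N(m2, S2)."""
    root = sqrtm_psd(S2)
    cross = sqrtm_psd(root @ S1 @ root)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

# Example: a pure mean shift costs the squared length of the shift.
shift_cost = w2_squared_gaussian(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), np.eye(2))
```

A quantity of this kind is one candidate for the quantitative whitened-versus-unwhitened comparison requested in the major comments, at least in regimes where the endpoints are approximately Gaussian.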

Significance. If the claimed equivalence and the numerical stability of the Monte Carlo guidance term can be established, the work would supply a genuinely general-purpose mechanism for conditioning GPs on complex, non-conjugate information. The ability to incorporate statements expressed in natural language or by non-linear simulators without custom derivations would constitute a substantial advance in probabilistic modelling.

major comments (2)

[Abstract / ODE derivation] Abstract and the section deriving the ODE equivalence: the central claim that the Monte Carlo estimator for the likelihood-dependent guidance term remains accurate and stable for non-conjugate conditioning statements is load-bearing for the extension beyond the linear-Gaussian regime, yet the manuscript provides neither variance bounds nor effective-sample-size diagnostics for this estimator under the ODE flow.
[Whitening section] Section presenting the whitening transformation: the assertion that whitening isolates the irreducible non-Gaussian dynamics and thereby eliminates numerical stiffness must be accompanied by a quantitative comparison of the resulting Wasserstein-2 cost and stiffness metrics against the unwhitened formulation; without such evidence the claimed numerical advantage remains unverified.
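
The effective-sample-size diagnostic asked for in the first major comment is commonly computed from importance weights as (sum w)^2 / sum w^2. A minimal sketch, assuming the guidance estimator exposes per-sample log-likelihood weights (the function name is illustrative):

```python
import numpy as np

def effective_sample_size(log_weights):
    """Kish effective sample size from unnormalised log-weights."""
    log_w = np.asarray(log_weights, dtype=float)
    w = np.exp(log_w - log_w.max())        # subtract the max for numerical stability
    return float(w.sum() ** 2 / np.sum(w ** 2))

ess_uniform = effective_sample_size(np.zeros(100))                   # equal weights
ess_degenerate = effective_sample_size([0.0, -1000.0, -1000.0])      # one dominant weight
```

Equal weights give an effective sample size equal to the number of samples, while a single dominant weight drives it towards one. Tracking this value along the ODE trajectory would show where the Monte Carlo guidance estimate is carried by only a handful of samples.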

minor comments (2)

[Notation / Monte Carlo estimator] Notation for the guidance term and the Monte Carlo estimator should be introduced with explicit dependence on the likelihood function and the number of samples; the current presentation leaves the scaling with conditioning complexity implicit.
[Assumptions paragraph] The manuscript should include a brief statement of the precise regularity conditions on the likelihood that guarantee the existence of the closed-form Gaussian dynamics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and for recognizing the potential of the proposed framework. We respond point-by-point to the major comments below, indicating the revisions we will make to address the concerns raised.


Referee: [Abstract / ODE derivation] Abstract and the section deriving the ODE equivalence: the central claim that the Monte Carlo estimator for the likelihood-dependent guidance term remains accurate and stable for non-conjugate conditioning statements is load-bearing for the extension beyond the linear-Gaussian regime, yet the manuscript provides neither variance bounds nor effective-sample-size diagnostics for this estimator under the ODE flow.

Authors: We agree that theoretical or diagnostic support for the Monte Carlo guidance estimator is important to substantiate the non-conjugate claims. In the linear-Gaussian case the estimator is exact by construction, but for general likelihoods we currently rely on empirical performance. In the revised manuscript we will add a short derivation of variance bounds for the estimator under the linear ODE flow and report effective sample size diagnostics from the numerical experiments in the non-linear physics and language-conditioning sections. These additions will appear in the ODE derivation section.
Revision: yes

Referee: [Whitening section] Section presenting the whitening transformation: the assertion that whitening isolates the irreducible non-Gaussian dynamics and thereby eliminates numerical stiffness must be accompanied by a quantitative comparison of the resulting Wasserstein-2 cost and stiffness metrics against the unwhitened formulation; without such evidence the claimed numerical advantage remains unverified.

Authors: We accept that the current text presents the theoretical motivation for whitening without direct quantitative verification. We will revise the whitening section to include explicit comparisons of Wasserstein-2 transport costs and stiffness indicators (such as maximum integration step size or condition number of the drift) between the whitened and unwhitened formulations, drawing on the simulation results already obtained in the paper.
Revision: yes
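
As a small illustration of why whitening bears on conditioning-number style stiffness indicators, the sketch below whitens a GP prior covariance with its Cholesky factor and compares condition numbers before and after. The kernel, grid, and jitter value are assumptions for illustration; the paper applies whitening to the flow itself, which this linear-algebra example does not reproduce.

```python
import numpy as np

# Hypothetical setup: squared-exponential prior covariance on a small 1-D grid.
x = np.linspace(0.0, 1.0, 6)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.3**2)
K_j = K + 1e-6 * np.eye(len(x))          # jitter keeps the factorisation stable

L = np.linalg.cholesky(K_j)              # K_j = L L^T
L_inv = np.linalg.solve(L, np.eye(len(x)))

# In whitened coordinates z = L^{-1} f the prior covariance is the identity.
K_white = L_inv @ K_j @ L_inv.T

cond_before = np.linalg.cond(K_j)
cond_after = np.linalg.cond(K_white)
```

Here `cond_after` is essentially one while `cond_before` is much larger, which is the kind of gap a whitened-versus-unwhitened comparison would need to report for the actual drift rather than for the prior covariance alone.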

Circularity Check

0 steps flagged

No circularity: GP-diffusion equivalence and ODE recasting are derived rather than tautological


The paper derives an explicit equivalence between Gaussian processes and linear diffusion models, recasting predictive sampling as an ODE whose closed-form Gaussian dynamics are supplemented by a likelihood-dependent guidance term. In the linear-Gaussian regime this recovers standard GP conditioning exactly, while non-conjugate cases are handled by replacing the guidance term with a Monte Carlo average over pointwise likelihood evaluations. No load-bearing step reduces the claimed equivalence, the ODE formulation, or the extension to arbitrary conditioning statements to a fitted quantity defined by the target result itself, a self-referential definition, or a self-citation chain whose validity is presupposed. The whitening step and Wasserstein-2 minimisation are presented as consequences of the equivalence rather than inputs smuggled in by ansatz or prior self-work. The derivation therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on an unproven equivalence between GPs and linear diffusion models plus the adequacy of Monte Carlo for the guidance term; no free parameters or new entities are declared in the abstract.

axioms (1)

domain assumption: An explicit equivalence exists between Gaussian processes and a class of linear diffusion models that preserves closed-form Gaussian dynamics. Invoked to recast predictive sampling as an ODE.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation."

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: "Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 6 internal anchors

[1]

MIT Press, Cambridge, MA, USA, 2006

Christopher KI Williams and Carl Edward Rasmussen.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA, 2006

work page 2006
[2]

Approximate Bayesian inference for hierarchical Gaussian Markov random field models.Journal of Statistical Planning and Inference, 137(10):3177–3192, 2007

Håvard Rue and Sara Martino. Approximate Bayesian inference for hierarchical Gaussian Markov random field models.Journal of Statistical Planning and Inference, 137(10):3177–3192, 2007

work page 2007
[3]

Approximate marginals in latent Gaussian models.Journal of Machine Learning Research, 12:417–454, 2011

Botond Cseke and Tom Heskes. Approximate marginals in latent Gaussian models.Journal of Machine Learning Research, 12:417–454, 2011

work page 2011
[4]

Gaussian Processes for Big Data

James Hensman, Nicolo Fusi, and Neil D Lawrence. Gaussian processes for big data.arXiv preprint arXiv:1309.6835, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[5]

Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

work page 2020
[6]

Score-based generative modeling through stochastic differential equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021

work page 2021
[7]

Flow matching for generative modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[8]

Building normalizing flows with stochastic inter- polants

Michael Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic inter- polants. InInternational Conference on Learning Representations, 2023

work page 2023
[9]

Flow straight and fast: learning to generate and transfer data with rectified flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: learning to generate and transfer data with rectified flow. InThe Eleventh International Conference on Learning Representations (ICLR), 2023

work page 2023
[10]

The Principles of Diffusion Models

Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, and Stefano Ermon. The principles of diffusion models.arXiv preprint arXiv:2510.21890, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[11]

Classifier-Free Diffusion Guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[12]

Physics-informed Gaussian process regression generalizes linear PDE solvers.arXiv preprint arXiv:2212.12474, 2022

Marvin Pförtner, Ingo Steinwart, Philipp Hennig, and Jonathan Wenger. Physics-informed Gaussian process regression generalizes linear PDE solvers.arXiv preprint arXiv:2212.12474, 2022

work page arXiv 2022
[13]

Linear inverse Gaussian theory and geostatistics.Geophysics, 71(6):R101–R111, 2006

Thomas Mejer Hansen, Andre G Journel, Albert Tarantola, and Klaus Mosegaard. Linear inverse Gaussian theory and geostatistics.Geophysics, 71(6):R101–R111, 2006

work page 2006
[14]

Derivative observations in Gaussian process models of dynamic systems.Advances in Neural Information Processing Systems, 15, 2002

Ercan Solak, Roderick Murray-Smith, WE Leithead, Douglas Leith, and Carl Rasmussen. Derivative observations in Gaussian process models of dynamic systems.Advances in Neural Information Processing Systems, 15, 2002

work page 2002
[15]

Inferring flow energy, space scales, and timescales: freely drifting vs

Aurelien Luigi Serge Ponte, Lachlan C Astfalck, Matthew D Rayson, Andrew P Zulberti, and Nicole L Jones. Inferring flow energy, space scales, and timescales: freely drifting vs. fixed-point observations.Nonlinear Processes in Geophysics, 31(4):571–586, 2024

work page 2024
[16]

Bayes–Hermite quadrature.Journal of Statistical Planning and Inference, 29(3):245–260, 1991

Anthony O’Hagan. Bayes–Hermite quadrature.Journal of Statistical Planning and Inference, 29(3):245–260, 1991

work page 1991
[17]

Probabilistic integration.Statistical Science, 34(1):1–22, 2019

François-Xavier Briol, Chris J Oates, Mark Girolami, Michael A Osborne, and Dino Sejdinovic. Probabilistic integration.Statistical Science, 34(1):1–22, 2019

work page 2019
[18]

Linearly constrained Gaussian processes.Advances in Neural Information Processing Systems, 30, 2017

Carl Jidling, Niklas Wahlström, Adrian Wills, and Thomas B Schön. Linearly constrained Gaussian processes.Advances in Neural Information Processing Systems, 30, 2017

work page 2017
[19]

On stationary processes in the plane.Biometrika, pages 434–449, 1954

Peter Whittle. On stationary processes in the plane.Biometrika, pages 434–449, 1954. 10

work page 1954
[20]

Finn Lindgren, Håvard Rue, and Johan Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.Journal of the Royal Statistical Society Series B: Statistical Methodology, 73(4):423–498, 2011

work page 2011
[21]

Hilbert space methods for reduced-rank gaussian process regres- sion.Statistics and Computing, 30(2):419–446, 2020

Arno Solin and Simo Särkkä. Hilbert space methods for reduced-rank gaussian process regres- sion.Statistics and Computing, 30(2):419–446, 2020

work page 2020
[22]

Efficiently sampling functions from gaussian process posteriors

James Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisen- roth. Efficiently sampling functions from gaussian process posteriors. InInternational confer- ence on machine learning, pages 10292–10302. PMLR, 2020

work page 2020
[23]

A unifying view of sparse approxi- mate Gaussian process regression.Journal of Machine Learning Research, 6(Dec):1939–1959, 2005

Joaquin Quinonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approxi- mate Gaussian process regression.Journal of Machine Learning Research, 6(Dec):1939–1959, 2005

work page 1939
[24]

Sparse spectrum Gaussian process regression.Journal of Machine Learning Research, 11:1865–1881, 2010

Miguel Lázaro-Gredilla, Joaquin Quinonero-Candela, Carl Edward Rasmussen, and Aníbal R Figueiras-Vidal. Sparse spectrum Gaussian process regression.Journal of Machine Learning Research, 11:1865–1881, 2010

work page 2010
[25]

Kernel interpolation for scalable structured Gaussian processes (KISS-GP)

Andrew Wilson and Hannes Nickisch. Kernel interpolation for scalable structured Gaussian processes (KISS-GP). InInternational Conference on Machine Learning, pages 1775–1784. PMLR, 2015

work page 2015
[26]

Preconditioning kernel matrices

Kurt Cutajar, Michael Osborne, John Cunningham, and Maurizio Filippone. Preconditioning kernel matrices. InInternational Conference on Machine Learning, pages 2529–2538. PMLR, 2016

work page 2016
[27]

Computation-aware Gaussian processes: model selection and linear-time inference

Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R Gardner, Geoff Pleiss, and John P Cun- ningham. Computation-aware Gaussian processes: model selection and linear-time inference. Advances in Neural Information Processing Systems, 37:31316–31349, 2024

work page 2024
[28]

Linear latent force models us- ing Gaussian processes.IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2693–2705, 2013

Mauricio A Alvarez, David Luengo, and Neil D Lawrence. Linear latent force models us- ing Gaussian processes.IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2693–2705, 2013

work page 2013
[29]

Machine learning of linear differential equations using Gaussian processes.Journal of Computational Physics, 348:683– 693, 2017

Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using Gaussian processes.Journal of Computational Physics, 348:683– 693, 2017

work page 2017
[30]

Constraining Gaussian processes to systems of linear ordinary differential equations.Advances in Neural Information Processing Systems, 35:29386–29399, 2022

Andreas Besginow and Markus Lange-Hegermann. Constraining Gaussian processes to systems of linear ordinary differential equations.Advances in Neural Information Processing Systems, 35:29386–29399, 2022

work page 2022
[31]

Physics-informed variational state- space Gaussian processes.Advances in Neural Information Processing Systems, 37:98505– 98536, 2024

Oliver Hamelijnck, Arno Solin, and Theodoros Damoulas. Physics-informed variational state- space Gaussian processes.Advances in Neural Information Processing Systems, 37:98505– 98536, 2024

work page 2024
[32]

AutoIP: A united framework to integrate physics into Gaussian processes

Da Long, Zheng Wang, Aditi Krishnapriyan, Robert Kirby, Shandian Zhe, and Michael Mahoney. AutoIP: A united framework to integrate physics into Gaussian processes. InInternational Conference on Machine Learning, pages 14210–14222. PMLR, 2022

work page 2022
[33]

Solving and learning nonlinear PDEs with Gaussian processes.Journal of Computational Physics, 447:110668, 2021

Yifan Chen, Bamdad Hosseini, Houman Owhadi, and Andrew M Stuart. Solving and learning nonlinear PDEs with Gaussian processes.Journal of Computational Physics, 447:110668, 2021

work page 2021
[34]

A probabilistic model for the numerical solution of initial value problems.Statistics and Computing, 2019

Michael Schober, Simo Särkkä, and Philipp Hennig. A probabilistic model for the numerical solution of initial value problems.Statistics and Computing, 2019

work page 2019
[35]

Probabilistic solutions to ordinary differential equations as nonlinear bayesian filtering: a new perspective.Statistics and Computing, 2019

Filip Tronarp, Hans Kersting, Simo Särkkä, and Philipp Hennig. Probabilistic solutions to ordinary differential equations as nonlinear bayesian filtering: a new perspective.Statistics and Computing, 2019

work page 2019
[36]

Reverse-time diffusion equation models.Stochastic Processes and their Applications, 12(3):313–326, 1982

Brian DO Anderson. Reverse-time diffusion equation models.Stochastic Processes and their Applications, 12(3):313–326, 1982. 11

work page 1982
[37]

A Survey on Diffusion Models for Inverse Problems

Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems.arXiv preprint arXiv:2410.00083, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[38]

Diffusion posterior sampling for general noisy inverse problems

Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. InThe Eleventh International Conference on Learning Representations, ICLR 2023. The International Conference on Learning Representations, 2023

work page 2023
[39]

Manifold preserv- ing guided diffusion

Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, et al. Manifold preserv- ing guided diffusion. InThe Twelfth International Conference on Learning Representations, ICLR 2024. The International Conference on Learning Representations, 2024

work page 2024
[40]

Free hunch: Denoiser covariance estimation for diffusion models without extra costs

Severi Rissanen, Markus Heinonen, and Arno Solin. Free hunch: Denoiser covariance estimation for diffusion models without extra costs. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[41]

Improved denoising diffusion probabilistic models

Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. InInternational Conference on Machine Learning, pages 8162–8171. PMLR, 2021

work page 2021
[42]

Gaussian processes with linear operator inequality constraints.Journal of Machine Learning Research, 20(135):1–36, 2019

Christian Agrell. Gaussian processes with linear operator inequality constraints.Journal of Machine Learning Research, 20(135):1–36, 2019

work page 2019
[43]

Bayesian monotone regression using Gaussian process projection.Biometrika, 101(2):303–317, 2014

Lizhen Lin and David B Dunson. Bayesian monotone regression using Gaussian process projection.Biometrika, 101(2):303–317, 2014

work page 2014
[44]

Posterior projection for inference in constrained spaces.arXiv e-prints, pages arXiv–1812, 2025

Lachlan Astfalck, Deborshee Sen, Sayan Patra, Edward Cripps, and David Dunson. Posterior projection for inference in constrained spaces.arXiv e-prints, pages arXiv–1812, 2025

work page 2025
[45]

Modeling space and space-time directional data using projected Gaussian processes.Journal of the American Statistical Association, 109(508):1565– 1580, 2014

Fangpo Wang and Alan E Gelfand. Modeling space and space-time directional data using projected Gaussian processes.Journal of the American Statistical Association, 109(508):1565– 1580, 2014

work page 2014
[46]

Gaussian processes with monotonicity information

Jaakko Riihimäki and Aki Vehtari. Gaussian processes with monotonicity information. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 645–652. JMLR Workshop and Conference Proceedings, 2010

work page 2010
[47]

Gaussian process modeling with inequality con- straints

Sébastien Da Veiga and Amandine Marrel. Gaussian process modeling with inequality con- straints. InAnnales de la Faculté des Sciences de Toulouse: Mathématiques, pages 529–555, 2012

work page 2012
[48]

Qwen Team. Qwen3. 5-omni technical report.arXiv preprint arXiv:2604.15804, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[49]

LLM Flow Processes for Text-Conditioned Regression

Felix Biggs and Samuel Willis. LLM flow processes for text-conditioned regression.arXiv preprint arXiv:2601.06147, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[50]

Sinho Chewi, Jonathan Niles-Weed, and Philippe Rigollet. Statistical Optimal Transport. Springer, Cham, Switzerland, 2025.

[51]

Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000.

[52]

Ernst Hairer, Gerhard Wanner, and Syvert P Nørsett. Solving Ordinary Differential Equations I: Nonstiff Problems. Springer, Berlin, Germany, 1993.

[53]

András Prékopa. On logarithmic concave measures and functions. Acta Sci. Math., 34:335, 1973.

[54]

Adrien Saumard and Jon A Wellner. Log-concavity and strong log-concavity: a review. Statistics Surveys, 8:45, 2014.

[55]

Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022.

[56]

Simo Särkkä and Arno Solin. Applied Stochastic Differential Equations, volume 10. Cambridge University Press, Cambridge, UK, 2019.

[57]

Hanyang Wang, Juergen Branke, and Matthias Poloczek. Bayesian optimization with preference exploration using a monotonic neural network ensemble. Advances in Neural Information Processing Systems, 2025.

[58]

Wei Chu and Zoubin Ghahramani. Preference learning with Gaussian processes. In International Conference on Machine Learning, 2005.

[59]

Andreu Mas-Colell, Michael Dennis Whinston, Jerry R Green, et al. Microeconomic Theory. Oxford University Press, Oxford, UK, 1995.

[60]

Zhiyuan Jerry Lin, Raul Astudillo, Peter Frazier, and Eytan Bakshy. Preference exploration for efficient Bayesian optimization with multiple outcomes. In International Conference on Artificial Intelligence and Statistics, 2022.

[61]

Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham, Andrew G Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. Advances in Neural Information Processing Systems, 2020.

[62]

Carl Hvarfner, Erik O Hellsten, and Luigi Nardi. Vanilla Bayesian optimization performs great in high dimensions. In International Conference on Machine Learning, 2024.

$\ldots \left[ \frac{\partial f_0^{(i)}}{\partial f_t} \right]^{\top} \nabla_{f_0}\, p(\mathcal{C} \mid f_0^{(i)})$. (27) Rewriting using the log-derivative trick, we obtain $\nabla_{f_t}\, p(\mathcal{C} \mid f_0^{(i)}) = p(\mathcal{C} \mid f_0^{(i)}) \ldots$

A Derivation of our flow's marginal and joint distributions. Under the prior $f_0 \sim \mathcal{N}(m_*, K_{**})$ and the corruption model (8), the pair $(f_0, f_t)$ is jointly Gaussian. Writing $f_t = \alpha(t) f_0 + \ldots$

[63]

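the reversed-time exact drift $a$ satisfies a one-sided Lipschitz condition in $f$, uniformly in $r$: that is, there exists $\eta_\tau \in \mathbb{R}$ such that

$$\langle x - y,\; a(r, x) - a(r, y) \rangle \le \eta_\tau \, \|x - y\|^2 \qquad \forall x, y \in \mathbb{R}^m,\; r \in [0, 1 - \tau]; \tag{47}$$
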
[64]

the realised guidance approximation error is uniformly bounded on the relevant state-space region visited by the exact and approximate trajectories: $\varepsilon_\tau := \sup_{(t, f) \in R_\tau} \| g(t, f) - \hat{g}(t, f) \| < \infty$, where $R_\tau \subseteq [\tau, 1] \times \mathbb{R}^m$ denotes any region containing both trajectories on the truncated interval. The constant $\eta_\tau$ is allowed to be negative; this contractive case will be exploited in Corollary E.7 below.

[65]

Build $S$ samples $f_0^{(i)} \sim p(f_0 \mid f_t, \mathcal{D})$ via $f_0^{(i)} = \mu_{0|t} + \Sigma_{0|t}^{1/2} \epsilon^{(i)}$, with $\mu_{0|t}$ from (22) and $\Sigma_{0|t}$ from (23), using $m_{*|y}$, $K_{**|y}$ and $A_{|y}(t)$ in place of $m_*$, $K_{**}$ and $A(t)$.

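The sampling step above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the noise vectors are standard normal, uses a Cholesky factor as the matrix square root (any factor $L$ with $L L^{\top} = \Sigma_{0|t}$ gives the same distribution), and takes `mu_0t` and `sigma_0t` as hypothetical placeholder values rather than the outputs of (22) and (23).

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def draw_f0_samples(mu, sigma, num_samples, rng):
    """Return num_samples draws mu + L eps, with eps ~ N(0, I) (assumed)."""
    L = cholesky(sigma)
    n = len(mu)
    samples = []
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample = [mu[i] + sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]
        samples.append(sample)
    return samples


# Hypothetical two-dimensional conditional mean and covariance.
rng = random.Random(0)
mu_0t = [0.5, -1.0]
sigma_0t = [[1.0, 0.3], [0.3, 0.5]]
samples = draw_f0_samples(mu_0t, sigma_0t, num_samples=4, rng=rng)
```

For realistic grid sizes a linear-algebra library would replace the hand-written factorisation; the pure-Python version is only meant to make the step explicit.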
[66]

Evaluate the log-likelihood $\log p(\mathcal{C} \mid f_0^{(i)})$ and its gradient $\nabla_{f_0} \log p(\mathcal{C} \mid f_0^{(i)})$ at each sample,

[67]

Compute normalised weights via the numerically stable log-sum-exp operation

$$\log \bar{w}^{(i)} = \log p(\mathcal{C} \mid f_0^{(i)}) - \operatorname{logsumexp}_r \, \log p(\mathcal{C} \mid f_0^{(r)}).$$

Here $\operatorname{logsumexp}_r a^{(r)} := \max_r a^{(r)} + \log \Big[ \sum_r \exp\big( a^{(r)} - \max_r a^{(r)} \big) \Big]$ avoids overflow and underflow by subtracting the potentially very small maximal value.

Algorithm 1 FLOWGP: sampling from a GP predictive distribution...

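The weight normalisation described above can be written directly from the log-sum-exp definition. The following is a small sketch using only the standard library; the log-likelihood values are made up for illustration and chosen so that exponentiating them naively would underflow to zero.

```python
import math


def logsumexp(values):
    """max(values) + log(sum(exp(v - max(values)))), as in the definition above."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_normalised_weights(log_likelihoods):
    """Return log w_bar^(i) = log p_i - logsumexp_r log p_r for each sample i."""
    lse = logsumexp(log_likelihoods)
    return [ll - lse for ll in log_likelihoods]


# Illustrative (made-up) log-likelihoods; exp(-1050.0) underflows to 0.0 in float64.
log_liks = [-1050.0, -1052.0, -1049.0]
log_w = log_normalised_weights(log_liks)
weights = [math.exp(lw) for lw in log_w]
```

The sketch does not handle the degenerate case in which every log-likelihood is negative infinity; a guard would be needed there.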
[68]

Evaluate the weighted sum (30), applying the Jacobian to each gradient term,

[69]

We clip the norm of the vector field (after scaling by $-\tfrac{1}{2}\beta(t)$, as prescribed by the probability flow ODE (16)) to limit excessively large steps and ensure stable integration. We use the smooth saturation $v \mapsto v \cdot \tau \tanh(\|v\| / \tau) / (\|v\| + 10^{-8})$, where $\tau = 1 \times 10^{2}$ is a maximum norm threshold. This transformation bounds excessively large gradients whilst preserving Li...

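The smooth saturation above is short enough to write out in full. This is an illustrative sketch of the stated formula using the quoted values $\tau = 10^{2}$ and the $10^{-8}$ offset; the function name `saturate` and the test vectors are assumptions made for the example.

```python
import math


def saturate(v, tau=1e2, eps=1e-8):
    """Rescale v so its norm is roughly tau * tanh(||v|| / tau), keeping its direction."""
    norm = math.sqrt(sum(x * x for x in v))
    scale = tau * math.tanh(norm / tau) / (norm + eps)
    return [scale * x for x in v]


small = saturate([0.3, 0.4])      # norm 0.5, far below tau: almost unchanged
large = saturate([3.0e4, 4.0e4])  # norm 5e4, far above tau: norm pulled to about tau
zero = saturate([0.0, 0.0])       # the eps offset avoids division by zero
```

Because $\tanh(x) \approx x$ for small $x$, vectors with norm well below $\tau$ pass through nearly untouched, while the output norm never exceeds $\tau$.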
[70]

Vanilla BO

using a scaled RBF kernel

$$\operatorname{cov}\big(f_0(t), f_0(t')\big) = \tau^2 \exp\left( -\frac{(t - t')^2}{2 \kappa^2} \right), \tag{70}$$

with hyperparameters $\tau^2$ and $\kappa$ optimised together with an affine mean function by maximising the marginal likelihood using just $\mathcal{D}$. As in the previous experiment, the GP predictive mean $m_{*|y}$ and covariance $K_{**|y}$ on the evaluation grid are used to construct the base Gaussian predicti...

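As a concrete reading of the kernel in (70), the sketch below evaluates it on a small grid. The grid and the hyperparameter values are placeholders chosen for illustration; in the text they are fitted by maximising the marginal likelihood, which this example does not attempt.

```python
import math


def scaled_rbf(t, t_prime, tau_sq, kappa):
    """tau^2 * exp(-(t - t')^2 / (2 kappa^2)), the scaled RBF kernel of (70)."""
    return tau_sq * math.exp(-((t - t_prime) ** 2) / (2.0 * kappa ** 2))


def gram_matrix(grid, tau_sq, kappa):
    """Covariance matrix of f_0 evaluated at every pair of grid points."""
    return [[scaled_rbf(a, b, tau_sq, kappa) for b in grid] for a in grid]


# Placeholder grid and hyperparameters (not values from the paper).
grid = [0.0, 0.5, 1.0]
K = gram_matrix(grid, tau_sq=2.0, kappa=1.0)
```

The diagonal of `K` equals `tau_sq`, and off-diagonal entries decay with squared distance at a rate set by `kappa`.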

[1]

Christopher KI Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA, 2006.

[2]

Håvard Rue and Sara Martino. Approximate Bayesian inference for hierarchical Gaussian Markov random field models. Journal of Statistical Planning and Inference, 137(10):3177–3192, 2007.

[3]

Botond Cseke and Tom Heskes. Approximate marginals in latent Gaussian models. Journal of Machine Learning Research, 12:417–454, 2011.

[4]

James Hensman, Nicolo Fusi, and Neil D Lawrence. Gaussian processes for big data. arXiv preprint arXiv:1309.6835, 2013.

[5]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

[6]

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.

[7]

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023.

[8]

Michael Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In International Conference on Learning Representations, 2023.

[9]

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations (ICLR), 2023.

[10]

Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, and Stefano Ermon. The principles of diffusion models. arXiv preprint arXiv:2510.21890, 2025.

[11]

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.

[12]

Marvin Pförtner, Ingo Steinwart, Philipp Hennig, and Jonathan Wenger. Physics-informed Gaussian process regression generalizes linear PDE solvers. arXiv preprint arXiv:2212.12474, 2022.

[13]

Thomas Mejer Hansen, Andre G Journel, Albert Tarantola, and Klaus Mosegaard. Linear inverse Gaussian theory and geostatistics. Geophysics, 71(6):R101–R111, 2006.

[14]

Ercan Solak, Roderick Murray-Smith, WE Leithead, Douglas Leith, and Carl Rasmussen. Derivative observations in Gaussian process models of dynamic systems. Advances in Neural Information Processing Systems, 15, 2002.

[15]

Aurelien Luigi Serge Ponte, Lachlan C Astfalck, Matthew D Rayson, Andrew P Zulberti, and Nicole L Jones. Inferring flow energy, space scales, and timescales: freely drifting vs. fixed-point observations. Nonlinear Processes in Geophysics, 31(4):571–586, 2024.

[16]

Anthony O’Hagan. Bayes–Hermite quadrature. Journal of Statistical Planning and Inference, 29(3):245–260, 1991.

[17]

François-Xavier Briol, Chris J Oates, Mark Girolami, Michael A Osborne, and Dino Sejdinovic. Probabilistic integration. Statistical Science, 34(1):1–22, 2019.

[18]

Carl Jidling, Niklas Wahlström, Adrian Wills, and Thomas B Schön. Linearly constrained Gaussian processes. Advances in Neural Information Processing Systems, 30, 2017.

[19]

Peter Whittle. On stationary processes in the plane. Biometrika, pages 434–449, 1954.

[20]

Finn Lindgren, Håvard Rue, and Johan Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society Series B: Statistical Methodology, 73(4):423–498, 2011.

[21]

Arno Solin and Simo Särkkä. Hilbert space methods for reduced-rank Gaussian process regression. Statistics and Computing, 30(2):419–446, 2020.

[22]

James Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. Efficiently sampling functions from Gaussian process posteriors. In International Conference on Machine Learning, pages 10292–10302. PMLR, 2020.

[23]

Joaquin Quinonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6(Dec):1939–1959, 2005.

[24]

Miguel Lázaro-Gredilla, Joaquin Quinonero-Candela, Carl Edward Rasmussen, and Aníbal R Figueiras-Vidal. Sparse spectrum Gaussian process regression. Journal of Machine Learning Research, 11:1865–1881, 2010.

[25]

Andrew Wilson and Hannes Nickisch. Kernel interpolation for scalable structured Gaussian processes (KISS-GP). In International Conference on Machine Learning, pages 1775–1784. PMLR, 2015.

[26]

Kurt Cutajar, Michael Osborne, John Cunningham, and Maurizio Filippone. Preconditioning kernel matrices. In International Conference on Machine Learning, pages 2529–2538. PMLR, 2016.

[27]

Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R Gardner, Geoff Pleiss, and John P Cunningham. Computation-aware Gaussian processes: model selection and linear-time inference. Advances in Neural Information Processing Systems, 37:31316–31349, 2024.

[28]

Mauricio A Alvarez, David Luengo, and Neil D Lawrence. Linear latent force models using Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2693–2705, 2013.

[29]

Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348:683–693, 2017.

[30]

Andreas Besginow and Markus Lange-Hegermann. Constraining Gaussian processes to systems of linear ordinary differential equations. Advances in Neural Information Processing Systems, 35:29386–29399, 2022.

[31]

Oliver Hamelijnck, Arno Solin, and Theodoros Damoulas. Physics-informed variational state-space Gaussian processes. Advances in Neural Information Processing Systems, 37:98505–98536, 2024.

[32]

Da Long, Zheng Wang, Aditi Krishnapriyan, Robert Kirby, Shandian Zhe, and Michael Mahoney. AutoIP: A united framework to integrate physics into Gaussian processes. In International Conference on Machine Learning, pages 14210–14222. PMLR, 2022.

[33]

Yifan Chen, Bamdad Hosseini, Houman Owhadi, and Andrew M Stuart. Solving and learning nonlinear PDEs with Gaussian processes. Journal of Computational Physics, 447:110668, 2021.

[34]

Michael Schober, Simo Särkkä, and Philipp Hennig. A probabilistic model for the numerical solution of initial value problems. Statistics and Computing, 2019.

[35]

Filip Tronarp, Hans Kersting, Simo Särkkä, and Philipp Hennig. Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective. Statistics and Computing, 2019.

[36]

Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.

[37]

Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems. arXiv preprint arXiv:2410.00083, 2024.

[38]

Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. In The Eleventh International Conference on Learning Representations (ICLR), 2023.

[39]

Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, et al. Manifold preserving guided diffusion. In The Twelfth International Conference on Learning Representations (ICLR), 2024.

[40]

Severi Rissanen, Markus Heinonen, and Arno Solin. Free hunch: Denoiser covariance estimation for diffusion models without extra costs. In The Thirteenth International Conference on Learning Representations, 2025.

[41]

Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.

[42]

Christian Agrell. Gaussian processes with linear operator inequality constraints. Journal of Machine Learning Research, 20(135):1–36, 2019.

[43]

Lizhen Lin and David B Dunson. Bayesian monotone regression using Gaussian process projection. Biometrika, 101(2):303–317, 2014.

[44]

Lachlan Astfalck, Deborshee Sen, Sayan Patra, Edward Cripps, and David Dunson. Posterior projection for inference in constrained spaces. arXiv e-prints, pages arXiv–1812, 2025.
