Gaussian Process-based learning with new MCMC-based implementation of Wishart prior on correlation matrix

Dalia Chakrabarty; Kane Warrior

arxiv: 2605.27093 · v1 · pith:Q6E5F3BNnew · submitted 2026-05-26 · 📊 stat.ML · cs.LG

Pith reviewed 2026-06-29 15:33 UTC · model grok-4.3

keywords: Gaussian process · Wishart prior · MCMC sampling · covariance matrix · lengthscale parameters · input relevance · Bayesian inference · adaptive prior

The pith

A self-assembled Wishart prior on the covariance matrix helps diagnose weakly informative inputs during Gaussian process learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Wishart prior placed directly on the covariance matrix of the Gaussian process likelihood, rather than only on the kernel hyperparameters. Inference proceeds via MCMC, where the prior's scale matrix is defined adaptively from a look-back window of recent chain iterations. This construction is intended to ease simultaneous learning of multiple lengthscale parameters when the target function is highly multivariate. A sympathetic reader would care because the approach offers a route to identify which inputs contribute little to predictions, potentially simplifying models and improving reliability. The authors illustrate the idea on one synthetic dataset and one real-world dataset.

Core claim

We develop a self-assembled Wishart prior for the covariance matrix while undertaking Bayesian inference on the kernel hyperparameters using MCMC. The construction uses a look-back window over recent MCMC iterations to define a time-step dependent scale matrix, thereby introducing adaptiveness to the chain. Results suggest that direct prior specification on the covariance matrix can be useful for diagnosing weakly informative inputs within the GP-based learning paradigm.

What carries the argument

The self-assembled Wishart prior whose scale matrix is updated at each MCMC step from a look-back window of recent samples, allowing direct prior specification on the covariance matrix.
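The look-back idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `LookBackScale`, the choice to average the covariance matrices in the window, and the division by the degrees of freedom are all assumptions made here, since the abstract does not say how the window is turned into a scale matrix. The Wishart log density itself is the standard one.

```python
import math
from collections import deque

import numpy as np


def log_multigamma(a: float, n: int) -> float:
    """Log of the multivariate gamma function Gamma_n(a)."""
    return n * (n - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2.0) for j in range(1, n + 1)
    )


def wishart_logpdf(K: np.ndarray, scale: np.ndarray, dof: float) -> float:
    """Log density of Wishart(dof, scale) at the positive-definite matrix K."""
    n = K.shape[0]
    _, logdet_K = np.linalg.slogdet(K)
    _, logdet_V = np.linalg.slogdet(scale)
    trace_term = np.trace(np.linalg.solve(scale, K))  # tr(scale^{-1} K)
    return (
        0.5 * (dof - n - 1) * logdet_K
        - 0.5 * trace_term
        - 0.5 * dof * n * math.log(2.0)
        - 0.5 * dof * logdet_V
        - log_multigamma(0.5 * dof, n)
    )


class LookBackScale:
    """Hypothetical helper: keeps the last `window` covariance matrices.

    ASSUMPTION: the scale matrix is the window mean divided by `dof`, so that
    the Wishart prior mean (dof * scale) equals the window mean. The paper may
    define the scale matrix differently.
    """

    def __init__(self, window: int, dof: float):
        self.history = deque(maxlen=window)  # old entries drop out automatically
        self.dof = dof

    def push(self, K: np.ndarray) -> None:
        self.history.append(K.copy())

    def scale(self) -> np.ndarray:
        # Call only after at least one push.
        return np.mean(np.stack(list(self.history)), axis=0) / self.dof


# Toy usage: the window keeps 3I, 4I, 5I, whose mean is 4I.
lb = LookBackScale(window=3, dof=5.0)
for t in range(5):
    lb.push((t + 1.0) * np.eye(2))
V_t = lb.scale()  # 0.8 * I
log_prior = wishart_logpdf(4.0 * np.eye(2), V_t, dof=5.0)
```

In a sampler, `log_prior` would be added to the GP log likelihood when evaluating a proposed set of kernel hyperparameters, with `push` called once per iteration. Because `V_t` changes with the chain history, the quantity being targeted also changes from step to step.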

If this is right

Multiple lengthscale parameters become easier to infer jointly when the covariance receives a direct prior.
Weakly informative inputs can be flagged by inspecting the posterior induced by the covariance prior.
The adaptive MCMC sampler supports reliable hyperparameter learning for highly multivariate target functions.
The same prior construction applies equally to synthetic and real-world GP regression tasks.
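One way to picture the input-flagging step is through ARD lengthscales, which the figure captions mention: an input whose lengthscale is very large relative to its range barely changes the kernel. The sketch below assumes a squared-exponential ARD kernel and a simple median-based rule; the function names and the threshold `factor` are hypothetical and are not taken from the paper.

```python
import numpy as np


def ard_sq_exp_kernel(X, lengthscales, amplitude=1.0):
    """Squared-exponential ARD kernel: one lengthscale per input dimension."""
    Z = np.asarray(X, dtype=float) / np.asarray(lengthscales, dtype=float)
    sq = np.sum(Z**2, axis=1)
    # Pairwise squared distances in the rescaled input space, clipped at zero
    # to guard against tiny negative values from round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return amplitude * np.exp(-0.5 * d2)


def flag_weak_inputs(lengthscale_samples, input_ranges, factor=5.0):
    """Flag inputs whose posterior-median lengthscale exceeds factor * range.

    ASSUMPTION: `factor` is an illustrative threshold, not a value from the paper.
    lengthscale_samples has shape (n_samples, n_inputs).
    """
    med = np.median(np.asarray(lengthscale_samples, dtype=float), axis=0)
    return med > factor * np.asarray(input_ranges, dtype=float)


# Toy post-burn-in samples for two inputs, each with unit range.
samples = np.array([[0.4, 40.0], [0.5, 50.0], [0.6, 60.0]])
flags = flag_weak_inputs(samples, input_ranges=[1.0, 1.0])
```

Here the second input would be flagged because its median lengthscale is far larger than its range, so moving along that input leaves the covariance matrix almost unchanged.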

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The look-back construction might be transferred to other positive-definite matrix priors used in multivariate Bayesian models.
Input diagnosis performed this way could reduce reliance on separate variable-selection procedures before fitting a GP.
Varying the length of the look-back window offers a tunable knob that future experiments could optimize for different data regimes.

Load-bearing premise

Defining the scale matrix from a look-back window of recent MCMC iterations produces a useful adaptive prior without destabilizing the sampler or introducing unwanted bias.

What would settle it

If the posterior on the covariance matrix fails to assign low weight to known irrelevant inputs in the synthetic data experiment, or if the adaptive chain mixes worse than a non-adaptive Wishart prior on the same data, the claimed diagnostic utility would not hold.
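The mixing half of this test could be checked by comparing effective sample sizes of the lengthscale chains under the adaptive and non-adaptive priors. The estimator below is a deliberately crude one chosen for illustration (autocorrelations summed up to the first negative lag); it is an assumption of this sketch, not a diagnostic reported in the paper, and the two toy chains merely stand in for real sampler output.

```python
import numpy as np


def effective_sample_size(chain):
    """Crude ESS: N / (1 + 2 * sum of autocorrelations up to the first negative lag)."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float("nan")  # a constant chain carries no mixing information
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho < 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Stand-in chains: white noise (mixes well) and a sticky AR(1) chain.
rng = np.random.default_rng(0)
well_mixed = rng.normal(size=2000)
sticky = np.zeros(2000)
for t in range(1, 2000):
    sticky[t] = 0.95 * sticky[t - 1] + well_mixed[t]

ess_well_mixed = effective_sample_size(well_mixed)
ess_sticky = effective_sample_size(sticky)
```

If the adaptive chain's ESS per lengthscale were consistently lower than the non-adaptive baseline's on the same data, that would count against the claimed benefit.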

Figures

Figures reproduced from arXiv: 2605.27093 by Dalia Chakrabarty, Kane Warrior.

**Figure 2.** Baseline 15-dimensional synthetic experiment. Top row: post-burn-in ARD lengthscale

**Figure 3.** Synthetic experiment with 3 relevant inputs. Top row: post-burn-in ARD lengthscale

**Figure 4.** Synthetic experiment with 9 relevant inputs. Top row: post-burn-in ARD lengthscale

**Figure 5.** Synthetic experiment with 12 relevant inputs. Top row: post-burn-in ARD lengthscale

**Figure 6.** Less interrelated 6-input synthetic experiment. Top row: post-burn-in ARD lengthscale

**Figure 7.** Full 5-input Tétouan analysis. Top row: post-burn-in ARD lengthscale chains under the

**Figure 8.** Reduced 4-input Tétouan analysis. Top row: post-burn-in ARD lengthscale chains under

Original abstract

In probabilstic supervised learning of an input-output relationship - as a sample function of a Gaussian Process (GP) - priors are typically specified for the hyperparameters of the kernel that parametrises the covariance function of the GP, where the induced covariance matrix of the (resulting multivariate Normal) likelihood, governs the learning and prediction. When the sought function is highly multivariate, multiple lengthscale parameters must be learnt simultaneously, making inference difficult. We develop a "self-assembled" Wishart prior for the covariance matrix, while undertaking Bayesian inference on the kernel hyperparameters using MCMC. The construction uses a look-back window over recent MCMC iterations to define a time-step dependent scale matrix, thereby introducing adaptiveness to the chain. Results suggest that direct prior specification on the covariance matrix can be useful for diagnosing weakly informative inputs within the GP-based learning paradigm. We support our prior development with two distinct empirical illustrations - one on synthetic data, and another on a real-world dataset.
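For orientation, the setting the abstract describes can be written out as follows. The notation is an assumption of this note rather than the paper's own: a zero-mean GP, a squared-exponential ARD kernel with amplitude $\sigma_f^2$ and lengthscales $\ell_1,\dots,\ell_D$ (the figure captions mention ARD lengthscales, but the exact kernel is not stated), $n$ training points, and a Wishart prior with $\nu$ degrees of freedom and a scale matrix $\mathbf{V}_t$ that depends on the MCMC iteration $t$.

```latex
% GP likelihood: the outputs are multivariate Normal with kernel-induced covariance
\mathbf{y} \mid X, \boldsymbol{\theta} \sim \mathcal{N}\!\left(\mathbf{0},\, \mathbf{K}_{\boldsymbol{\theta}}\right),
\qquad
[\mathbf{K}_{\boldsymbol{\theta}}]_{ij}
  = \sigma_f^2 \exp\!\left(-\frac{1}{2}\sum_{d=1}^{D} \frac{(x_{id}-x_{jd})^2}{\ell_d^2}\right)

% Wishart prior placed directly on the n x n covariance matrix
p\!\left(\mathbf{K}_{\boldsymbol{\theta}} \mid \mathbf{V}_t, \nu\right)
  \propto \left|\mathbf{K}_{\boldsymbol{\theta}}\right|^{(\nu-n-1)/2}
  \exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\!\left(\mathbf{V}_t^{-1}\mathbf{K}_{\boldsymbol{\theta}}\right)\right)
```

Under this reading, a large $\ell_d$ makes the $d$-th input nearly irrelevant to $\mathbf{K}_{\boldsymbol{\theta}}$, which is the sense in which an input can be called weakly informative.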

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The adaptive look-back Wishart prior risks invalid MCMC sampling since the adaptation does not diminish.

The paper introduces a self-assembled Wishart prior on the GP covariance matrix, built by pulling the scale matrix from a look-back window of recent MCMC samples to add adaptiveness. This is framed as a way to put a direct prior on the correlation structure and thereby flag weakly informative inputs in multivariate settings.

The construction is new in its specific adaptive assembly, and the idea of bypassing kernel hyperparameter priors in favor of a covariance-matrix prior is a reasonable direction for input diagnosis. The two empirical illustrations are at least mentioned, one synthetic and one real.

The central problem is that the adaptation never tapers. Standard adaptive MCMC theory requires the adaptation rate to go to zero for the chain to converge to the target posterior; a fixed look-back window keeps the dependence alive at every step. The abstract shows no verification of this condition, no convergence diagnostics, and no comparison against a non-adaptive baseline. Without those, the claimed ability to diagnose inputs rests on samples that may not come from the intended distribution.
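The condition being invoked can be stated explicitly. In the framework of Roberts and Rosenthal (2007), with $\Gamma_n$ the adaptation parameter at iteration $n$ and $P_\gamma$ the transition kernel used when that parameter equals $\gamma$ (notation assumed here for illustration), diminishing adaptation reads:

```latex
\lim_{n\to\infty}\; \sup_{x}\;
  \left\| P_{\Gamma_{n+1}}(x,\cdot) - P_{\Gamma_{n}}(x,\cdot) \right\|_{\mathrm{TV}} = 0
  \quad \text{in probability.}
```

That framework pairs this with a containment condition and assumes every $P_\gamma$ leaves the same target distribution invariant. A fixed-length look-back window gives no obvious reason for successive kernels to converge, and because the window alters the prior, the target itself may move between iterations. How the paper's scheme maps onto these conditions cannot be determined from the abstract.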

The rest of the setup follows existing Wishart and GP MCMC work, so the novelty sits entirely in the adaptive piece. The abstract gives no error bars, no quantitative metrics, and no discussion of how the time-dependent scale affects the stationary distribution.

This is for people already working on Bayesian GPs with many lengthscales who might want to experiment with covariance priors. A reader could extract the basic construction, but would have to repair the MCMC validity first.

I would not send it to peer review in this form; the methodological gap on convergence is load-bearing.

Referee Report

1 major / 1 minor

Summary. The manuscript develops a 'self-assembled' Wishart prior on the covariance matrix for Gaussian Process models in probabilistic supervised learning. It implements this via MCMC on kernel hyperparameters, where the scale matrix is constructed adaptively from a look-back window over recent MCMC iterations, introducing time-step dependence. The approach is illustrated on synthetic data and a real-world dataset, with results suggesting utility for diagnosing weakly informative inputs when multiple lengthscales must be learned simultaneously.

Significance. If the adaptive construction is valid, the method could enable direct prior specification on the full covariance matrix rather than individual kernel parameters, offering a diagnostic tool for input relevance in high-dimensional GP settings. The two empirical illustrations provide initial support for this use case.

major comments (1)

[Abstract] Abstract (construction paragraph): the adaptive MCMC defines a time-step dependent scale matrix via a look-back window over recent iterations. Standard adaptive MCMC theory requires conditions such as diminishing adaptation (adaptation rate → 0) to guarantee convergence to the target posterior; the manuscript gives no indication that these conditions hold for the window scheme or that they were verified, which is load-bearing for the validity of all reported results.

minor comments (1)

[Abstract] Abstract, first sentence: 'probabilstic' is a typo and should be 'probabilistic'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the single major comment below, indicating the revision that will be incorporated.

Referee: [Abstract] Abstract (construction paragraph): the adaptive MCMC defines a time-step dependent scale matrix via a look-back window over recent iterations. Standard adaptive MCMC theory requires conditions such as diminishing adaptation (adaptation rate → 0) to guarantee convergence to the target posterior; the manuscript gives no indication that these conditions hold for the window scheme or that they were verified, which is load-bearing for the validity of all reported results.

Authors: We agree that the adaptive construction of the scale matrix via a fixed look-back window renders the target distribution time-dependent and that the manuscript provides no discussion or verification of standard adaptive MCMC conditions such as diminishing adaptation. The approach is presented as an empirical construction for prior specification and input diagnosis rather than a theoretically convergent sampler. We will revise the manuscript to explicitly state the heuristic character of the scheme, note the lack of formal convergence guarantees, and add a brief discussion of this point (with reference to the empirical results on synthetic and real data) in the methods and/or a new limitations subsection. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is explicitly constructed as adaptive MCMC.

full rationale

The paper defines its Wishart prior construction explicitly via a look-back window on MCMC iterations to create a time-dependent scale matrix. This is presented as the method itself (an adaptive prior), with empirical illustrations on synthetic and real data to suggest utility for diagnosing weakly informative inputs. No derivation chain is claimed that reduces a 'prediction' or result to its inputs by construction, no self-citations are invoked as load-bearing, and no uniqueness theorems or ansatzes are smuggled. The central claim remains an empirical suggestion about the prior's usefulness rather than a tautological output. This is self-contained as a methodological proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to identify specific free parameters, axioms, or invented entities; no equations or sections are available for audit.

pith-pipeline@v0.9.1-grok · 5693 in / 1020 out tokens · 34491 ms · 2026-06-29T15:33:01.909679+00:00 · methodology

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

[1]

Carl Edward Rasmussen and Christopher K

doi: 10.1198/TECH.2011.10148. Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. Gareth O. Roberts and Jeffrey S. Rosenthal. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability, 44(2):458–475, 2007. Michael L. Stein. Interpolation o...

work page doi:10.1198/tech.2011.10148 2011
[2]

doi: 10.1007/978-1-4612-1494-6. S. Sundararajan and S. Sathiya Keerthi. Predictive approaches for choosing hyperparameters in Gaussian processes. Neural Computation, 13(5):1103–1118, 2001. doi: 10.1162/089976601300014312. Christopher K. I. Williams. Gaussian processes for regression. In Christopher M. Bishop, editor, Neural Networks for Machine Learning,...

work page doi:10.1007/978-1-4612-1494-6 2001
