Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

Alexander Marquard; Julian Rodemann; Michele Caprio; Thomas Augustin

arXiv: 2605.12208 · v2 · pith:JBZOYWTEnew · submitted 2026-05-12 · stat.ML · cs.AI · cs.LG · stat.CO

Pith reviewed 2026-05-13 04:44 UTC · model grok-4.3

keywords: Bayesian uncertainty quantification · Laplace approximation · self-supervised learning · posterior predictive distribution · predictive calibration · regression tasks · Bayesian neural networks

The pith

Refitting models on their own predictions approximates the posterior predictive distribution directly and improves calibration over classical Laplace methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shifts focus from approximating the parameter posterior to directly approximating the posterior predictive distribution in Bayesian models. It does this by treating the model's own predictions as pseudo-data and refitting the model on them, inspired by self-training: high likelihood assigned to those self-predictions signals low uncertainty. This produces a deterministic, sampling-free method called Self-Supervised Laplace Approximation (SSLA), plus a cheaper approximate version (ASSLA) that skips full refitting. The modular design supports different prior choices for sensitivity checks. On regression tasks with both simulated and real data, the approach yields better predictive calibration than standard Laplace approximations while remaining efficient for linear models and neural networks.

Core claim

We propose to bypass the parameter posterior and focus directly on approximating the posterior predictive distribution by drawing inspiration from self-training. Essentially, we quantify a Bayesian model's predictive uncertainty by refitting on self-predicted data. If a model assigns high likelihood to self-predicted data, these predictions are of low uncertainty, and vice versa. This yields a deterministic, sampling-free approximation of the posterior predictive called Self-Supervised Laplace Approximation (SSLA). An approximate version (ASSLA) avoids expensive refitting. The method supports classical Bayesian sensitivity analysis via different priors and is studied in regression settings.

What carries the argument

Self-Supervised Laplace Approximation (SSLA), a procedure that refits the model on its own predictions treated as pseudo-data to quantify uncertainty in the posterior predictive distribution.
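
The refitting idea can be illustrated in the one setting where everything is available in closed form. The following is a minimal sketch, not the authors' implementation: it assumes a Bayesian linear model with known noise variance `sigma2` and an isotropic Gaussian prior with variance `tau2`, and it takes `J` to be the Hessian of the negative log joint (likelihood plus prior). All function names are hypothetical. The predictive log density of a self-predicted target is computed as the difference of two Laplace evidence approximations, one after refitting on the augmented data and one on the original data, which is the structure of the expression the page quotes as eq. 12.

```python
import numpy as np


def fit_map(X, y, sigma2, tau2):
    """MAP estimate and Hessian J of the negative log joint (assumed convention)."""
    d = X.shape[1]
    J = X.T @ X / sigma2 + np.eye(d) / tau2
    theta = np.linalg.solve(J, X.T @ y / sigma2)
    return theta, J


def laplace_log_evidence(X, y, sigma2, tau2):
    """Laplace approximation to the log evidence; exact here because the log joint is quadratic."""
    n, d = X.shape
    theta, J = fit_map(X, y, sigma2, tau2)
    resid = y - X @ theta
    log_lik = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (resid @ resid) / sigma2
    log_prior = -0.5 * d * np.log(2 * np.pi * tau2) - 0.5 * (theta @ theta) / tau2
    _, logdet_J = np.linalg.slogdet(J)
    return log_lik + log_prior + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet_J


def refit_log_predictive(X, y, x_new, y_new, sigma2, tau2):
    """log p(y_new | x_new, D): refit on D plus (x_new, y_new), then subtract evidences."""
    X_aug = np.vstack([X, x_new])
    y_aug = np.append(y, y_new)
    return (laplace_log_evidence(X_aug, y_aug, sigma2, tau2)
            - laplace_log_evidence(X, y, sigma2, tau2))


def exact_log_predictive(X, y, x_new, y_new, sigma2, tau2):
    """Analytic Gaussian posterior predictive log density, for comparison."""
    theta, J = fit_map(X, y, sigma2, tau2)
    mean = x_new @ theta
    var = sigma2 + x_new @ np.linalg.solve(J, x_new)
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (y_new - mean) ** 2 / var


# Synthetic data; all values below are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=30)
x_new = np.array([0.3, -1.0, 2.0])
sigma2, tau2 = 0.25, 4.0

theta_hat, _ = fit_map(X, y, sigma2, tau2)
y_self = x_new @ theta_hat  # the model's own point prediction, used as pseudo-data

approx = refit_log_predictive(X, y, x_new, y_self, sigma2, tau2)
exact = exact_log_predictive(X, y, x_new, y_self, sigma2, tau2)
print(approx, exact)
```

In this conjugate setting the two numbers agree up to floating-point error, which matches the statement in the simulated rebuttal that the linear Gaussian case recovers the exact posterior predictive. For neural networks the refit would be an optimization run rather than a linear solve, and the agreement would no longer be exact.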

If this is right

SSLA enables direct sensitivity analysis to prior choice without additional sampling.
The method applies to both Bayesian linear models and Bayesian neural networks for regression.
ASSLA provides a computationally cheaper alternative that still improves calibration over standard Laplace approximations.
The approach remains deterministic and sampling-free across the studied tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The self-supervised refitting idea could be tested for extending calibration gains to classification problems.
If the initial model predictions are poor, the method may amplify errors rather than quantify uncertainty faithfully.
Combining SSLA with other approximate inference techniques might further reduce computational cost in large-scale settings.

Load-bearing premise

That the model's initial predictions are reliable enough to serve as pseudo-labels whose likelihood under refitting accurately reflects true predictive uncertainty.

What would settle it

A dataset where SSLA or ASSLA predictive intervals show substantially worse calibration metrics (such as coverage or expected calibration error) than classical Laplace approximations on the same models.
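
The coverage check mentioned above can be sketched as follows. This is an illustrative helper rather than the paper's evaluation code: the function name `empirical_coverage` is hypothetical, and it assumes each predictive distribution is summarized by a Gaussian mean and standard deviation, which need not hold for the SSLA predictive in general.

```python
import numpy as np
from statistics import NormalDist


def empirical_coverage(y_true, pred_mean, pred_std, level=0.9):
    """Fraction of targets inside central Gaussian predictive intervals of the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    lower = pred_mean - z * pred_std
    upper = pred_mean + z * pred_std
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


# Synthetic check: targets are standard normal around a zero predictive mean.
rng = np.random.default_rng(0)
y_true = rng.normal(size=20000)
pred_mean = np.zeros_like(y_true)

# Correct predictive spread versus an overconfident one (half the true spread).
cov_calibrated = empirical_coverage(y_true, pred_mean, np.ones_like(y_true))
cov_overconfident = empirical_coverage(y_true, pred_mean, 0.5 * np.ones_like(y_true))
print(cov_calibrated, cov_overconfident)
```

A method whose empirical coverage sits well below the nominal level is overconfident; comparing this number for SSLA, ASSLA, and a classical Laplace approximation on the same held-out data is one way to run the test described above.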

Figures

Figures reproduced from arXiv: 2605.12208 by Alexander Marquard, Julian Rodemann, Michele Caprio, Thomas Augustin.

**Figure 2.** We display the predictive uncertainty intervals for various uncertainty quantification methods, …

**Figure 3.** Illustrated are the analytic (green) and approximated (SSLA - red, ASSLA - black) PPDs for …

Original abstract

Approximate Bayesian inference typically revolves around computing the posterior parameter distribution. In practice, however, the main object of interest is often a model's predictions rather than its parameters. In this work, we propose to bypass the parameter posterior and focus directly on approximating the posterior predictive distribution. We achieve this by drawing inspiration from self-training within self-supervised and semi-supervised learning. Essentially, we quantify a Bayesian model's predictive uncertainty by refitting on self-predicted data. The idea is strikingly simple: If a model assigns high likelihood to self-predicted data, these predictions are of low uncertainty, and vice versa. This yields a deterministic, sampling-free approximation of the posterior predictive. The modular structure of our Self-Supervised Laplace Approximation (SSLA) further allows us to plug in different prior specifications, enabling classical Bayesian sensitivity (w.r.t. prior choice) analysis. In order to bypass expensive refitting, we further introduce an approximate version of SSLA, called ASSLA. We study (A)SSLA both theoretically and empirically in regression models ranging from Bayesian linear models to Bayesian neural networks. Across a wide array of regression tasks with simulated and real-world datasets, our methods outperform classical Laplace approximations in predictive calibration while remaining computationally efficient.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SSLA tries to approximate the posterior predictive by refitting on self-predictions, which is a fresh angle but hinges on the initial model already being decent.

The paper's main contribution is a method called Self-Supervised Laplace Approximation that skips approximating the parameter posterior and instead refits the model using its own point predictions as targets to gauge uncertainty. If the model likes its own predictions, uncertainty is low; otherwise high. This is meant to give a deterministic approximation to the posterior predictive. They build on self-training ideas from semi-supervised learning and make it modular so you can swap in different priors. There's also a cheaper approximate version, ASSLA, that avoids full refitting. They back this with theory and tests on regression problems from simple linear models to Bayesian neural networks, using both synthetic and real datasets. The claim is that it gives better calibrated predictions than plain Laplace approximations at similar cost.

The soft spot is the reliance on those initial self-predictions being accurate enough. Under model misspecification or in areas with lots of noise, using bad pseudo-labels could amplify mistakes instead of properly quantifying uncertainty. Classical Laplace doesn't have this bootstrap issue because it works directly on the parameter space. The experiments would need to show that this doesn't happen in practice, and I'd look for ablations on how sensitive it is to the starting model quality.

This work is for researchers and practitioners focused on uncertainty quantification in Bayesian regression, especially those wanting something faster than sampling methods. It shows clear thinking in combining concepts from different areas, even if the central assumption needs more scrutiny. I'd recommend sending it for peer review. The idea is original enough to warrant feedback, though revisions would likely focus on validating the approximation's robustness.

Referee Report

2 major / 2 minor

Summary. The paper proposes Self-Supervised Laplace Approximation (SSLA) to directly approximate the posterior predictive distribution p(y*|x*,D) in Bayesian regression models by refitting on self-generated pseudo-labels from the initial model predictions, inspired by self-training. An approximate variant ASSLA is introduced to avoid full refitting. The approach is analyzed theoretically and evaluated empirically on regression tasks ranging from Bayesian linear models to neural networks, with claims of improved predictive calibration over classical Laplace approximations while remaining computationally efficient and modular for prior sensitivity analysis.

Significance. If the central approximation holds with controlled bias, SSLA provides a simple, deterministic, sampling-free method for predictive uncertainty quantification that shifts focus from parameter posteriors to predictions. This could be useful for scalable Bayesian deep learning applications, and the modular prior plug-in enables straightforward sensitivity checks not always available in other approximations.

major comments (2)

[§3] §3 (Method): The core construction refits the model (or its Laplace approximation) using the initial MAP predictions as targets to obtain a curvature or likelihood-based uncertainty measure. This step is load-bearing for the claim of approximating the posterior predictive, yet the paper provides no quantitative bound on the approximation error when the initial predictions deviate from the true conditional (e.g., under misspecification or high noise). A concrete error analysis or counterexample regime is required.
[§5] §5 (Experiments): The abstract states outperformance in predictive calibration across simulated and real-world regression datasets, but the experimental design details (baselines, number of runs, error bars, statistical tests, and handling of self-prediction bias) are not summarized here; without these, the empirical support for the central claim cannot be fully assessed.

minor comments (2)

[§3] Notation for the self-supervised loss and the resulting approximate predictive variance should be introduced with explicit equations early in §3 to improve readability.
[Abstract] Clarify in the abstract and introduction whether ASSLA is a first-order approximation to SSLA or an independent method; the current wording leaves this ambiguous.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, providing the strongest honest defense of the manuscript while indicating where revisions will be made to improve clarity and rigor.

Referee: [§3] §3 (Method): The core construction refits the model (or its Laplace approximation) using the initial MAP predictions as targets to obtain a curvature or likelihood-based uncertainty measure. This step is load-bearing for the claim of approximating the posterior predictive, yet the paper provides no quantitative bound on the approximation error when the initial predictions deviate from the true conditional (e.g., under misspecification or high noise). A concrete error analysis or counterexample regime is required.

Authors: We acknowledge that the current manuscript does not provide a general quantitative bound on the approximation error for arbitrary misspecification or high noise. The theoretical analysis in §3 characterizes SSLA for the linear Gaussian case, where it recovers the exact posterior predictive distribution, and motivates the approach more generally via the self-supervised refitting construction. In the revised version we will add a new paragraph to §3 that derives a simple error bound in terms of the L2 deviation between the initial MAP predictions and the true conditional mean (under standard Lipschitz assumptions on the model), together with a concrete counterexample in a high-noise linear regression regime that illustrates when the bias becomes noticeable. This directly supplies the requested error analysis while preserving the modular and sampling-free nature of the method.

Revision: yes

Referee: [§5] §5 (Experiments): The abstract states outperformance in predictive calibration across simulated and real-world regression datasets, but the experimental design details (baselines, number of runs, error bars, statistical tests, and handling of self-prediction bias) are not summarized here; without these, the empirical support for the central claim cannot be fully assessed.

Authors: The abstract is necessarily brief and therefore omits experimental protocol details, but §5 of the manuscript fully specifies the baselines (classical Laplace, variational inference, and ensembles), the number of independent runs with reported variability, the calibration metrics and statistical tests employed, and the handling of self-prediction bias via training-only pseudo-label generation and strict held-out evaluation. To make these elements immediately accessible, we will insert a short “Experimental Protocol” paragraph at the beginning of §5 and add a one-sentence summary of the evaluation setup to the abstract. We believe this change will allow the empirical claims to be assessed without altering the reported results.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in the proposed approximation

The paper proposes SSLA as a heuristic approximation to the posterior predictive distribution by refitting a Laplace approximation on self-generated pseudo-labels drawn from the initial model's point predictions. This construction is explicitly presented as an approximation technique inspired by self-training, not as a first-principles derivation whose output is definitionally identical to its input. The central claim (improved predictive calibration over classical Laplace) is evaluated empirically across simulated and real regression datasets and is therefore falsifiable against external benchmarks. No equations or steps in the provided abstract reduce the uncertainty measure to a tautological fit on the same quantities by construction, nor does the argument rest on load-bearing self-citations or imported uniqueness theorems. The acknowledged modeling assumption (initial predictions are sufficiently reliable pseudo-labels) is stated as such rather than smuggled in as a proven result.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the paper relies on standard Laplace approximation assumptions and introduces self-supervised refitting as the key mechanism. No explicit free parameters, invented entities, or detailed axioms are described.

axioms (2)

Domain assumption: Laplace approximation provides a valid local Gaussian fit to the posterior. (Standard assumption invoked for the base method being extended.)
Ad hoc to paper: Self-predicted data likelihood reliably indicates predictive uncertainty. (Core premise of the self-supervised refitting step.)
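
For orientation, the first axiom can be written out as the standard Laplace approximation to the log evidence. The block below is a sketch under stated assumptions rather than the paper's own derivation: $d$ is the parameter dimension, $J$ is taken to be the Hessian of the negative log joint at the mode, and $\tilde{D}$ denotes the data augmented with one self-predicted pair (the symbol $\tilde{D}$ is introduced here for illustration). Subtracting the two approximations cancels the $\tfrac{d}{2}\log(2\pi)$ terms and leaves the expression the page quotes as eq. 12.

```latex
% Laplace approximation to the log evidence on the original data D
\log p(D) \approx \ell(\hat{\theta}) + \log \pi(\hat{\theta})
  + \tfrac{d}{2} \log(2\pi) - \tfrac{1}{2} \log |J(\hat{\theta})|

% Same approximation after refitting on \tilde{D} = D \cup \{(x_{n+1}, \hat{y}_{n+1})\}
\log p(\tilde{D}) \approx \tilde{\ell}(\tilde{\theta}) + \log \pi(\tilde{\theta})
  + \tfrac{d}{2} \log(2\pi) - \tfrac{1}{2} \log |\tilde{J}(\tilde{\theta})|

% Predictive log density as the difference of the two log evidences
\log p(\hat{y}_{n+1} \mid x_{n+1}, D) = \log p(\tilde{D}) - \log p(D)
  \approx \tilde{\ell}(\tilde{\theta}) - \ell(\hat{\theta})
  + \log \pi(\tilde{\theta}) - \log \pi(\hat{\theta})
  - \tfrac{1}{2} \log |\tilde{J}(\tilde{\theta})| + \tfrac{1}{2} \log |J(\hat{\theta})|
```

Both approximations are exact when the log joint is quadratic in the parameters, which is why the linear Gaussian case serves as the reference point in the rebuttal.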

pith-pipeline@v0.9.0 · 5522 in / 1371 out tokens · 62813 ms · 2026-05-13T04:44:02.157733+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: $\log p(\hat{y}_{n+1} \mid x_{n+1}, D) \approx \tilde{\ell}(\tilde{\theta}) - \ell(\hat{\theta}) + \log \pi(\tilde{\theta}) - \log \pi(\hat{\theta}) - \tfrac{1}{2} \log |\tilde{J}(\tilde{\theta})| + \tfrac{1}{2} \log |J(\hat{\theta})|$ (eq. 12); ASSLA drops the prior term via an $O(n^{-1})$ expansion

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: refitting on self-predicted data to quantify predictive uncertainty

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.