Don't Stop Me Yet: Sampling Loss Minima via Dissipative Riemannian Mechanics

Albert Kjøller Jacobsen; Georgios Arvanitidis; Johanna Marie Gegenfurtner; Leo Uhre Jakobsen

arxiv: 2605.15459 · v1 · pith:757O5WSInew · submitted 2026-05-14 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-19 15:27 UTC · model grok-4.3

keywords loss minima sampling · reparameterization invariance · dissipative dynamics · Bayesian uncertainty · neural network optimization · Riemannian mechanics · connected minima components · dynamical sampling

The pith

A new dynamical sampler called DiMS exactly targets the connected components of reparameterization-invariant minima in neural network losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern neural network loss minima are not isolated points but form connected components of solutions that are equivalent under reparameterization when evaluated on the training data. Existing sampling methods either spread across broader low-loss regions or stay trapped near single local minima, so they fail to isolate these exact equivalent sets. The proposed DiMS constructs a dynamical system on the parameter manifold that includes kinetic energy, a gravitational pull toward lower loss, and a friction term that dissipates excess energy until the trajectory settles precisely on the minimum level sets. Physically motivated hyperparameters control how widely the sampler explores different valleys within those sets. When applied to Bayesian uncertainty quantification, the resulting samples yield improved performance over earlier approaches.

Core claim

The minima of modern neural network loss functions typically form connected components of reparameterization invariant solutions on the training data. A dynamical system based on kinetic energy, subject to a gravitational pull and a friction term that dissipates energy, produces trajectories that are guaranteed to sample exactly from these minimum level sets rather than from larger low-loss regions or single valleys.

What carries the argument

Dissipative Riemannian dynamical system driven by kinetic energy, gravitational attraction to lower loss, and a friction term that removes energy until motion is confined to minimum level sets.

If this is right

DiMS produces samples that remain exactly on the minimum level sets instead of diffusing through larger low-loss regions.
The sampler can move between different minima valleys while respecting the reparameterization invariance constraint.
Hyperparameters with direct physical interpretations let users adjust the degree of exploration.
Uncertainty estimates derived from these samples outperform those obtained from previous local or diffusive methods in Bayesian inference tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same energy-dissipation construction could be adapted to sample symmetric solution sets in other high-dimensional non-convex problems outside neural networks.
Exact sampling of reparameterization components may help isolate which directions in parameter space truly affect generalization versus those absorbed by symmetry.
Running the sampler from multiple random initializations would produce an empirical map of distinct minima components that could be compared against theoretical predictions of connectivity.

Load-bearing premise

Minima of modern neural network loss functions typically form connected components of reparameterization invariant solutions on the training data.
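
A toy illustration of this premise, not taken from the paper: assume a hypothetical two-parameter linear model $f_{(a,b)}(x) = abx$ trained with squared error on data generated by $y_n = w^\star x_n$ with $w^\star \neq 0$ and at least one $x_n \neq 0$. The model, the data, and the symbols $a$, $b$, $c$, $w^\star$ are assumptions made only for this example.

```latex
\[
L(a,b) = \frac{1}{N}\sum_{n=1}^{N}\bigl(ab\,x_n - w^\star x_n\bigr)^2
       = (ab - w^\star)^2 \cdot \frac{1}{N}\sum_{n=1}^{N} x_n^2 ,
\qquad
\arg\min_{(a,b)} L = \{(a,b) : ab = w^\star\}.
\]
\[
(a,b) \mapsto (ca,\; b/c), \quad c \neq 0
\quad\Longrightarrow\quad
f_{(ca,\,b/c)}(x) = abx = f_{(a,b)}(x).
\]
```

In this example the minimum set is a hyperbola rather than an isolated point. It has two connected components (the branches $a > 0$ and $a < 0$), and every point on either branch gives identical predictions on the training data.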

What would settle it

Generate samples with DiMS and check whether every sample produces identical predictions on the training set and achieves exactly the minimum loss value; any deviation would show the sampler is not confined to the claimed level sets.
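
The check described above can be sketched in a few lines. This is a minimal sketch, not the paper's evaluation code: the helper `on_minimum_level_set`, the toy model `f(x) = a * b * x`, the training data, and the hand-constructed "samples" are all hypothetical stand-ins for real sampler output.

```python
import numpy as np

def on_minimum_level_set(samples, predict, loss, x_train, min_loss, tol=1e-8):
    """Return True if every sample reproduces the first sample's training
    predictions and attains the minimum loss, both up to `tol`."""
    reference = predict(samples[0], x_train)
    for theta in samples:
        same_predictions = np.allclose(
            predict(theta, x_train), reference, rtol=0.0, atol=tol
        )
        at_minimum = abs(loss(theta) - min_loss) <= tol
        if not (same_predictions and at_minimum):
            return False
    return True

# Hypothetical toy problem: f(x) = a * b * x with targets y = 3 * x,
# so the minimum level set is the hyperbola a * b = 3 with loss 0.
x_train = np.array([-1.0, 0.5, 2.0])
y_train = 3.0 * x_train

def predict(theta, x):
    a, b = theta
    return a * b * x

def loss(theta):
    return float(np.mean((predict(theta, x_train) - y_train) ** 2))

# Stand-ins for sampler output: points placed by hand on a * b = 3.
good_samples = [np.array([c, 3.0 / c]) for c in (0.5, 1.0, 2.0, 6.0)]
# One extra point slightly off the level set (a * b = 3.1).
bad_samples = good_samples + [np.array([1.0, 3.1])]

print(on_minimum_level_set(good_samples, predict, loss, x_train, min_loss=0.0))
print(on_minimum_level_set(bad_samples, predict, loss, x_train, min_loss=0.0))
```

For a real network the minimum loss is usually not known in closed form, so `min_loss` and `tol` would have to be chosen from the trained solution and the numerical precision of the integrator; both choices are left open here.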

Figures

Figures reproduced from arXiv: 2605.15459 by Albert Kjøller Jacobsen, Georgios Arvanitidis, Johanna Marie Gegenfurtner, Leo Uhre Jakobsen.

**Figure 1.** Two trajectories through parameter space via different dynamical systems. The sphere constitutes a submanifold of parameters that minimize $L(\theta) = \sum_{i=1}^{K} \theta_i^2 - 1$. While a geodesic on the loss surface is the straightest path, it never converges to a minimum solution, even when starting from one. Our modified dynamics ensure energy dissipation which leads the particle to stop at some minimum sol…

**Figure 2.** We fit a neural network to $N = 7$ training points and show function-space samples from two geometry-aware sampling schemes formulated as continuous dynamical systems. The Riemannian Laplace approximation (left) explores low-loss regions but does not enforce interpolating the training data. Our sampler DiMS (center) leverages an alternative formulation of the system and guarantees convergence to minimum trai…

**Figure 3.** The loss surface viewed as a Riemannian manifold…

**Figure 4.** Top: With the white initial velocity, the systems affected by a gravity pull remain closer to the minimum level set, yet without dissipation the particle never converges. With dissipation via a friction force, convergence is guaranteed, and our speed-dependent dissipation function allows traveling further than constant friction. Bottom: With the black initial velocity, curves affected by gravity oscillat…

**Figure 5.** The dynamics of the proposed improved dynamical system. Depending on the initial velocity sample $\tilde{v} \sim q(\tilde{v})$, DiMS is capable of sampling distinct minimum level sets by dissipating energy particularly when moving proportional to the gradient, and eventually stop. In contrast the geodesic path is unconstrained and requires defining a stop time. We remark that the dissipation strength directly depends on t…

**Figure 6.** Function space samples obtained by DiMS give high OOD estimates without breaking the fit on training data, which RLA does not guarantee. Even when initialized from a suboptimal position, DiMS provides well-behaved function space samples. Red and white dots are ID and OOD data, respectively, and the black line is the function induced by the initial position parameters. Uncertainty ranges are based on standa…

**Figure 7.** 2D binary classification on the banana dataset. The trained…

Original abstract

The minima of modern neural network loss functions are typically not isolated, rather they form connected components of reparameterization invariant solutions on the training data. Analytically characterizing these solutions is a hard problem, but sampling approaches are feasible. By construction, existing methods either spread over low-loss regions, and thus do not sample reparameterization invariant solutions exactly, or are inherently local, which limits exploration of other minima valleys. We propose sampling such reparameterization invariant models using a dynamical system based on kinetic energy, subject to a gravitational pull and a friction term that dissipates energy from the system. Our proposed sampler, DiMS, is guaranteed to sample exactly from the minimum level sets and depends on physically motivated hyperparameters which allows control over the exploration capabilities of the sampler. We consider uncertainty quantification in Bayesian inference as the motivating problem and observe improved performance compared to previously proposed approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DiMS claims exact sampling from NN loss minima level sets via dissipative Riemannian dynamics, but the guarantee depends on an unverified assumption that those minima form flat, connected reparameterization-invariant components.

The main point is that this paper introduces DiMS, a sampler built from kinetic energy plus a gravitational pull toward lower loss and a friction term on a Riemannian manifold. The claim is that the dissipative dynamics keep the sampler exactly on the minimum level sets rather than spreading across broader low-loss areas or getting stuck locally. They motivate it with Bayesian uncertainty quantification and report better performance than earlier approaches on that task. The hyperparameters are framed as physically motivated, which gives a way to dial exploration up or down without arbitrary tuning knobs. That construction is the clearest new element here. The motivation around reparameterization-invariant solutions in neural net losses is laid out plainly, and the physical analogy helps make the dynamics feel intuitive.

On the downside, the exact-sampling guarantee only holds if the loss is constant across the connected components of those invariant solutions and the dynamics cannot escape them. The abstract treats this flat connected-component structure as typical for modern networks, yet the details provided do not include a full derivation or targeted experiments that confirm the loss really stays flat inside those components rather than just being low. If reparameterization orbits do not exhaust the minima or if small variations exist inside them, the sampler would target a larger set than advertised. The stress-test note captures this accurately.

This is aimed at people already working on manifold MCMC or posterior sampling for deep models. A reader who follows Hamiltonian or Riemannian Monte Carlo variants could extract the dynamical system idea even if the guarantees need more checking. It is worth sending to peer review so the math behind the invariant sets and the empirical comparisons can be examined directly.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes DiMS, a sampler based on dissipative Riemannian dynamics that incorporates kinetic energy, a gravitational pull toward lower loss values, and a friction term. It claims that neural network loss minima typically form connected components of reparameterization-invariant solutions on the training data, and that the proposed dynamics are guaranteed to sample exactly from the corresponding minimum level sets. The approach is motivated by uncertainty quantification in Bayesian inference and reports improved performance over prior samplers.

Significance. If the exact-sampling guarantee can be established and the structural assumption on loss landscapes holds, the work offers a physically interpretable framework for targeting flat, reparameterization-invariant minima with controllable exploration via hyperparameters. This could strengthen Bayesian methods by avoiding both overly diffuse low-loss sampling and overly local exploration, addressing a recognized limitation in existing approaches.

major comments (3)

[Abstract] The central claim that DiMS 'is guaranteed to sample exactly from the minimum level sets' is presented as following directly from the construction of the dynamical system, yet no theorem, invariant-measure derivation, or even proof sketch is supplied. The guarantee is load-bearing and requires a formal statement (presumably in the section defining the Riemannian dynamics and friction term) showing that the invariant support coincides precisely with the minimum level sets under the stated constancy and connectedness assumptions.
[Abstract] The guarantee can hold only if the loss is exactly constant on the connected components of reparameterization-invariant solutions and the dynamics cannot escape them. The manuscript states this structural property as typical for modern networks but supplies no theorem, lemma, or experiment confirming constancy versus merely low-loss connectivity; if the loss varies inside putative components or reparameterization orbits do not exhaust the minima, the sampler targets a strictly larger set than claimed.
[Abstract] The abstract asserts that the method 'depends on physically motivated hyperparameters which allows control over the exploration capabilities,' yet provides no quantitative analysis or ablation showing how the friction and gravitational parameters map to mixing time or coverage of distinct minima valleys. This control is advertised as a practical advantage and should be demonstrated with concrete scaling or sensitivity results.

minor comments (2)

[Abstract] The abstract would be clearer if it briefly indicated the specific Riemannian metric employed (e.g., whether it is the Fisher information metric or a simpler choice) and the precise form of the friction term.
Notation for the kinetic energy and gravitational potential should be introduced consistently when the dynamical system is first defined, to avoid ambiguity when later referring to energy dissipation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating planned revisions to strengthen the formal and empirical support for our claims while remaining faithful to the manuscript's content and assumptions.

Referee: [Abstract] The central claim that DiMS 'is guaranteed to sample exactly from the minimum level sets' is presented as following directly from the construction of the dynamical system, yet no theorem, invariant-measure derivation, or even proof sketch is supplied. The guarantee is load-bearing and requires a formal statement (presumably in the section defining the Riemannian dynamics and friction term) showing that the invariant support coincides precisely with the minimum level sets under the stated constancy and connectedness assumptions.

Authors: We agree that the exact-sampling guarantee requires an explicit formal statement. In the revised manuscript we will insert a theorem in the section on dissipative Riemannian dynamics. The theorem will establish that, under the assumptions of constant loss on connected components of reparameterization-invariant solutions and that the dynamics remain confined to those components, the unique invariant measure is supported precisely on the minimum level sets. A proof sketch will be supplied that combines the Hamiltonian structure of the kinetic term, the conservative gravitational force derived from the loss, and the dissipative friction term that drives the system to the level sets.
Revision: yes

Referee: [Abstract] The guarantee can hold only if the loss is exactly constant on the connected components of reparameterization-invariant solutions and the dynamics cannot escape them. The manuscript states this structural property as typical for modern networks but supplies no theorem, lemma, or experiment confirming constancy versus merely low-loss connectivity; if the loss varies inside putative components or reparameterization orbits do not exhaust the minima, the sampler targets a strictly larger set than claimed.

Authors: The constancy assumption follows from the reparameterization invariance of the training loss for the architectures considered. We will add a short lemma that formally states this invariance property and its implication for level-set constancy. We will also include a modest empirical check on a toy network demonstrating near-constant loss along reparameterization orbits; we note that exhaustive verification on large-scale models remains an open empirical question and will be listed as a modeling assumption with discussion of possible deviations.
Revision: partial

Referee: [Abstract] The abstract asserts that the method 'depends on physically motivated hyperparameters which allows control over the exploration capabilities,' yet provides no quantitative analysis or ablation showing how the friction and gravitational parameters map to mixing time or coverage of distinct minima valleys. This control is advertised as a practical advantage and should be demonstrated with concrete scaling or sensitivity results.

Authors: We accept that the abstract claim on controllable exploration should be supported by quantitative evidence. The revised manuscript will contain a new ablation subsection that systematically varies the friction coefficient and gravitational strength. We will report estimated mixing times (via integrated autocorrelation) and coverage of distinct minima (via the number of unique basins visited across independent runs), together with sensitivity plots that illustrate the trade-off between exploration and convergence speed.
Revision: yes

Circularity Check

0 steps flagged

No circularity: guarantee follows from explicit dynamical system construction under stated landscape assumption

The paper states the structural property of NN loss minima as a typical empirical fact (connected reparameterization-invariant components) and then constructs DiMS dynamics (kinetic energy + gravitational term + friction) whose invariant sets are exactly those level sets by design of the vector field. The exact-sampling guarantee is therefore a direct mathematical consequence of the ODE construction once the landscape assumption is granted; it is not obtained by fitting parameters to data, by renaming an input, or by a self-citation chain that itself lacks independent verification. No equation or claim reduces the target distribution to the fitted inputs by definition.
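
The "by design" step can be made concrete with an energy argument. The following is a sketch under stated assumptions, not the paper's proof. It takes the equation of motion quoted in the Lean-theorems section of this page, and it assumes (neither is stated in the material reviewed here) that the loss surface carries the graph metric $G = I + \nabla L\,\nabla L^\top$, so that $\operatorname{grad} L = G^{-1}\nabla L = \nabla L / (1 + \lVert\nabla L\rVert^2)$, and that $\omega$ is the angle between $\dot\alpha$ and $\nabla L$, so that $\eta(t) = \eta_0 \sqrt{\dot\alpha^\top G\,\dot\alpha}$.

```latex
\begin{align*}
\ddot\alpha &= -\bigl(\dot\alpha^\top H_L(\alpha)\,\dot\alpha + \kappa\bigr)\operatorname{grad} L(\alpha) - \eta(t)\,\dot\alpha, \\
E(t) &= \tfrac{1}{2}\,\dot\alpha^\top G(\alpha)\,\dot\alpha + \kappa\,L(\alpha), \\
\frac{dE}{dt}
  &= \dot\alpha^\top G\,\ddot\alpha
   + \bigl(\dot\alpha^\top H_L\,\dot\alpha\bigr)\bigl(\nabla L^\top \dot\alpha\bigr)
   + \kappa\,\nabla L^\top \dot\alpha \\
  &= -\bigl(\dot\alpha^\top H_L\,\dot\alpha + \kappa\bigr)\bigl(\nabla L^\top \dot\alpha\bigr)
     - \eta\,\dot\alpha^\top G\,\dot\alpha
     + \bigl(\dot\alpha^\top H_L\,\dot\alpha\bigr)\bigl(\nabla L^\top \dot\alpha\bigr)
     + \kappa\,\nabla L^\top \dot\alpha \\
  &= -\eta(t)\,\dot\alpha^\top G(\alpha)\,\dot\alpha \;\le\; 0 .
\end{align*}
```

Under these assumptions the total energy never increases and strictly decreases while the particle moves. At rest ($\dot\alpha = 0$, $\ddot\alpha = 0$) the equation reduces to $\kappa \operatorname{grad} L(\alpha) = 0$, which identifies a critical point of $L$. Showing that the rest point is a minimum rather than a saddle, and that the trajectory actually converges, needs a further argument that this sketch does not supply.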

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review performed on abstract only; full derivation, proofs, and experimental details unavailable, so ledger entries are limited to statements explicit in the abstract.

free parameters (1)

physically motivated hyperparameters
Control exploration capabilities; specific values or fitting procedure not stated in abstract.

axioms (1)

Domain assumption: Minima of modern neural network loss functions form connected components of reparameterization invariant solutions on the training data.
Stated as typical behavior in the opening sentence of the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear. Paper passage:

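$\ddot{\alpha} = -\bigl(\dot{\alpha}^\top H_L(\alpha)\,\dot{\alpha} + \kappa\bigr)\cdot\operatorname{grad} L(\alpha) - \eta(t)\cdot\dot{\alpha}$ with $\eta(t) = \eta_0 \lVert\dot{\alpha}\rVert \sqrt{1 + \lVert\nabla L\rVert^2 \cos^2\omega}$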
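
To make the quoted equation tangible, here is a minimal numerical sketch that integrates it on a toy problem. Everything beyond the equation itself is an assumption made for illustration: the toy loss $(\lVert\alpha\rVert^2 - 1)^2$ (chosen so that the minimum level set is the unit circle), the graph metric $G = I + \nabla L\,\nabla L^\top$ giving $\operatorname{grad} L = \nabla L/(1 + \lVert\nabla L\rVert^2)$, the reading of $\omega$ as the angle between $\dot\alpha$ and $\nabla L$, the fixed-step RK4 integrator, and the values of `KAPPA`, `ETA0`, `dt`, and `steps`. It is not the authors' implementation.

```python
import numpy as np

KAPPA = 1.0  # gravitational strength kappa (value chosen for illustration)
ETA0 = 1.0   # friction scale eta_0 (value chosen for illustration)

def loss(a):
    # Hypothetical toy loss; its minimum level set is the unit circle.
    return (a @ a - 1.0) ** 2

def grad(a):
    # Euclidean gradient of the toy loss.
    return 4.0 * (a @ a - 1.0) * a

def hess(a):
    # Euclidean Hessian of the toy loss.
    return 4.0 * (a @ a - 1.0) * np.eye(a.size) + 8.0 * np.outer(a, a)

def energy(a, v):
    # Kinetic energy under the assumed graph metric plus kappa * loss.
    g = grad(a)
    return 0.5 * (v @ v + (v @ g) ** 2) + KAPPA * loss(a)

def rhs(state):
    a, v = np.split(state, 2)
    g = grad(a)
    riem_grad = g / (1.0 + g @ g)  # grad L under G = I + g g^T (assumption)
    # eta_0 * ||v|| * sqrt(1 + ||g||^2 cos^2(omega)), with omega assumed to be
    # the angle between v and g; rewritten to avoid dividing by ||v||.
    eta = ETA0 * np.sqrt(v @ v + (v @ g) ** 2)
    acc = -(v @ hess(a) @ v + KAPPA) * riem_grad - eta * v
    return np.concatenate([v, acc])

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def simulate(a0, v0, dt=0.005, steps=8000):
    state = np.concatenate([a0, v0])
    energies = [energy(a0, v0)]
    for _ in range(steps):
        state = rk4_step(state, dt)
        a, v = np.split(state, 2)
        energies.append(energy(a, v))
    a, v = np.split(state, 2)
    return a, v, np.array(energies)

# Start on the unit circle with a velocity that initially pushes the particle off it.
a_final, v_final, energies = simulate(np.array([1.0, 0.0]), np.array([0.5, 1.0]))
print("final loss:", loss(a_final))
print("energy start / end:", energies[0], energies[-1])
```

Because the friction coefficient scales with speed, the particle in this toy setting slows down gradually rather than stopping at a fixed time, so in practice a stopping tolerance on the speed or the energy would have to be chosen; that choice is not specified in the material reviewed here.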

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

arXiv preprint arXiv:2510.26266 , year=

A Likely Geometry of Generative Models , author=. arXiv preprint arXiv:2510.26266 , year=

work page arXiv

[2] [2]

Neural Information Processing Systems (NeurIPS) , year=

Roy, Hrittik and Miani, Marco and Ek, Carl Henrik and Hennig, Philipp and Pf. Neural Information Processing Systems (NeurIPS) , year=

work page

[3] [3]

Neural Information Processing Systems (NeurIPS) , year=

Bergamin, Federico and Moreno-Mu. Neural Information Processing Systems (NeurIPS) , year=

work page

[4] [4]

Yu, Hanlin and Hartmann, Marcelo and Sanchez, Bernardo Williams Moreno and Girolami, Mark and Klami, Arto , booktitle=

work page

[5] [5]

Reichlin, Alfredo and Vasco, Miguel and Kragic Jensfelt, Danica , journal=

work page

[6] [6]

Roch, Hendrik and Shen, Chun , journal=

work page

[7] [7]

Li, Yiming and Qiu, Jiacheng and Calinon, Sylvain , journal=

work page

[8] [8]

Di Sipio, Riccardo and Diaz-Rodriguez, Jairo and Serrano, Luis , journal=

work page

[9] [9]

Hoffman, Matthew D and Gelman, Andrew and others , journal=

work page

[10] [10]

Welling, Max and Teh, Yee W , booktitle=

work page

[11] [11]

Ma, Yi-An and Chen, Yuansi and Jin, Chi and Flammarion, Nicolas and Jordan, Michael I , journal=

work page

[12] [12]

International Conference on Machine Learning (ICML) , year=

Papamarkou, Theodore and Skoularidou, Maria and Palla, Konstantina and Aitchison, Laurence and Arbel, Julyan and Dunson, David and Filippone, Maurizio and Fortuin, Vincent and Hennig, Philipp and Hern. International Conference on Machine Learning (ICML) , year=

work page

[13] [13]

Kristiadi, Agustinus and Eschenhagen, Runa and Hennig, Philipp , journal=

work page

[14] [14]

International Conference on Geometric Science of Information (GSI) , year=

Da Costa, Natha. International Conference on Geometric Science of Information (GSI) , year=

work page

[15] [15]

Minguzzi, Ettore , journal=

work page

[16] [16]

Cline, Douglas , year=

work page

[17] [17]

Neural Information Processing Systems (NeurIPS) , year=

Kr. Neural Information Processing Systems (NeurIPS) , year=

work page

[18] [18]

Kunstner, Frederik and Hennig, Philipp and Balles, Lukas , journal=

work page

[19] [19]

International Conference on Machine Learning (ICML) , year=

Immer, Alexander and Bauer, Matthias and Fortuin, Vincent and R. International Conference on Machine Learning (ICML) , year=

work page

[20] [20]

Daxberger, Erik and Kristiadi, Agustinus and Immer, Alexander and Eschenhagen, Runa and Bauer, Matthias and Hennig, Philipp , journal=

work page

[21] [21]

Immer, Alexander and Korzepa, Maciej and Bauer, Matthias , booktitle=

work page

[22] [22]

International conference on artificial intelligence and statistics , pages=

Do Bayesian neural networks need to be fully stochastic? , author=. International conference on artificial intelligence and statistics , pages=. 2023 , organization=

work page 2023

[23] [23]

Advances in Neural Information Processing Systems , volume=

Should we learn most likely functions or parameters? , author=. Advances in Neural Information Processing Systems , volume=

work page

[24] [24]

arXiv preprint arXiv:2602.00199 , year=

Reducing Memorisation in Generative Models via Riemannian Bayesian Inference , author=. arXiv preprint arXiv:2602.00199 , year=

work page arXiv

[25] [25]

2018 , publisher=

Introduction to Riemannian manifolds , author=. 2018 , publisher=

work page 2018

[26] [26]

1992 , publisher=

Riemannian geometry , author=. 1992 , publisher=

work page 1992

[27] [27]

2007 , publisher=

Finsler-Lagrange geometry: Applications to dynamical systems , author=. 2007 , publisher=

work page 2007

[28] [28]

Kovachki, Nikola B and Stuart, Andrew M , journal=

work page

[29] [29]

arXiv preprint arXiv:2510.23684 , year=

Fadel, Samuel G and Roy, Hrittik and Kr. arXiv preprint arXiv:2510.23684 , year=

work page arXiv

[30] Garipov, Timur and Izmailov, Pavel and Podoprikhin, Dmitrii and Vetrov, Dmitry P and Wilson, Andrew G. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs.

[31] Essentially No Barriers in Neural Network Energy Landscape. In International Conference on Machine Learning (ICML).

[32] Sharp Minima Can Generalize For Deep Nets. In International Conference on Machine Learning (ICML).

[33] Li, Hao and Xu, Zheng and Taylor, Gavin and Studer, Christoph and Goldstein, Tom. In Neural Information Processing Systems (NeurIPS).

[34] Sharpness-Aware Minimization for Efficiently Improving Generalization. arXiv preprint arXiv:2010.01412.

[35] Implicit bias of gradient descent on linear convolutional networks. Advances in Neural Information Processing Systems.

[36] MacKay, David JC.

[37] Flat minima. Neural Computation, 1997.

[38] Chen, Tianqi and Fox, Emily and Guestrin, Carlos.

[39] Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society Series B: Statistical Methodology.

[40] Neal, Radford M.

[41] On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In International Conference on Learning Representations.

[42] The role of permutation invariance in linear mode connectivity of neural networks. arXiv preprint arXiv:2110.06296.

[43] Zhao, Bo and Dehmamy, Nima and Walters, Robin and Yu, Rose.

[44] Polyak, Boris T. 1964.

[45] Nesterov, Yurii.

[46] Averaging Weights Leads to Wider Optima and Better Generalization. arXiv preprint arXiv:1803.05407.

[47] Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems.

[48] Su, Weijie and Boyd, Stephen and Candes, Emmanuel J.

[49] Maddox, Wesley J and Izmailov, Pavel and Garipov, Timur and Vetrov, Dmitry P and Wilson, Andrew Gordon.

[50] Tubular Riemannian Laplace Approximations for Bayesian Neural Networks. arXiv preprint arXiv:2512.24381.

[51] Lan, Shiwei and Stathopoulos, Vassilios and Shahbaba, Babak and Girolami, Mark.

[52] Sparse Gaussian processes using pseudo-inputs. Advances in Neural Information Processing Systems.

[53] LeCun, Yann and Boser, Bernhard and Denker, John S and Henderson, Donnie and Howard, Richard E and Hubbard, Wayne and Jackel, Lawrence D.

[54] Nonlinear systems. 2002.

[55] Chen, Ricky TQ and Rubanova, Yulia and Bettencourt, Jesse and Duvenaud, David K.

[56] Dormand, John R. and Prince, Peter J. 1980.