Robust volatility updates for Hierarchical Gaussian Filtering

Christoph Mathys; Lilian Aline Weber; Nace Mikus; Nicolas Legrand; Peter Thestrup Waade

arxiv: 2605.00966 · v1 · submitted 2026-05-01 · 💻 cs.LG · cs.NE· q-bio.NC· stat.ML

Robust volatility updates for Hierarchical Gaussian Filtering

Christoph Mathys , Nicolas Legrand , Peter Thestrup Waade , Nace Mikus , Lilian Aline Weber This is my paper

Pith reviewed 2026-05-09 19:13 UTC · model grok-4.3

classification 💻 cs.LG cs.NEq-bio.NCstat.ML

keywords Hierarchical Gaussian Filteringvolatility couplingvariational inferencequadratic approximationLambert W functionbelief updatingrobust updates

0 comments

The pith

Hierarchical Gaussian Filtering now updates volatility beliefs without producing impossible negative precisions by using an interpolated quadratic approximation to the variational energy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Hierarchical Gaussian Filtering networks perform efficient belief updates across layers, but the original equations for volatility coupling can yield negative posterior precision, which halts the algorithm. The paper replaces the single quadratic expansion with an interpolation between one centered at the prior prediction and a second centered at a mode located in closed form by the Lambert W function. This change keeps the resulting one-step update equations well-defined for every combination of parameters and every size of prediction error. A sympathetic reader would care because the fix removes a practical obstacle to running stable hierarchical inference in models of learning under changing volatility.

Core claim

The central claim is that an interpolated quadratic approximation to the variational energy of volatility-coupled nodes—one expansion at the prior prediction and one at a second mode whose location is given in closed form by the Lambert W function—yields update equations for mean and precision that remain positive and track the true variational posterior even when prediction errors are large.

What carries the argument

The interpolated quadratic approximation to the variational energy for volatility-coupled nodes, formed by blending expansions at the prior prediction and at a Lambert-W-derived second mode.

Load-bearing premise

The interpolated quadratic approximation remains sufficiently close to the true variational energy that belief updates stay accurate without introducing substantial bias for large prediction errors.

What would settle it

A direct numerical comparison, for a volatility-coupled node and a range of large prediction errors, between the precision obtained from the new closed-form update and the precision obtained by numerically maximizing the exact variational energy; any case in which the new precision is negative or deviates substantially from the exact value would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.00966 by Christoph Mathys, Lilian Aline Weber, Nace Mikus, Nicolas Legrand, Peter Thestrup Waade.

**Figure 2.** Figure 2: The three components of the canonical variational energy [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Bimodal variational energy and posterior for [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Simulation 1: KL divergence between the normalized variational posterior and the Gaussian approximation [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

**Figure 5.** Figure 5: Simulation 2: Under standard conditions ( [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

**Figure 6.** Figure 6: Simulation 3: When confronted with regime changes under high meta-volatility ( [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗

**Figure 7.** Figure 7: Simulation 4: Parameter-space coverage when filtering the reference time series. Blue indicates parameter [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

read the original abstract

Hierarchical Gaussian Filtering (HGF) networks allow for efficient updating of posterior distributions (beliefs) about hidden states of an agent's environment. HGF parent nodes can target the mean or variance of their children. New information entering at input nodes leads to a cascade of belief updates across the network according to one-step update equations for each node's mean and precision (inverse variance). However, the original form of the update equations for variance-targeting parents(volatility coupling) can in some regions of parameter space lead to negative posterior precision, a logical impossibility which causes the updating algorithm to terminate with an error. In this report, we introduce a modified quadratic approximation to the variational energy of volatility-coupled nodes that avoids negative posterior precision. The key idea is to interpolate between two quadratic expansions of the variational energy: one at the prior prediction and one at a second mode whose location is obtained in closed form via the Lambert W function. The resulting update equations are robust across the entire parameter space and faithfully track the variational posterior even for large prediction errors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper fixes the negative-precision crash in HGF volatility updates with a Lambert W interpolation that keeps the equations closed-form and stable by design.

read the letter

The core advance is a modified quadratic approximation for volatility-coupled nodes in Hierarchical Gaussian Filtering. The original one-step updates could produce negative posterior precision in parts of parameter space, which breaks the algorithm. They interpolate between the usual expansion at the prior prediction and a second point located with the real branch of the Lambert W function. This guarantees non-negative precision everywhere and supplies explicit new update rules for the mean and precision that still follow from the variational energy.

Referee Report

2 major / 0 minor

Summary. The paper addresses negative posterior precision in volatility-coupled parent nodes of Hierarchical Gaussian Filtering (HGF) networks. It replaces the original quadratic approximation to the variational energy with an interpolation between an expansion at the prior prediction and a second critical point located in closed form via the real branch of the Lambert W function. The resulting one-step update equations for mean and precision are claimed to remain well-defined across the full parameter space and to track the true variational posterior even under large prediction errors.

Significance. If the construction holds, it supplies a parameter-free, analytically tractable fix to a known numerical failure mode in HGF, preserving the model’s ability to perform hierarchical belief updating without ad-hoc clipping or termination. The explicit use of the Lambert W function to guarantee non-negative precision is a clear technical strength that keeps the method within the original variational framework.

major comments (2)

[Abstract] Abstract: the central claim that the interpolated updates 'faithfully track the variational posterior even for large prediction errors' is asserted without any error analysis, bound on the approximation residual, or numerical comparison against the exact variational energy; this verification is load-bearing for the robustness statement.
The manuscript provides no simulation results or empirical tests across regimes of large prediction error or extreme volatility parameters, leaving the practical performance of the closed-form rules unverified despite the theoretical construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive evaluation of the technical contribution and for identifying the need for explicit verification of the approximation quality. We address each major comment below and will incorporate additional numerical evidence in the revised manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the interpolated updates 'faithfully track the variational posterior even for large prediction errors' is asserted without any error analysis, bound on the approximation residual, or numerical comparison against the exact variational energy; this verification is load-bearing for the robustness statement.

Authors: We agree that the manuscript currently lacks a formal error bound or direct numerical comparison to the exact variational energy. The interpolation is constructed to match the variational energy exactly at the prior prediction and at the Lambert-W-derived critical point, with the quadratic form chosen to guarantee non-negative precision everywhere; this ensures the update remains well-defined. However, we do not supply a rigorous residual bound in the present version. In the revision we will add a dedicated numerical section that compares the closed-form updates against direct numerical maximization of the variational energy across a range of large prediction errors. revision: yes
Referee: The manuscript provides no simulation results or empirical tests across regimes of large prediction error or extreme volatility parameters, leaving the practical performance of the closed-form rules unverified despite the theoretical construction.

Authors: We acknowledge that the current manuscript is primarily theoretical and contains no simulation studies. The derivation focuses on obtaining closed-form, parameter-free updates that remain defined for all inputs. To address the concern, the revised version will include targeted simulations that (i) reproduce the failure of the original HGF updates under large volatility prediction errors, (ii) demonstrate that the new rules remain stable, and (iii) compare the resulting posterior means and precisions against both the original method (where it succeeds) and numerical optimization of the exact variational objective. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation starts from the variational energy of volatility-coupled nodes in HGF and constructs an interpolated quadratic approximation whose second expansion point is located in closed form by the real branch of the Lambert W function. The resulting one-step update rules for posterior mean and precision are obtained by direct differentiation and algebraic rearrangement of this modified energy; they are not obtained by fitting parameters to data and then relabeling the fit as a prediction, nor do they rely on self-citation of prior uniqueness theorems or ansatzes. Negative posterior precision is precluded by the interpolation construction itself, and the claim that the updates track the true variational posterior follows from the local accuracy of the quadratic pieces rather than from any definitional equivalence between input and output. No load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Builds on the standard variational inference setup of HGF without introducing new free parameters, axioms beyond domain assumptions, or invented entities; relies on existing math functions like Lambert W.

axioms (1)

domain assumption Standard variational approximation to the posterior in HGF networks
The method assumes the usual mean-field variational inference framework and one-step update structure of HGF.

pith-pipeline@v0.9.0 · 5498 in / 1069 out tokens · 27737 ms · 2026-05-09T19:13:57.048124+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Closed-form predictive coding via hierarchical Gaussian filters
cs.LG 2026-05 unverdicted novelty 6.0

Predictive coding is recast as deep hierarchical Gaussian filters to restore precision-weighted message passing, yielding closed-form inference and online precision learning that matches backpropagation speed on Fashi...

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · cited by 1 Pith paper

[1]

M., Gonnet, G

Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J., and Knuth, D. E. (1996). On the Lambert W function. Advances in Computational Mathematics, 5(1):329–359

work page 1996
[2]

O., Mathys, C., Weber, L

Diaconescu, A. O., Mathys, C., Weber, L. A. E., Daunizeau, J., Kasper, L., Lomakina, E. I., Fehr, E., and Stephan, K. E. (2014). Inferring on the intentions of others by hierarchical Bayesian learning.PLOS Computational Biology, 10(9):e1003810. Frässle, S., Aponte, E. A., Bollmann, S., Brodersen, K. H., Do, C. T., Harrison, O. K., Harrison, S. J., Heinzle, J.,

work page 2014
[3]

I., Mathys, C., Müller-Schrader, M., Pereira, I., Petzschner, F

Iglesias, S., Kasper, L., Lomakina, E. I., Mathys, C., Müller-Schrader, M., Pereira, I., Petzschner, F. H., Raman, S., Schöbi, D., Toussaint, B., Weber, L. A., Yao, Y ., and Stephan, K. E. (2021). TAPAS: An open-source software package for Translational Neuromodeling and Computational Psychiatry.Frontiers in Psychiatry, 12:680811

work page 2021
[4]

H., Kasper, L., Piccirelli, M., den Ouden, H

Iglesias, S., Mathys, C., Brodersen, K. H., Kasper, L., Piccirelli, M., den Ouden, H. E. M., and Stephan, K. E. (2013). Hierarchical prediction errors in midbrain and basal forebrain during sensory learning.Neuron, 80(2):519–530

work page 2013
[5]

P., Mathys, C., and Rees, G

Lawson, R. P., Mathys, C., and Rees, G. (2017). Adults with autism overestimate the volatility of the sensory environment.Nature Neuroscience, 20(9):1293–1299

work page 2017
[6]

(2012).Hierarchical Gaussian filtering - ETH E-Collection

Mathys, C. (2012).Hierarchical Gaussian filtering - ETH E-Collection. PhD, ETH Zurich

work page 2012
[7]

J., and Stephan, K

Mathys, C., Daunizeau, J., Friston, K. J., and Stephan, K. E. (2011). A Bayesian foundation for individual learning under uncertainty.Frontiers in Human Neuroscience, 5:39

work page 2011
[8]

I., Daunizeau, J., Iglesias, S., Brodersen, K

Mathys, C., Lomakina, E. I., Daunizeau, J., Iglesias, S., Brodersen, K. H., Friston, K. J., and Stephan, K. E. (2014). Uncertainty in perception and the Hierarchical Gaussian Filter.Frontiers in Human Neuroscience, 8:825

work page 2014
[9]

and Weber, L

Mathys, C. and Weber, L. (2020). Hierarchical Gaussian filtering of sufficient statistic time series for active inference. In Verbelen, T., Lanillos, P., Buckley, C. L., and De Boom, C., editors,Active Inference, pages 52–58. Springer International Publishing

work page 2020
[10]

Mikus, N., Lamm, C., and Mathys, C. (2024). Computational phenotyping of aberrant belief updating in individuals with schizotypal traits and schizophrenia.Biological Psychiatry, 0(0)

work page 2024
[11]

A., Supanat, Moon, Z., Müller, L., Timothy, fghzxm, karlwessel, kcin96, and Hatherly, M

Novosel, R., Vargas, S. A., Supanat, Moon, Z., Müller, L., Timothy, fghzxm, karlwessel, kcin96, and Hatherly, M. (2024). fonsp/Pluto.jl: v0.19.46

work page 2024
[12]

R., Mathys, C., and Corlett, P

Powers, A. R., Mathys, C., and Corlett, P. R. (2017). Pavlovian conditioning–induced hallucinations result from overweighting of perceptual priors.Science, 357(6351):596–600

work page 2017
[13]

T., Mikus, N., and Mathys, C

Waade, P. T., Mikus, N., and Mathys, C. (2021). Inferring in Circles: Active Inference in Continuous State Space Using Hierarchical Gaussian Filtering of Sufficient Statistics. InMachine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 810–818. Springer International Publishing

work page 2021
[14]

A., Waade, P

Weber, L. A., Waade, P. T., Legrand, N., Møller, A. H., Stephan, K. E., and Mathys, C. (2025). The generalized Hierarchical Gaussian Filter. arXiv:2305.10937 [cs, q-bio]. 6https://github.com/ComputationalPsychiatry/HierarchicalGaussianFiltering.jl 7https://github.com/ComputationalPsychiatry/pyhgf 8https://github.com/ComputationalPsychiatry/hgf-toolbox 17

work page arXiv 2025

[1] [1]

M., Gonnet, G

Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J., and Knuth, D. E. (1996). On the Lambert W function. Advances in Computational Mathematics, 5(1):329–359

work page 1996

[2] [2]

O., Mathys, C., Weber, L

Diaconescu, A. O., Mathys, C., Weber, L. A. E., Daunizeau, J., Kasper, L., Lomakina, E. I., Fehr, E., and Stephan, K. E. (2014). Inferring on the intentions of others by hierarchical Bayesian learning.PLOS Computational Biology, 10(9):e1003810. Frässle, S., Aponte, E. A., Bollmann, S., Brodersen, K. H., Do, C. T., Harrison, O. K., Harrison, S. J., Heinzle, J.,

work page 2014

[3] [3]

I., Mathys, C., Müller-Schrader, M., Pereira, I., Petzschner, F

Iglesias, S., Kasper, L., Lomakina, E. I., Mathys, C., Müller-Schrader, M., Pereira, I., Petzschner, F. H., Raman, S., Schöbi, D., Toussaint, B., Weber, L. A., Yao, Y ., and Stephan, K. E. (2021). TAPAS: An open-source software package for Translational Neuromodeling and Computational Psychiatry.Frontiers in Psychiatry, 12:680811

work page 2021

[4] [4]

H., Kasper, L., Piccirelli, M., den Ouden, H

Iglesias, S., Mathys, C., Brodersen, K. H., Kasper, L., Piccirelli, M., den Ouden, H. E. M., and Stephan, K. E. (2013). Hierarchical prediction errors in midbrain and basal forebrain during sensory learning.Neuron, 80(2):519–530

work page 2013

[5] [5]

P., Mathys, C., and Rees, G

Lawson, R. P., Mathys, C., and Rees, G. (2017). Adults with autism overestimate the volatility of the sensory environment.Nature Neuroscience, 20(9):1293–1299

work page 2017

[6] [6]

(2012).Hierarchical Gaussian filtering - ETH E-Collection

Mathys, C. (2012).Hierarchical Gaussian filtering - ETH E-Collection. PhD, ETH Zurich

work page 2012

[7] [7]

J., and Stephan, K

Mathys, C., Daunizeau, J., Friston, K. J., and Stephan, K. E. (2011). A Bayesian foundation for individual learning under uncertainty.Frontiers in Human Neuroscience, 5:39

work page 2011

[8] [8]

I., Daunizeau, J., Iglesias, S., Brodersen, K

Mathys, C., Lomakina, E. I., Daunizeau, J., Iglesias, S., Brodersen, K. H., Friston, K. J., and Stephan, K. E. (2014). Uncertainty in perception and the Hierarchical Gaussian Filter.Frontiers in Human Neuroscience, 8:825

work page 2014

[9] [9]

and Weber, L

Mathys, C. and Weber, L. (2020). Hierarchical Gaussian filtering of sufficient statistic time series for active inference. In Verbelen, T., Lanillos, P., Buckley, C. L., and De Boom, C., editors,Active Inference, pages 52–58. Springer International Publishing

work page 2020

[10] [10]

Mikus, N., Lamm, C., and Mathys, C. (2024). Computational phenotyping of aberrant belief updating in individuals with schizotypal traits and schizophrenia.Biological Psychiatry, 0(0)

work page 2024

[11] [11]

A., Supanat, Moon, Z., Müller, L., Timothy, fghzxm, karlwessel, kcin96, and Hatherly, M

Novosel, R., Vargas, S. A., Supanat, Moon, Z., Müller, L., Timothy, fghzxm, karlwessel, kcin96, and Hatherly, M. (2024). fonsp/Pluto.jl: v0.19.46

work page 2024

[12] [12]

R., Mathys, C., and Corlett, P

Powers, A. R., Mathys, C., and Corlett, P. R. (2017). Pavlovian conditioning–induced hallucinations result from overweighting of perceptual priors.Science, 357(6351):596–600

work page 2017

[13] [13]

T., Mikus, N., and Mathys, C

Waade, P. T., Mikus, N., and Mathys, C. (2021). Inferring in Circles: Active Inference in Continuous State Space Using Hierarchical Gaussian Filtering of Sufficient Statistics. InMachine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 810–818. Springer International Publishing

work page 2021

[14] [14]

A., Waade, P

Weber, L. A., Waade, P. T., Legrand, N., Møller, A. H., Stephan, K. E., and Mathys, C. (2025). The generalized Hierarchical Gaussian Filter. arXiv:2305.10937 [cs, q-bio]. 6https://github.com/ComputationalPsychiatry/HierarchicalGaussianFiltering.jl 7https://github.com/ComputationalPsychiatry/pyhgf 8https://github.com/ComputationalPsychiatry/hgf-toolbox 17

work page arXiv 2025