Wasserstein Contraction of Coordinate Ascent Variational Inference
Pith reviewed 2026-06-29 05:25 UTC · model grok-4.3
The pith
Coordinate ascent variational inference contracts in Wasserstein distance at fixed points when a transport-information inequality holds there.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under a transport-information inequality at the fixed points and a functional smoothness condition, the coordinate ascent variational inference algorithm contracts in the Wasserstein distance. The result is general enough to cover smooth manifolds and certain non-smooth spaces, and it yields local convergence guarantees for the algorithm.
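As a toy illustration of what contraction in Wasserstein distance means for CAVI (not the paper's setting or proof), the sketch below fits a mean-field Gaussian to a two-dimensional Gaussian target with correlation rho. The function names, the target, and the starting point are all hypothetical choices made for this example. In this special case the factor variances are already at their fixed-point values after one sweep, so the W2 distance to the fixed point reduces to the Euclidean distance between means.

```python
import math

def cavi_sweep(m1, m2, mu, lam):
    """One CAVI sweep: update the mean of factor 1, then the mean of factor 2."""
    (l11, l12), (l21, l22) = lam
    m1 = mu[0] - (l12 / l11) * (m2 - mu[1])
    m2 = mu[1] - (l21 / l22) * (m1 - mu[0])
    return m1, m2

def contraction_ratios(rho=0.8, sweeps=6, start=(3.0, -2.0)):
    """Ratios W2(q_{k+1}, q*) / W2(q_k, q*) over successive sweeps."""
    mu = (0.0, 0.0)  # target mean, which is also the mean of the CAVI fixed point
    s = 1.0 / (1.0 - rho ** 2)
    # Precision matrix of a unit-variance Gaussian with correlation rho.
    lam = ((s, -rho * s), (-rho * s, s))
    m1, m2 = start
    dists = []
    for _ in range(sweeps):
        m1, m2 = cavi_sweep(m1, m2, mu, lam)
        # Each factor has variance 1 / lam_ii, the same as at the fixed point,
        # so W2 to the fixed point is the Euclidean distance between the means.
        dists.append(math.hypot(m1 - mu[0], m2 - mu[1]))
    return [b / a for a, b in zip(dists, dists[1:])]

ratios = contraction_ratios()
```

In this toy case every ratio equals rho ** 2 (0.64 for rho = 0.8) up to floating-point error, so the iterates contract at a fixed linear rate that degrades as the target becomes more correlated.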
What carries the argument
The transport-information inequality at the fixed points of the coordinate ascent variational inference map, which bounds Wasserstein distance by a multiple of the relative Fisher information and, combined with the functional smoothness condition, yields contraction of the coordinate-wise update operator.
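For orientation, one common way to write a transport-information inequality is sketched below, in the style of the Guillin et al. (2009b) work that appears among the paper's references. The constant C, the exponent p, and the normalization of the information functional are assumptions made for illustration; the paper's exact statement may differ.

```latex
% gamma : reference probability measure (here, a CAVI fixed point)
% nu    : any probability measure absolutely continuous w.r.t. gamma
% C     : hypothetical constant; its normalization varies between references
W_p(\nu, \gamma)^2 \;\le\; C \, I(\nu \mid \gamma),
\qquad
I(\nu \mid \gamma) \;=\; \int \bigl| \nabla \sqrt{f} \bigr|^2 \, d\gamma,
\quad f = \frac{d\nu}{d\gamma}.
```

The right-hand side is a Fisher-type information rather than a relative entropy, which is what distinguishes this family from transport-entropy inequalities.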
Load-bearing premise
A transport-information inequality holds at the fixed points of the coordinate ascent variational inference algorithm.
What would settle it
A concrete variational inference problem in which the transport-information inequality and the functional smoothness condition both hold at the fixed point, yet successive CAVI iterates fail to contract in Wasserstein distance.
Original abstract
We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results are general and sharp, allow for local convergence guarantees, hold for general smooth manifolds, and also in some non-smooth spaces. We consider applications to Bayesian Gaussian Mixture Models, and high-dimensional Bayesian Probit Regression, and Logistic Regression with Pólya-Gamma random variables (i.e. Jaakkola-Jordan's algorithm).
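To make the last application concrete, here is a minimal sketch of Jaakkola-Jordan style coordinate updates for logistic regression with a single scalar coefficient. It uses the update formulas as they are commonly stated for the Jaakkola-Jordan bound; the toy data, the prior, and all function names are hypothetical, and nothing here reproduces the paper's analysis. It records the Gaussian factor after each sweep and measures the W2 distance between consecutive iterates.

```python
import math

def lam(xi):
    """Jaakkola-Jordan coefficient tanh(xi / 2) / (4 * xi), with limit 1/8 at 0."""
    return 0.125 if abs(xi) < 1e-12 else math.tanh(xi / 2.0) / (4.0 * xi)

def jj_sweeps(xs, ys, prior_mean=0.0, prior_var=10.0, sweeps=200):
    """Alternate the q(beta) and xi updates for scalar logistic regression.

    Returns the (mean, std) of the Gaussian factor q(beta) after each sweep.
    """
    xi = [1.0] * len(xs)  # arbitrary positive initialisation (an assumption)
    history = []
    for _ in range(sweeps):
        # Update q(beta) = N(m, v) with the auxiliary parameters xi held fixed.
        precision = 1.0 / prior_var + 2.0 * sum(
            lam(t) * x * x for t, x in zip(xi, xs)
        )
        v = 1.0 / precision
        m = v * (prior_mean / prior_var
                 + sum((y - 0.5) * x for x, y in zip(xs, ys)))
        # Update each xi with q(beta) held fixed: xi_i^2 = x_i^2 * E[beta^2].
        xi = [abs(x) * math.sqrt(v + m * m) for x in xs]
        history.append((m, math.sqrt(v)))
    return history

def w2_gauss_1d(a, b):
    """W2 distance between two 1D Gaussians given as (mean, std) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Hypothetical toy data: six scalar covariates with binary labels.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
history = jj_sweeps(xs, ys)
steps = [w2_gauss_1d(p, q) for p, q in zip(history, history[1:])]
```

Inspecting `steps` shows how quickly consecutive iterates approach each other on this toy problem; this is only an empirical check and says nothing about whether the paper's hypotheses hold for the model.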
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that coordinate ascent variational inference (CAVI) contracts in Wasserstein distance under a transport-information inequality at the fixed points together with a functional smoothness condition. The results are presented as general and sharp, yielding local convergence guarantees on smooth manifolds and some non-smooth spaces, with applications to Bayesian Gaussian mixture models, high-dimensional Bayesian probit regression, and logistic regression with Pólya-Gamma augmentation (Jaakkola-Jordan algorithm).
Significance. If the stated conditions hold, the work supplies local Wasserstein contraction guarantees for CAVI that are more general than typical Euclidean analyses and explicitly allow manifold settings. The conditional formulation is a strength when the inequality can be verified, as it separates the algorithmic contraction from model-specific functional analysis.
major comments (2)
- [§5 and §6] The central theorem (presumably Theorem 3.2 or equivalent) establishes contraction only when a transport-information inequality holds at the CAVI fixed points. In the applications to GMMs (§5) and Pólya-Gamma logistic regression (§6), this inequality is invoked but not shown to hold at the relevant fixed points of the variational family; without this verification the contraction claim does not activate for those models.
- [Abstract and §3] The functional smoothness condition required alongside the transport-information inequality is stated in the general theorem but its verification (or relaxation) for the non-smooth spaces mentioned in the abstract is not detailed, leaving the scope of the non-smooth extension unclear.
minor comments (1)
- Notation for the variational family and the Wasserstein metric should be introduced once with consistent symbols across the general theorem and the applications.
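One way the notation requested in the minor comment could be fixed once and reused is sketched below. These are the standard textbook definitions of a mean-field family, the CAVI block update, and the p-Wasserstein distance; every symbol (K, the block spaces, the target density, the coupling) is an assumption introduced for illustration and may not match the paper's own choices.

```latex
% Mean-field family on a product space X = X_1 x ... x X_K
\mathcal{Q} = \Bigl\{ q = \textstyle\bigotimes_{i=1}^{K} q_i \;:\;
  q_i \in \mathcal{P}(X_i) \Bigr\}

% CAVI update of block i with all other blocks held fixed (pi = target density)
q_i^{\mathrm{new}}(x_i) \;\propto\;
  \exp\Bigl( \mathbb{E}_{q_{-i}} \bigl[ \log \pi(x_i, x_{-i}) \bigr] \Bigr)

% p-Wasserstein distance; Pi(mu, nu) = set of couplings, d = metric on X
W_p(\mu, \nu) = \Bigl( \inf_{\Gamma \in \Pi(\mu, \nu)}
  \int d(x, y)^p \, \Gamma(dx, dy) \Bigr)^{1/p}
```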
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments correctly identify that our main theorem is conditional and that the applications require explicit verification of the hypotheses. We address each point below and will incorporate the suggested clarifications.
Point-by-point responses
- Referee: [§5 and §6] The central theorem (presumably Theorem 3.2 or equivalent) establishes contraction only when a transport-information inequality holds at the CAVI fixed points. In the applications to GMMs (§5) and Pólya-Gamma logistic regression (§6), this inequality is invoked but not shown to hold at the relevant fixed points of the variational family; without this verification the contraction claim does not activate for those models.
  Authors: We agree that the contraction guarantee is conditional on the transport-information inequality holding at the fixed points. The current manuscript invokes the inequality in the applications but does not supply the explicit verification at the relevant fixed points for either the GMM or the Pólya-Gamma logistic regression examples. In the revised version we will add explicit calculations (or references to known results under standard assumptions) confirming that the inequality holds at the CAVI fixed points for both models, thereby making the application of the theorem rigorous.
  Revision: yes
- Referee: [Abstract and §3] The functional smoothness condition required alongside the transport-information inequality is stated in the general theorem but its verification (or relaxation) for the non-smooth spaces mentioned in the abstract is not detailed, leaving the scope of the non-smooth extension unclear.
  Authors: The abstract states that the results also hold in some non-smooth spaces. The functional smoothness condition is part of the theorem hypothesis, and the non-smooth claim refers to settings where the condition admits a suitable relaxation (e.g., via weak derivatives or discrete metrics). We acknowledge that §3 currently focuses on the smooth case and does not detail the non-smooth extension. In the revision we will expand §3 with a subsection providing concrete examples of non-smooth spaces and the corresponding relaxations of the smoothness assumption.
  Revision: yes
Circularity Check
No circularity detected; contraction result is conditional on an external inequality assumption
full rationale
The paper's central result establishes local Wasserstein contraction of CAVI iterates conditional on a transport-information inequality holding at fixed points plus a smoothness condition. This is an implication, not a self-referential definition or fitted prediction. No load-bearing self-citations, uniqueness theorems from prior author work, or ansatzes smuggled via citation are present in the abstract or described claims. The inequality is treated as an assumption to be verified separately for specific models, not derived within the paper, so the derivation chain does not reduce to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Transport-information inequality at the fixed points
- domain assumption: Functional smoothness condition
Reference graph
Works this paper leans on
- [1] Within the Bogachev–Kolesnikov hierarchy of geometric functional inequalities ‘of Gaussian type’ (Section 3.5 of Bogachev and Kolesnikov (2012)), TI inequalities lie in between Sobolev-type energy-entropy inequalities and transport(-entropy) inequalities, being quantitatively weaker than the Logarithmic Sobolev inequality and quantitatively stronger th...
  2012
- [2] Analogously to transportation-entropy inequalities, in the case $p = 2$, a TI inequality still implies an energy-entropy inequality ‘of exponential type’, i.e. the Poincaré inequality (Proposition 2.9.b of Guillin et al. (2009b)). Moreover, for log-concave $\gamma$ (or $\gamma$ which deviates from log-concavity sufficiently mildly), the TI inequality even yields a full Logar...
- [3] In terms of verifying TI inequalities ‘from scratch’, a rather crude but general tool is the method of Lyapunov conditions. If $\gamma$ is known a priori to satisfy a Poincaré inequality, then it is sufficient to exhibit a Lyapunov function $W \ge 1$, a reference point $x_0 \in X$, and positive constants $b, c$ for which $LW \le \bigl(b - c\, d(x, x_0)^2\bigr) W$. Under these conditions, a TI inequality h...
  2017
- [4] For quantitative purposes, it often works well to study the metric contraction properties of the Langevin diffusion. In Wu (2009), it is established that granted e.g.¹ hypocontractivity estimates of the form $\forall x, x' \in X,\ W_1\bigl(P_t(x, \cdot), P_t(x', \cdot)\bigr) \le \ell(t) \cdot d(x, x')$ with $\ell : \mathbb{R}_+ \to \mathbb{R}_+$ integrable, one can deduce a TI inequality for $\gamma$ with $p = 1$ (Corollary 2.2 of Wu (2009))...
  2009
- [5] Thus far, the main applications of TI inequalities have apparently been in establishing path-space concentration inequalities for additive functionals along trajectories of the Langevin diffusion, following the developments in, e.g., Theorem 4.1 of Guillin et al. (2009b), and subsequently in Gao et al. (2014). ¹ Actually, Wu (2009) makes the slightly more ...
  2014