Wasserstein Contraction of Coordinate Ascent Variational Inference

Adrien Corenflos; Rocco Caprio; Sam Power

arXiv: 2605.30253 · v2 · pith:NZNR6BOBnew · submitted 2026-05-28 · stat.ML · cs.LG · math.FA · math.OC · math.PR · stat.CO

Pith reviewed 2026-06-29 05:25 UTC · model grok-4.3

keywords: variational inference · coordinate ascent · Wasserstein distance · transport-information inequality · Gaussian mixture models · probit regression · logistic regression · contraction

The pith

Coordinate ascent variational inference contracts in Wasserstein distance at fixed points when a transport-information inequality holds there.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the coordinate ascent variational inference algorithm produces iterates whose Wasserstein distance to the fixed points decreases under a transport-information inequality at those points combined with a smoothness condition on the objective. This contraction supplies local convergence guarantees and applies on general smooth manifolds as well as some non-smooth spaces. The result is demonstrated on Bayesian Gaussian mixture models, high-dimensional Bayesian probit regression, and logistic regression that uses Pólya-Gamma auxiliary variables.

Core claim

Under a transport-information inequality at the fixed points and a functional smoothness condition, the coordinate ascent variational inference algorithm contracts in the Wasserstein distance. The result is general enough to cover smooth manifolds and certain non-smooth spaces, and it yields local convergence guarantees for the algorithm.
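
As a toy illustration of the kind of contraction described above (not an example taken from the paper), consider mean-field CAVI for a bivariate Gaussian target with correlation `rho`. In this special case each factor is Gaussian with a fixed variance after its first update, so the 2-Wasserstein distance between an iterate and the fixed point reduces to the Euclidean distance between mean vectors. The function names, the target, and the starting point are all assumptions chosen for this sketch.

```python
import numpy as np

def cavi_sweep(means, target_mean, precision):
    """One full CAVI sweep over the factor means for a Gaussian target.

    Assumption: the target is N(target_mean, precision^{-1}) and the
    variational family is a product of one-dimensional Gaussians.
    """
    new = means.copy()
    for i in range(new.size):
        others = np.arange(new.size) != i
        # Optimal mean of factor i given the current means of the other factors.
        new[i] = target_mean[i] - precision[i, others] @ (new[others] - target_mean[others]) / precision[i, i]
    return new

def w2_to_fixed_point(means, target_mean):
    """W2 distance to the fixed point when both product Gaussians share variances."""
    return float(np.linalg.norm(means - target_mean))

rho = 0.8
target_mean = np.zeros(2)
precision = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))

means = np.array([3.0, -2.0])  # arbitrary starting means
distances = []
for _ in range(8):
    means = cavi_sweep(means, target_mean, precision)
    distances.append(w2_to_fixed_point(means, target_mean))

# Ratio of successive distances, one entry per sweep after the first.
ratios = [b / a for a, b in zip(distances, distances[1:])]
print(ratios)
```

In this Gaussian toy case the printed ratios should all be close to `rho**2`, so the distance to the fixed point shrinks geometrically with each sweep. The paper's setting is far more general; this sketch only shows what a per-sweep Wasserstein contraction looks like numerically.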

What carries the argument

The transport-information inequality at the fixed points of the coordinate ascent variational inference map, which bounds Wasserstein distance by a multiple of a Fisher-type information functional (rather than the KL divergence, as a transport-entropy inequality would) and thereby implies contraction of the coordinate-wise update operator.
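
For orientation, a transport-information inequality of order 2 for a reference measure $\gamma$ is commonly written in the form below. This is a sketch in a generic convention: the constant $C$, its normalisation, and the choice $p = 2$ are assumptions here and may differ from the statement used in the paper.

```latex
W_2(\nu, \gamma)^2 \;\le\; C \, I(\nu \mid \gamma),
\qquad
I(\nu \mid \gamma) = \int \left| \nabla \sqrt{f} \right|^2 \, d\gamma,
\quad
f = \frac{d\nu}{d\gamma}.
```

By comparison, a transport-entropy inequality has the relative entropy $\mathrm{KL}(\nu \mid \gamma)$ on the right-hand side in place of the information functional $I(\nu \mid \gamma)$.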

Load-bearing premise

A transport-information inequality holds at the fixed points of the coordinate ascent variational inference algorithm.

What would settle it

A concrete variational inference problem in which the transport-information inequality and the smoothness condition both hold at the fixed point, yet successive CAVI iterates fail to contract in Wasserstein distance.

Figures

Figures reproduced from arXiv: 2605.30253 by Adrien Corenflos, Rocco Caprio, Sam Power.

Original abstract

We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results are general and sharp, allow for local convergence guarantees, hold for general smooth manifolds, and also in some non-smooth spaces. We consider applications to Bayesian Gaussian Mixture Models, and high-dimensional Bayesian Probit Regression, and Logistic Regression with Pólya-Gamma random variables (i.e. Jaakkola-Jordan's algorithm).
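
For readers unfamiliar with the last application, the following is a minimal sketch of the Jaakkola-Jordan updates viewed as CAVI with Pólya-Gamma auxiliary variables. It assumes a zero-mean Gaussian prior on the coefficients with precision matrix `prior_precision`; the function name, the synthetic data, and the iteration count are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def jaakkola_jordan_cavi(X, y, prior_precision, n_iters=200):
    """Alternate the Gaussian update for the coefficients with the
    update of the per-observation variational parameters xi.

    Assumptions: labels y are in {0, 1}; the prior is N(0, prior_precision^{-1}).
    """
    n, _ = X.shape
    kappa = y - 0.5
    xi = np.ones(n)  # arbitrary positive initialisation
    for _ in range(n_iters):
        # Mean of a Polya-Gamma PG(1, xi_i) variable: tanh(xi_i / 2) / (2 xi_i).
        omega_mean = np.tanh(xi / 2.0) / (2.0 * xi)
        # Gaussian factor for the coefficients.
        cov = np.linalg.inv(prior_precision + X.T @ (omega_mean[:, None] * X))
        mean = cov @ (X.T @ kappa)
        # xi_i^2 = x_i^T (cov + mean mean^T) x_i.
        second_moment = cov + np.outer(mean, mean)
        xi = np.sqrt(np.einsum("ij,jk,ik->i", X, second_moment, X))
    return mean, cov

# Synthetic data, purely for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

mean, cov = jaakkola_jordan_cavi(X, y, prior_precision=np.eye(3))
print(mean)
```

Each pass of the loop is one coordinate-ascent sweep: the Gaussian factor is refreshed given the current `xi`, then `xi` is refreshed given the new Gaussian factor. Whether and how fast such sweeps contract is the question the paper addresses; this sketch makes no claim about rates.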

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CAVI gets a conditional Wasserstein contraction result, but the transport-information inequality needs separate verification for the claimed applications.

The main result is a theorem showing CAVI contracts in Wasserstein distance under a transport-information inequality at the fixed points plus a functional smoothness condition. This is new and covers general smooth manifolds plus some non-smooth spaces, with local guarantees.

The paper does a clean job stating the assumptions and framing the result as a general template that could apply to several models, including the Gaussian mixture and Pólya-Gamma logistic regression cases.

The soft spot is that the inequality is assumed rather than shown to hold at the fixed points for those applications. The abstract and results treat it as a hypothesis that must be checked separately, so the contraction does not automatically follow for the examples without extra work. This keeps the practical payoff conditional.

The math looks formally grounded with clear assumptions and no obvious circularity. The citation pattern is standard.

This is for researchers focused on convergence analysis of variational methods. Readers interested in general properties of CAVI would get value from the framework, but those wanting unconditional guarantees for the listed models would need follow-up verification.

I would send it to peer review for a closer look at the proofs and the restrictiveness of the conditions.

Referee Report

2 major / 1 minor

Summary. The paper claims that coordinate ascent variational inference (CAVI) contracts in Wasserstein distance under a transport-information inequality at the fixed points together with a functional smoothness condition. The results are presented as general and sharp, yielding local convergence guarantees on smooth manifolds and some non-smooth spaces, with applications to Bayesian Gaussian mixture models, high-dimensional Bayesian probit regression, and logistic regression with Pólya-Gamma augmentation (Jaakkola-Jordan algorithm).

Significance. If the stated conditions hold, the work supplies local Wasserstein contraction guarantees for CAVI that are more general than typical Euclidean analyses and explicitly allow manifold settings. The conditional formulation is a strength when the inequality can be verified, as it separates the algorithmic contraction from model-specific functional analysis.

major comments (2)

[§5 and §6] The central theorem (presumably Theorem 3.2 or equivalent) establishes contraction only when a transport-information inequality holds at the CAVI fixed points. In the applications to GMMs (§5) and Pólya-Gamma logistic regression (§6), this inequality is invoked but not shown to hold at the relevant fixed points of the variational family; without this verification the contraction claim does not activate for those models.
[Abstract and §3] The functional smoothness condition required alongside the transport-information inequality is stated in the general theorem but its verification (or relaxation) for the non-smooth spaces mentioned in the abstract is not detailed, leaving the scope of the non-smooth extension unclear.

minor comments (1)

Notation for the variational family and the Wasserstein metric should be introduced once with consistent symbols across the general theorem and the applications.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments correctly identify that our main theorem is conditional and that the applications require explicit verification of the hypotheses. We address each point below and will incorporate the suggested clarifications.

Referee: [§5 and §6] The central theorem (presumably Theorem 3.2 or equivalent) establishes contraction only when a transport-information inequality holds at the CAVI fixed points. In the applications to GMMs (§5) and Pólya-Gamma logistic regression (§6), this inequality is invoked but not shown to hold at the relevant fixed points of the variational family; without this verification the contraction claim does not activate for those models.

Authors: We agree that the contraction guarantee is conditional on the transport-information inequality holding at the fixed points. The current manuscript invokes the inequality in the applications but does not supply the explicit verification at the relevant fixed points for either the GMM or the Pólya-Gamma logistic regression examples. In the revised version we will add explicit calculations (or references to known results under standard assumptions) confirming that the inequality holds at the CAVI fixed points for both models, thereby making the application of the theorem rigorous. revision: yes
Referee: [Abstract and §3] The functional smoothness condition required alongside the transport-information inequality is stated in the general theorem but its verification (or relaxation) for the non-smooth spaces mentioned in the abstract is not detailed, leaving the scope of the non-smooth extension unclear.

Authors: The abstract states that the results also hold in some non-smooth spaces. The functional smoothness condition is part of the theorem hypothesis, and the non-smooth claim refers to settings where the condition admits a suitable relaxation (e.g., via weak derivatives or discrete metrics). We acknowledge that §3 currently focuses on the smooth case and does not detail the non-smooth extension. In the revision we will expand §3 with a subsection providing concrete examples of non-smooth spaces and the corresponding relaxations of the smoothness assumption. revision: yes

Circularity Check

0 steps flagged

No circularity detected; contraction result is conditional on an external inequality assumption

The paper's central result establishes local Wasserstein contraction of CAVI iterates conditional on a transport-information inequality holding at fixed points plus a smoothness condition. This is an implication, not a self-referential definition or fitted prediction. No load-bearing self-citations, uniqueness theorems from prior author work, or ansatzes smuggled via citation are present in the abstract or described claims. The inequality is treated as an assumption to be verified separately for specific models, not derived within the paper, so the derivation chain does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions stated in the abstract; no free parameters or invented entities are mentioned.

axioms (2)

domain assumption: Transport-information inequality at the fixed points
Explicitly required for the contraction result to hold.
domain assumption: Functional smoothness condition
Stated as necessary for the general and sharp results.

Reference graph

Works this paper leans on

5 extracted references

[1]

Within the Bogachev–Kolesnikov hierarchy of geometric functional inequalities ‘of Gaussian type’ (Section 3.5 of Bogachev and Kolesnikov (2012)) TI inequalities lie in between Sobolev-type energy-entropy inequalities and transport(-entropy) inequalities, being quantitatively weaker than the Logarithmic Sobolev inequality and quantitatively stronger th...

2012
[2]

Analogously to transportation-entropy inequalities, in the case $p = 2$, a TI inequality still implies an energy-entropy inequality ‘of exponential type’, i.e. the Poincaré inequality (Proposition 2.9.b of Guillin et al. (2009b)). Moreover, for log-concave $\gamma$ (or $\gamma$ which deviates from log-concavity sufficiently mildly), the TI inequality even yields a full Logar...
[3]

In terms of verifying TI inequalities ‘from scratch’, a rather crude but general tool is the method of Lyapunov conditions. If $\gamma$ is known a priori to satisfy a Poincaré inequality, then it is sufficient to exhibit a Lyapunov function $W \ge 1$, a reference point $x_0 \in X$, and positive constants $b, c$ for which $LW \le b - c\,d(x, x_0)^2\,W$. Under these conditions, a TI inequality h...

2017
[4]

For quantitative purposes, it often works well to study the metric contraction properties of the Langevin diffusion. In Wu (2009), it is established that granted e.g.¹ hypocontractivity estimates of the form $\forall x, x' \in X,\ W_1(P_t(x, \cdot), P_t(x', \cdot)) \le \ell(t) \cdot d(x, x')$ with $\ell : \mathbb{R}_+ \to \mathbb{R}_+$ integrable, one can deduce a TI inequality for $\gamma$ with $p = 1$ (Corollary 2.2 of Wu (2009))...

2009
[5]

Thus far, the main applications of TI inequalities have apparently been in establishing path-space concentration inequalities for additive functionals along trajectories of the Langevin diffusion, following the developments in, e.g., Theorem 4.1 of Guillin et al. (2009b), and subsequently in Gao et al. (2014). ¹ Actually, Wu (2009) makes the slightly more ...

2014
