Recognition: unknown
Discrete Optimal Transport: Rapid Convergence of Simulated Annealing Algorithms
Pith reviewed 2026-05-08 06:08 UTC · model grok-4.3
The pith
Annealed Glauber dynamics achieves ε KL error in O(n^5 β²/ε) steps for mean-field Ising at any temperature.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop a discrete optimal transport framework for analyzing simulated annealing algorithms on finite state spaces. Building on the discrete Wasserstein metric, we define a generalized discrete Wasserstein-2 distance and the associated notion of discrete action for paths of probability measures on graphs. Using these tools, we establish non-asymptotic convergence guarantees for simulated annealing: the KL divergence between the algorithm's output and the target distribution is controlled by the discrete action of the annealing path. For the mean-field Ising model, annealed single-site Glauber dynamics achieves ε error in KL divergence in O(n^5 β² / ε) steps at any β ≥ 0. For the meanfield
What carries the argument
Generalized discrete Wasserstein-2 distance and discrete action for paths of probability measures on graphs, which controls KL divergence along the annealing path.
If this is right
- KL divergence to the target is bounded above by the discrete action of the chosen annealing path.
- Ising model requires only O(n^5 β² / ε) steps for ε accuracy at every temperature.
- Potts model admits polynomial steps whenever β exceeds the stability threshold q/2.
- Symmetry reduction to a projected chain produces the polynomial action bound in both models.
Where Pith is reading between the lines
- The same symmetry-based projection technique may produce polynomial bounds for other mean-field or highly symmetric discrete models.
- Explicit minimization of the discrete action could be used to design faster annealing schedules than the linear or logarithmic ones analyzed here.
- Numerical evaluation of the action for moderate system sizes would test how sharp the polynomial degree is in practice.
Load-bearing premise
The discrete action of the annealing path admits a polynomial upper bound obtained by exploiting model symmetry to reduce the analysis to a low-dimensional projected chain.
What would settle it
Direct computation of the discrete action for the mean-field Ising model at n=20 showing super-polynomial growth in n for standard schedules would disprove the claimed rates.
Figures
read the original abstract
We develop a discrete optimal transport framework for analyzing simulated annealing algorithms on finite state spaces. Building on the discrete Wasserstein metric introduced by Maas (J. Funct. Anal., 2011), we define a generalized discrete Wasserstein-2 distance and the associated notion of \emph{discrete action} for paths of probability measures on graphs. Using these tools, we establish non-asymptotic convergence guarantees for simulated annealing: the KL divergence between the algorithm's output and the target distribution is controlled by the discrete action of the annealing path. This can be viewed as the discrete counterpart of the action-based analysis of annealed Langevin dynamics in continuous spaces by Guo, Tao, and Chen (ICLR 2025). As applications, we analyze simulated annealing for two fundamental models in statistical physics. For the \emph{mean-field Ising model}, we show that annealed single-site Glauber dynamics achieves $\varepsilon$ error in KL divergence in $O(n^5\beta^2/\varepsilon)$ steps at \emph{any} inverse temperature $\beta \ge 0$. For the \emph{mean-field $q$-state Potts model}, we show that annealed $(q-1)$-block Glauber dynamics achieves $\varepsilon$ error in $\mathrm{poly}(n, \beta, 1/\varepsilon)$ steps for all $\beta \ge \beta_{\mathsf{s}}=q/2$, the regime where the disordered phase has completely lost stability. In both cases, the key technical contribution is a polynomial upper bound on the discrete action, obtained by exploiting the symmetry of the model to reduce the analysis to a low-dimensional projected chain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a discrete optimal transport framework based on a generalized discrete Wasserstein-2 distance and the associated discrete action for paths of probability measures on graphs. It applies this to establish non-asymptotic convergence rates for simulated annealing algorithms using Glauber dynamics on the mean-field Ising and Potts models, claiming polynomial-time bounds in n, β, and 1/ε to achieve ε KL divergence error, with the bounds derived from polynomial upper bounds on the discrete action via symmetry reductions to low-dimensional chains.
Significance. Should the results hold, this provides a novel discrete analogue to continuous optimal transport analyses of annealing, offering explicit polynomial convergence guarantees for important statistical mechanics models at all temperatures (Ising) and above the stability threshold (Potts). This could advance the theoretical understanding of mixing times for discrete MCMC methods and has potential for broader applications in sampling problems. The framework's strength lies in its non-asymptotic nature and use of model symmetry for bounds.
major comments (3)
- [Section deriving the discrete action bound for the Ising model] The O(n^5 β² / ε) bound for annealed single-site Glauber dynamics relies on a polynomial upper bound for the discrete action obtained by reducing the analysis to a low-dimensional projected chain using model symmetry. The manuscript lacks an explicit construction of this projection map and a rigorous proof that the projected action controls the full discrete Wasserstein-2 action up to polynomial factors only. This is particularly concerning for large β, where the system is far from equilibrium and correlations are strong; uncontrolled errors in the projection could lead to exponential dependencies, undermining the claimed bound.
- [General framework section on KL control by discrete action] It is stated that the KL divergence between the output and target is controlled by the discrete action of the annealing path. However, the specific inequality or theorem establishing this control (including dependence on β and integration over the path) is not detailed sufficiently to verify how it leads to the ε error in the stated number of steps. Clarification on the constants and any hidden factors is needed.
- [Potts model analysis] The poly(n, β, 1/ε) bound for the (q-1)-block Glauber dynamics above β_s = q/2 similarly depends on the projection technique. The analysis should include verification that the reduction holds in the regime where the disordered phase loses stability, ensuring no additional factors arise near the phase transition.
minor comments (1)
- [Abstract] The abstract could more explicitly mention the models and the specific dynamics used to make the claims clearer at first reading.
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable comments on our manuscript. We agree that additional details on the projection technique and the KL control inequality are needed to strengthen the presentation and rigor. We will revise the manuscript accordingly to address all major points.
read point-by-point responses
-
Referee: [Section deriving the discrete action bound for the Ising model] The O(n^5 β² / ε) bound for annealed single-site Glauber dynamics relies on a polynomial upper bound for the discrete action obtained by reducing the analysis to a low-dimensional projected chain using model symmetry. The manuscript lacks an explicit construction of this projection map and a rigorous proof that the projected action controls the full discrete Wasserstein-2 action up to polynomial factors only. This is particularly concerning for large β, where the system is far from equilibrium and correlations are strong; uncontrolled errors in the projection could lead to exponential dependencies, undermining the claimed bound.
Authors: We thank the referee for highlighting this point. The projection exploits the exchangeability of the mean-field Ising model to map the empirical magnetization to a one-dimensional chain, and the discrete action bound follows from a direct comparison of the Dirichlet forms. In the revision, we will add an explicit definition of the projection map π: Δ([n]) → [0,1] together with a complete proof that the action of the projected path upper-bounds the original discrete Wasserstein-2 action up to a factor polynomial in n and β. The error control uses the mean-field interaction structure and holds uniformly for all β ≥ 0, preventing exponential blow-up at low temperature. revision: yes
-
Referee: [General framework section on KL control by discrete action] It is stated that the KL divergence between the output and target is controlled by the discrete action of the annealing path. However, the specific inequality or theorem establishing this control (including dependence on β and integration over the path) is not detailed sufficiently to verify how it leads to the ε error in the stated number of steps. Clarification on the constants and any hidden factors is needed.
Authors: We agree that the link between discrete action and KL divergence requires more explicit treatment. The control follows from a discrete analogue of the Benamou–Brenier formula combined with a Gronwall-type estimate along the annealing path; the resulting bound is KL(μ_T || π_β) ≤ C(β) ∫_0^T A(μ_t) dt where C(β) is at most linear in β. In the revision we will state this inequality as a numbered theorem in the general framework section, display the precise dependence on β, and show how the polynomial upper bound on the action directly yields the claimed O(n^5 β² / ε) step count. revision: yes
-
Referee: [Potts model analysis] The poly(n, β, 1/ε) bound for the (q-1)-block Glauber dynamics above β_s = q/2 similarly depends on the projection technique. The analysis should include verification that the reduction holds in the regime where the disordered phase loses stability, ensuring no additional factors arise near the phase transition.
Authors: We will strengthen the Potts-model section by adding a uniform-in-β verification of the projection reduction for all β ≥ β_s. The argument relies on the fact that the (q-1)-block dynamics preserves the symmetry of the disordered phase and that the discrete action remains controlled by a low-dimensional ODE whose coefficients stay polynomial even at the stability threshold β_s. No exponential factors appear because the mean-field interaction remains bounded and the spectral gap of the projected chain is bounded away from zero uniformly above β_s. These details will be inserted as a dedicated lemma. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper defines a generalized discrete Wasserstein-2 distance and discrete action building directly on the external reference Maas (J. Funct. Anal. 2011). It then relates KL error to the discrete action of the annealing path, presented as the discrete analog of the external Guo-Tao-Chen (ICLR 2025) continuous analysis. For the Ising and Potts models the central step is deriving a polynomial upper bound on that action by symmetry reduction to a low-dimensional projected chain. No equation in the supplied text equates a claimed prediction or bound to a fitted parameter or prior self-result by construction; the reduction is offered as an independent derivation rather than a renaming or self-citation that carries the load. External citations are to non-overlapping authors and are not invoked as uniqueness theorems that forbid alternatives. The overall guarantees therefore do not collapse to tautological inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The discrete Wasserstein metric introduced by Maas (2011) can be generalized to a Wasserstein-2 distance on graphs.
- domain assumption The KL divergence is controlled by the discrete action of the annealing path.
invented entities (1)
-
discrete action
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Provable Benefit of Annealed
Wei Guo and Molei Tao and Yongxin Chen , booktitle=. Provable Benefit of Annealed
-
[2]
A computational fluid mechanics solution to the
Benamou, Jean-David and Brenier, Yann , journal=. A computational fluid mechanics solution to the. 2000 , publisher=
2000
-
[3]
2005 , publisher=
Gradient flows: in metric spaces and in the space of probability measures , author=. 2005 , publisher=
2005
-
[4]
Optimal transport for applied mathematicians: Calculus of variations,
Filippo, Santambrogio , year=. Optimal transport for applied mathematicians: Calculus of variations,
-
[5]
Gradient flows of the entropy for finite
Maas, Jan , journal=. Gradient flows of the entropy for finite. 2011 , publisher=
2011
-
[6]
2013 , publisher=
Scaling limits of interacting particle systems , author=. 2013 , publisher=
2013
-
[7]
2013 , publisher=
Limit theorems for stochastic processes , author=. 2013 , publisher=
2013
-
[8]
Multivariate point processes: predictable projection,
Jacod, Jean , journal=. Multivariate point processes: predictable projection,. 1975 , publisher=
1975
-
[9]
Girsanov theory under a finite entropy condition , author=. S. 2012 , publisher=
2012
-
[10]
Optimal control formulation of transition path problems for
Gao, Yuan and Liu, Jian-Guo and Tse, Oliver , journal=. Optimal control formulation of transition path problems for
-
[11]
Point processes and queues, martingale dynamics , year =
Br. Point processes and queues, martingale dynamics , year =. Point processes and queues, martingale dynamics , isbn =
-
[12]
2017 , publisher=
Markov chains and mixing times , author=. 2017 , publisher=
2017
-
[13]
On approximating the
He, Roxanne and Lok, Jackie , journal=. On approximating the. 2025 , publisher=
2025
-
[14]
Mean-field
Blanca, Antonio and Gheissari, Reza and Zhang, Xusheng , booktitle=. Mean-field. 2025 , organization=
2025
-
[15]
Glauber dynamics for the mean-field
Levin, David A and Luczak, Malwina J and Peres, Yuval , journal=. Glauber dynamics for the mean-field. 2010 , publisher=
2010
-
[16]
A power law of order 1/4 for critical mean field
Long, Yun and Nachmias, Asaf and Ning, Weiyang and Peres, Yuval , volume=. A power law of order 1/4 for critical mean field. 2014 , publisher=
2014
-
[17]
Random Structures & Algorithms , volume=
On the swapping algorithm , author=. Random Structures & Algorithms , volume=. 2003 , publisher=
2003
-
[18]
The Annals of Applied Probability , volume=
Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions , author=. The Annals of Applied Probability , volume=
-
[19]
Glauber dynamics for the mean-field
Cuff, Paul and Ding, Jian and Louidor, Oren and Lubetzky, Eyal and Peres, Yuval and Sly, Allan , journal=. Glauber dynamics for the mean-field
-
[20]
Nonlinearity , volume=
A gradient structure for reaction--diffusion systems and for energy-drift-diffusion systems , author=. Nonlinearity , volume=. 2011 , publisher=
2011
-
[21]
Ricci curvature of finite
Erbar, Matthias and Maas, Jan , journal=. Ricci curvature of finite. 2012 , publisher=
2012
-
[22]
Swendsen-
Galanis, Andreas and. Swendsen-. Random Structures & Algorithms , volume =
-
[23]
Journal of Statistical Physics , volume=
Simulated tempering and swapping on mean-field models , author=. Journal of Statistical Physics , volume=. 2016 , publisher=
2016
-
[24]
Fokker--
Chow, Shui-Nee and Huang, Wuchen and Li, Yao and Zhou, Haomin , journal=. Fokker--. 2012 , publisher=
2012
-
[25]
2009 , publisher=
Optimal transport: old and new , author=. 2009 , publisher=
2009
-
[26]
The variational formulation of the
Jordan, Richard and Kinderlehrer, David and Otto, Felix , journal=. The variational formulation of the. 1998 , publisher=
1998
-
[27]
SIAM Journal on Computing , volume =
Jerrum, Mark and Sinclair, Alistair , title =. SIAM Journal on Computing , volume =. 1993 , publisher =
1993
-
[28]
Journal of the ACM , volume =
Goldberg, Leslie Ann and Jerrum, Mark , title =. Journal of the ACM , volume =. 2012 , publisher =
2012
-
[29]
2026 , note =
Sinho Chewi , title =. 2026 , note =
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.