pith. machine review for the scientific record. sign in

arxiv: 2605.05877 · v1 · submitted 2026-05-07 · 💻 cs.DS · math.PR

Recognition: unknown

Discrete Optimal Transport: Rapid Convergence of Simulated Annealing Algorithms

Chihao Zhang, Sihan Wang, Tianhui Jiang, Yuchen He

Pith reviewed 2026-05-08 06:08 UTC · model grok-4.3

classification 💻 cs.DS math.PR
keywords discrete optimal transportsimulated annealingGlauber dynamicsmean-field Ising modelmean-field Potts modeldiscrete Wasserstein distanceKL divergenceconvergence rates
0
0 comments X

The pith

Annealed Glauber dynamics achieves ε KL error in O(n^5 β²/ε) steps for mean-field Ising at any temperature.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a discrete optimal transport framework to analyze simulated annealing algorithms on finite state spaces. It defines a generalized discrete Wasserstein-2 distance and the associated discrete action for paths of probability measures on graphs. This framework bounds the KL divergence to the target distribution by the discrete action along the annealing path. For the mean-field Ising model, annealed single-site Glauber dynamics reaches ε error in O(n^5 β² / ε) steps at any inverse temperature β. For the mean-field q-state Potts model above β = q/2, annealed block Glauber dynamics yields polynomial bounds in n, β, and 1/ε.

Core claim

We develop a discrete optimal transport framework for analyzing simulated annealing algorithms on finite state spaces. Building on the discrete Wasserstein metric, we define a generalized discrete Wasserstein-2 distance and the associated notion of discrete action for paths of probability measures on graphs. Using these tools, we establish non-asymptotic convergence guarantees for simulated annealing: the KL divergence between the algorithm's output and the target distribution is controlled by the discrete action of the annealing path. For the mean-field Ising model, annealed single-site Glauber dynamics achieves ε error in KL divergence in O(n^5 β² / ε) steps at any β ≥ 0. For the meanfield

What carries the argument

Generalized discrete Wasserstein-2 distance and discrete action for paths of probability measures on graphs, which controls KL divergence along the annealing path.

If this is right

  • KL divergence to the target is bounded above by the discrete action of the chosen annealing path.
  • Ising model requires only O(n^5 β² / ε) steps for ε accuracy at every temperature.
  • Potts model admits polynomial steps whenever β exceeds the stability threshold q/2.
  • Symmetry reduction to a projected chain produces the polynomial action bound in both models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same symmetry-based projection technique may produce polynomial bounds for other mean-field or highly symmetric discrete models.
  • Explicit minimization of the discrete action could be used to design faster annealing schedules than the linear or logarithmic ones analyzed here.
  • Numerical evaluation of the action for moderate system sizes would test how sharp the polynomial degree is in practice.

Load-bearing premise

The discrete action of the annealing path admits a polynomial upper bound obtained by exploiting model symmetry to reduce the analysis to a low-dimensional projected chain.

What would settle it

Direct computation of the discrete action for the mean-field Ising model at n=20 showing super-polynomial growth in n for standard schedules would disprove the claimed rates.

Figures

Figures reproduced from arXiv: 2605.05877 by Chihao Zhang, Sihan Wang, Tianhui Jiang, Yuchen He.

Figure 1
Figure 1. Figure 1: Phase diagram of the mean-field Potts model ( view at source ↗
read the original abstract

We develop a discrete optimal transport framework for analyzing simulated annealing algorithms on finite state spaces. Building on the discrete Wasserstein metric introduced by Maas (J. Funct. Anal., 2011), we define a generalized discrete Wasserstein-2 distance and the associated notion of \emph{discrete action} for paths of probability measures on graphs. Using these tools, we establish non-asymptotic convergence guarantees for simulated annealing: the KL divergence between the algorithm's output and the target distribution is controlled by the discrete action of the annealing path. This can be viewed as the discrete counterpart of the action-based analysis of annealed Langevin dynamics in continuous spaces by Guo, Tao, and Chen (ICLR 2025). As applications, we analyze simulated annealing for two fundamental models in statistical physics. For the \emph{mean-field Ising model}, we show that annealed single-site Glauber dynamics achieves $\varepsilon$ error in KL divergence in $O(n^5\beta^2/\varepsilon)$ steps at \emph{any} inverse temperature $\beta \ge 0$. For the \emph{mean-field $q$-state Potts model}, we show that annealed $(q-1)$-block Glauber dynamics achieves $\varepsilon$ error in $\mathrm{poly}(n, \beta, 1/\varepsilon)$ steps for all $\beta \ge \beta_{\mathsf{s}}=q/2$, the regime where the disordered phase has completely lost stability. In both cases, the key technical contribution is a polynomial upper bound on the discrete action, obtained by exploiting the symmetry of the model to reduce the analysis to a low-dimensional projected chain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces a discrete optimal transport framework based on a generalized discrete Wasserstein-2 distance and the associated discrete action for paths of probability measures on graphs. It applies this to establish non-asymptotic convergence rates for simulated annealing algorithms using Glauber dynamics on the mean-field Ising and Potts models, claiming polynomial-time bounds in n, β, and 1/ε to achieve ε KL divergence error, with the bounds derived from polynomial upper bounds on the discrete action via symmetry reductions to low-dimensional chains.

Significance. Should the results hold, this provides a novel discrete analogue to continuous optimal transport analyses of annealing, offering explicit polynomial convergence guarantees for important statistical mechanics models at all temperatures (Ising) and above the stability threshold (Potts). This could advance the theoretical understanding of mixing times for discrete MCMC methods and has potential for broader applications in sampling problems. The framework's strength lies in its non-asymptotic nature and use of model symmetry for bounds.

major comments (3)
  1. [Section deriving the discrete action bound for the Ising model] The O(n^5 β² / ε) bound for annealed single-site Glauber dynamics relies on a polynomial upper bound for the discrete action obtained by reducing the analysis to a low-dimensional projected chain using model symmetry. The manuscript lacks an explicit construction of this projection map and a rigorous proof that the projected action controls the full discrete Wasserstein-2 action up to polynomial factors only. This is particularly concerning for large β, where the system is far from equilibrium and correlations are strong; uncontrolled errors in the projection could lead to exponential dependencies, undermining the claimed bound.
  2. [General framework section on KL control by discrete action] It is stated that the KL divergence between the output and target is controlled by the discrete action of the annealing path. However, the specific inequality or theorem establishing this control (including dependence on β and integration over the path) is not detailed sufficiently to verify how it leads to the ε error in the stated number of steps. Clarification on the constants and any hidden factors is needed.
  3. [Potts model analysis] The poly(n, β, 1/ε) bound for the (q-1)-block Glauber dynamics above β_s = q/2 similarly depends on the projection technique. The analysis should include verification that the reduction holds in the regime where the disordered phase loses stability, ensuring no additional factors arise near the phase transition.
minor comments (1)
  1. [Abstract] The abstract could more explicitly mention the models and the specific dynamics used to make the claims clearer at first reading.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and valuable comments on our manuscript. We agree that additional details on the projection technique and the KL control inequality are needed to strengthen the presentation and rigor. We will revise the manuscript accordingly to address all major points.

read point-by-point responses
  1. Referee: [Section deriving the discrete action bound for the Ising model] The O(n^5 β² / ε) bound for annealed single-site Glauber dynamics relies on a polynomial upper bound for the discrete action obtained by reducing the analysis to a low-dimensional projected chain using model symmetry. The manuscript lacks an explicit construction of this projection map and a rigorous proof that the projected action controls the full discrete Wasserstein-2 action up to polynomial factors only. This is particularly concerning for large β, where the system is far from equilibrium and correlations are strong; uncontrolled errors in the projection could lead to exponential dependencies, undermining the claimed bound.

    Authors: We thank the referee for highlighting this point. The projection exploits the exchangeability of the mean-field Ising model to map the empirical magnetization to a one-dimensional chain, and the discrete action bound follows from a direct comparison of the Dirichlet forms. In the revision, we will add an explicit definition of the projection map π: Δ([n]) → [0,1] together with a complete proof that the action of the projected path upper-bounds the original discrete Wasserstein-2 action up to a factor polynomial in n and β. The error control uses the mean-field interaction structure and holds uniformly for all β ≥ 0, preventing exponential blow-up at low temperature. revision: yes

  2. Referee: [General framework section on KL control by discrete action] It is stated that the KL divergence between the output and target is controlled by the discrete action of the annealing path. However, the specific inequality or theorem establishing this control (including dependence on β and integration over the path) is not detailed sufficiently to verify how it leads to the ε error in the stated number of steps. Clarification on the constants and any hidden factors is needed.

    Authors: We agree that the link between discrete action and KL divergence requires more explicit treatment. The control follows from a discrete analogue of the Benamou–Brenier formula combined with a Gronwall-type estimate along the annealing path; the resulting bound is KL(μ_T || π_β) ≤ C(β) ∫_0^T A(μ_t) dt where C(β) is at most linear in β. In the revision we will state this inequality as a numbered theorem in the general framework section, display the precise dependence on β, and show how the polynomial upper bound on the action directly yields the claimed O(n^5 β² / ε) step count. revision: yes

  3. Referee: [Potts model analysis] The poly(n, β, 1/ε) bound for the (q-1)-block Glauber dynamics above β_s = q/2 similarly depends on the projection technique. The analysis should include verification that the reduction holds in the regime where the disordered phase loses stability, ensuring no additional factors arise near the phase transition.

    Authors: We will strengthen the Potts-model section by adding a uniform-in-β verification of the projection reduction for all β ≥ β_s. The argument relies on the fact that the (q-1)-block dynamics preserves the symmetry of the disordered phase and that the discrete action remains controlled by a low-dimensional ODE whose coefficients stay polynomial even at the stability threshold β_s. No exponential factors appear because the mean-field interaction remains bounded and the spectral gap of the projected chain is bounded away from zero uniformly above β_s. These details will be inserted as a dedicated lemma. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines a generalized discrete Wasserstein-2 distance and discrete action building directly on the external reference Maas (J. Funct. Anal. 2011). It then relates KL error to the discrete action of the annealing path, presented as the discrete analog of the external Guo-Tao-Chen (ICLR 2025) continuous analysis. For the Ising and Potts models the central step is deriving a polynomial upper bound on that action by symmetry reduction to a low-dimensional projected chain. No equation in the supplied text equates a claimed prediction or bound to a fitted parameter or prior self-result by construction; the reduction is offered as an independent derivation rather than a renaming or self-citation that carries the load. External citations are to non-overlapping authors and are not invoked as uniqueness theorems that forbid alternatives. The overall guarantees therefore do not collapse to tautological inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The paper relies on the prior discrete Wasserstein metric from Maas 2011 as a foundation and introduces the discrete action as a new tool for bounding paths. No free parameters are mentioned. The symmetry reduction for mean-field models is a domain assumption. The central claim rests on the control of KL by discrete action.

axioms (2)
  • domain assumption The discrete Wasserstein metric introduced by Maas (2011) can be generalized to a Wasserstein-2 distance on graphs.
    Used as foundation for the framework.
  • domain assumption The KL divergence is controlled by the discrete action of the annealing path.
    Central to the convergence guarantee.
invented entities (1)
  • discrete action no independent evidence
    purpose: To quantify the cost of paths in the space of probability measures for bounding convergence.
    New notion defined in the paper to enable the analysis.

pith-pipeline@v0.9.0 · 5600 in / 1596 out tokens · 79424 ms · 2026-05-08T06:08:39.108298+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references

  1. [1]

    Provable Benefit of Annealed

    Wei Guo and Molei Tao and Yongxin Chen , booktitle=. Provable Benefit of Annealed

  2. [2]

    A computational fluid mechanics solution to the

    Benamou, Jean-David and Brenier, Yann , journal=. A computational fluid mechanics solution to the. 2000 , publisher=

  3. [3]

    2005 , publisher=

    Gradient flows: in metric spaces and in the space of probability measures , author=. 2005 , publisher=

  4. [4]

    Optimal transport for applied mathematicians: Calculus of variations,

    Filippo, Santambrogio , year=. Optimal transport for applied mathematicians: Calculus of variations,

  5. [5]

    Gradient flows of the entropy for finite

    Maas, Jan , journal=. Gradient flows of the entropy for finite. 2011 , publisher=

  6. [6]

    2013 , publisher=

    Scaling limits of interacting particle systems , author=. 2013 , publisher=

  7. [7]

    2013 , publisher=

    Limit theorems for stochastic processes , author=. 2013 , publisher=

  8. [8]

    Multivariate point processes: predictable projection,

    Jacod, Jean , journal=. Multivariate point processes: predictable projection,. 1975 , publisher=

  9. [9]

    Girsanov theory under a finite entropy condition , author=. S. 2012 , publisher=

  10. [10]

    Optimal control formulation of transition path problems for

    Gao, Yuan and Liu, Jian-Guo and Tse, Oliver , journal=. Optimal control formulation of transition path problems for

  11. [11]

    Point processes and queues, martingale dynamics , year =

    Br. Point processes and queues, martingale dynamics , year =. Point processes and queues, martingale dynamics , isbn =

  12. [12]

    2017 , publisher=

    Markov chains and mixing times , author=. 2017 , publisher=

  13. [13]

    On approximating the

    He, Roxanne and Lok, Jackie , journal=. On approximating the. 2025 , publisher=

  14. [14]

    Mean-field

    Blanca, Antonio and Gheissari, Reza and Zhang, Xusheng , booktitle=. Mean-field. 2025 , organization=

  15. [15]

    Glauber dynamics for the mean-field

    Levin, David A and Luczak, Malwina J and Peres, Yuval , journal=. Glauber dynamics for the mean-field. 2010 , publisher=

  16. [16]

    A power law of order 1/4 for critical mean field

    Long, Yun and Nachmias, Asaf and Ning, Weiyang and Peres, Yuval , volume=. A power law of order 1/4 for critical mean field. 2014 , publisher=

  17. [17]

    Random Structures & Algorithms , volume=

    On the swapping algorithm , author=. Random Structures & Algorithms , volume=. 2003 , publisher=

  18. [18]

    The Annals of Applied Probability , volume=

    Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions , author=. The Annals of Applied Probability , volume=

  19. [19]

    Glauber dynamics for the mean-field

    Cuff, Paul and Ding, Jian and Louidor, Oren and Lubetzky, Eyal and Peres, Yuval and Sly, Allan , journal=. Glauber dynamics for the mean-field

  20. [20]

    Nonlinearity , volume=

    A gradient structure for reaction--diffusion systems and for energy-drift-diffusion systems , author=. Nonlinearity , volume=. 2011 , publisher=

  21. [21]

    Ricci curvature of finite

    Erbar, Matthias and Maas, Jan , journal=. Ricci curvature of finite. 2012 , publisher=

  22. [22]

    Swendsen-

    Galanis, Andreas and. Swendsen-. Random Structures & Algorithms , volume =

  23. [23]

    Journal of Statistical Physics , volume=

    Simulated tempering and swapping on mean-field models , author=. Journal of Statistical Physics , volume=. 2016 , publisher=

  24. [24]

    Fokker--

    Chow, Shui-Nee and Huang, Wuchen and Li, Yao and Zhou, Haomin , journal=. Fokker--. 2012 , publisher=

  25. [25]

    2009 , publisher=

    Optimal transport: old and new , author=. 2009 , publisher=

  26. [26]

    The variational formulation of the

    Jordan, Richard and Kinderlehrer, David and Otto, Felix , journal=. The variational formulation of the. 1998 , publisher=

  27. [27]

    SIAM Journal on Computing , volume =

    Jerrum, Mark and Sinclair, Alistair , title =. SIAM Journal on Computing , volume =. 1993 , publisher =

  28. [28]

    Journal of the ACM , volume =

    Goldberg, Leslie Ann and Jerrum, Mark , title =. Journal of the ACM , volume =. 2012 , publisher =

  29. [29]

    2026 , note =

    Sinho Chewi , title =. 2026 , note =