Sharp O(1/k) convergence rate for the Sinkhorn algorithm via a local analysis
Pith reviewed 2026-06-30 08:35 UTC · model grok-4.3
The pith
The Sinkhorn algorithm converges at the sharp rate of O(1/k) in ℓ1 marginal error and joint relative entropy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that the Sinkhorn algorithm converges at the rate of O(1/k) in ℓ1-norm marginal error and in joint relative entropy, which is known to be sharp in the asymptotically scalable case. The proof is based on examining the bipartite graph associated to the entropy-regularized optimal transport problem, and treating differently the edges that are assigned a positive mass in the optimal transport plan vs. those that are not. This yields a local convergence bound with the sharp rate, which is bootstrapped into a global bound using the author's previous result where we showed an almost-sharp rate up to a logarithmic factor.
What carries the argument
The bipartite graph of the entropy-regularized optimal transport problem, with separate contraction analysis on edges carrying positive mass in the optimal plan versus the remaining edges.
Load-bearing premise
The edges of the bipartite graph can be partitioned according to whether they carry positive mass in the optimal transport plan, allowing a separate local contraction that lifts to a global bound.
What would settle it
Run Sinkhorn iterations on an instance whose optimal plan support is known exactly and measure whether the ℓ1 marginal error decreases linearly in 1/k after the initial phase or stalls at a slower rate.
Figures
read the original abstract
We prove that the Sinkhorn algorithm converges at the rate of $O(1/k)$ in $\ell_1$-norm marginal error and in joint relative entropy, which is known to be sharp in the asymptotically scalable case. The proof is based on examining the bipartite graph associated to the entropy-regularized optimal transport problem, and treating differently the edges that are assigned a positive mass in the optimal transport plan vs. those that are not. This yields a local convergence bound with the sharp rate, which is bootstrapped into a global bound using the author's previous result in arXiv:2604.26265 where we showed an almost-sharp rate up to a logarithmic factor.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to prove a sharp O(1/k) convergence rate for the Sinkhorn algorithm in both ℓ1-norm marginal error and joint relative entropy for entropy-regularized optimal transport. The argument proceeds by partitioning the edges of the associated bipartite graph according to whether they receive positive mass in the optimal transport plan, establishing a local contraction at the sharp rate on this partition, and then bootstrapping the local bound into a global O(1/k) guarantee by invoking the author's earlier result (arXiv:2604.26265), which had obtained an almost-sharp rate up to a logarithmic factor.
Significance. If the local-to-global bootstrapping argument can be made rigorous without reintroducing a logarithmic factor, the result would establish the first sharp global rate for Sinkhorn iterations, matching the known lower bound in the asymptotically scalable regime. This would constitute a meaningful technical advance in the quantitative analysis of entropic optimal transport algorithms.
major comments (2)
- [Abstract] Abstract (final sentence): the claim that the local O(1/k) bound is bootstrapped into a global sharp O(1/k) bound via arXiv:2604.26265 is stated at a high level only. It is not shown whether the combination of the new local contraction with the prior almost-sharp result avoids reintroducing a log(k) factor during the transient phase before the iterates enter the local regime; this step is load-bearing for the headline global rate.
- The manuscript supplies only the high-level proof strategy; no explicit contraction mapping, error-term estimates, or analysis of the edge-partition under degeneracy (zero-mass edges or non-unique optimal plans) is visible. Consequently the local sharp-rate claim cannot be verified from the provided text.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the constructive feedback on our manuscript. Below we respond point-by-point to the major comments.
read point-by-point responses
-
Referee: [Abstract] Abstract (final sentence): the claim that the local O(1/k) bound is bootstrapped into a global sharp O(1/k) bound via arXiv:2604.26265 is stated at a high level only. It is not shown whether the combination of the new local contraction with the prior almost-sharp result avoids reintroducing a log(k) factor during the transient phase before the iterates enter the local regime; this step is load-bearing for the headline global rate.
Authors: The abstract is intentionally concise. In the body of the manuscript (Section 3), we show that the local contraction on the positive-mass subgraph is strict and applies once the marginal error falls below a fixed threshold δ > 0 that is independent of k (chosen from the asymptotic scalability assumption). The earlier global result (arXiv:2604.26265) is invoked only to guarantee that this threshold is reached after finitely many iterations whose number does not grow with k; the O(log k) factor therefore remains confined to a k-independent transient and does not pollute the subsequent 1/k tail. We will add a short clarifying sentence to the abstract and a remark after Theorem 3.1 to make this separation explicit. revision: yes
-
Referee: [—] The manuscript supplies only the high-level proof strategy; no explicit contraction mapping, error-term estimates, or analysis of the edge-partition under degeneracy (zero-mass edges or non-unique optimal plans) is visible. Consequently the local sharp-rate claim cannot be verified from the provided text.
Authors: The referee is correct that the excerpt supplied for review is high-level. The complete manuscript contains the explicit local contraction (Theorem 3.2), the error-term estimates obtained by restricting the KL divergence to the positive-mass edges (Lemmas 4.1–4.3), and the handling of degeneracy: zero-mass edges are shown to remain exactly zero after one iteration once the support is identified, while non-uniqueness is treated by a standard perturbation that preserves the O(1/k) rate. We will expand the introduction to signpost these sections more clearly so that the local analysis can be verified without reading the entire paper. revision: partial
Circularity Check
Global sharp O(1/k) rate obtained by bootstrapping local analysis onto self-authored prior result that only reached almost-sharp rate
specific steps
-
self citation load bearing
[Abstract]
"This yields a local convergence bound with the sharp rate, which is bootstrapped into a global bound using the author's previous result in arXiv:2604.26265 where we showed an almost-sharp rate up to a logarithmic factor."
The global sharp O(1/k) claim is justified solely by combining the new local analysis with the author's prior self-authored result (which stopped at almost-sharp); the load-bearing global step therefore reduces to that self-citation rather than being derived independently here.
full rationale
The paper's central claim of a sharp global O(1/k) convergence rate is achieved by deriving a local contraction on the edge partition and then explicitly bootstrapping it into a global bound via citation to the author's own earlier arXiv:2604.26265, which only established an almost-sharp rate (up to a logarithmic factor). This matches the self_citation_load_bearing pattern because the headline global sharpness depends on the prior self-work without an independent derivation of the global step in the present manuscript. No other patterns (self-definitional, fitted-input, etc.) are exhibited by the quoted text.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The bipartite graph associated with the support of the entropy-regularized OT plan admits a well-defined partition into positive-mass and zero-mass edges.
Reference graph
Works this paper leans on
-
[1]
Annals of Operations Research , volume=
Limit points of the iterative scaling procedure , author=. Annals of Operations Research , volume=. 2014 , publisher=
2014
-
[2]
International conference on artificial intelligence and statistics , pages=
Stochastic algorithms for entropy-regularized optimal transport problems , author=. International conference on artificial intelligence and statistics , pages=. 2018 , organization=
2018
-
[3]
Advances in neural information processing systems , volume=
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration , author=. Advances in neural information processing systems , volume=
-
[4]
Massively scalable Sinkhorn distances via the Nystr
Altschuler, Jason and Bach, Francis and Rudi, Alessandro and Niles-Weed, Jonathan , journal=. Massively scalable Sinkhorn distances via the Nystr
-
[5]
SIAM Journal on Mathematical Analysis , volume=
Asymptotics for semidiscrete entropic optimal transport , author=. SIAM Journal on Mathematical Analysis , volume=. 2022 , publisher=
2022
-
[6]
Advances in Neural Information Processing Systems , volume=
Mirror descent with relative smoothness in measure spaces, with application to Sinkhorn and EM , author=. Advances in Neural Information Processing Systems , volume=
-
[7]
International Conference on Machine Learning , pages=
Mirror Sinkhorn: fast online optimization on transport polytopes , author=. International Conference on Machine Learning , pages=. 2023 , organization=
2023
-
[8]
arXiv preprint arXiv:2111.01666 , year=
Regularized unbalanced optimal transport as entropy minimization with respect to branching brownian motion , author=. arXiv preprint arXiv:2111.01666 , year=
-
[9]
Annales de la Facult\'e des sciences de Toulouse : Math\'ematiques , pages =
Aymeric Baradat and Elias Ventre , title =. Annales de la Facult\'e des sciences de Toulouse : Math\'ematiques , pages =. 2024 , publisher =. doi:10.5802/afst.1800 , language =
-
[10]
Bernoulli , volume=
Stability and sample complexity of divergence regularized optimal transport , author=. Bernoulli , volume=. 2025 , publisher=
2025
-
[11]
SIAM Journal on Scientific Computing , volume=
Iterative Bregman projections for regularized transportation problems , author=. SIAM Journal on Scientific Computing , volume=. 2015 , publisher=
2015
-
[12]
The Sinkhorn algorithm, parabolic optimal transport and geometric Monge--Amp
Berman, Robert J , journal=. The Sinkhorn algorithm, parabolic optimal transport and geometric Monge--Amp. 2020 , publisher=
2020
-
[13]
doi:10.1016/j.rico.2024.100383 , url =
New Auction Algorithms for the Assignment Problem and Extensions , author =. doi:10.1016/j.rico.2024.100383 , url =
-
[14]
Journal of the ACM , volume=
Near-Linear Runtime for a Classical Matrix Preconditioning Algorithm , author=. Journal of the ACM , volume=. 2026 , publisher=
2026
-
[15]
arXiv preprint arXiv:2505.18005 , year=
Distances for Markov chains from sample streams , author=. arXiv preprint arXiv:2505.18005 , year=
-
[16]
SIAM Journal on Optimization , volume=
On the linear convergence of the multimarginal Sinkhorn algorithm , author=. SIAM Journal on Optimization , volume=. 2022 , publisher=
2022
-
[17]
SIAM Journal on Mathematics of Data Science , volume=
Accelerated Bregman primal-dual methods applied to optimal transport and Wasserstein barycenter problems , author=. SIAM Journal on Mathematics of Data Science , volume=. 2022 , publisher=
2022
-
[18]
SIAM Journal on Applied Mathematics , volume=
Entropic and displacement interpolation: a computational approach using the Hilbert metric , author=. SIAM Journal on Applied Mathematics , volume=. 2016 , publisher=
2016
-
[19]
Provably convergent schr
Chen, Yu and Deng, Wei and Fang, Shikai and Li, Fengpei and Yang, Nicole Tianjiao and Zhang, Yikai and Rasul, Kashif and Zhe, Shandian and Schneider, Anderson and Nevmyvaka, Yuriy , booktitle=. Provably convergent schr. 2023 , organization=
2023
-
[20]
arXiv preprint arXiv:2412.09235 , year=
A semiconcavity approach to stability of entropic plans and exponential convergence of Sinkhorn's algorithm , author=. arXiv preprint arXiv:2412.09235 , year=
-
[21]
Advances in Neural Information Processing Systems , volume=
Computational guarantees for doubly entropic Wasserstein barycenters , author=. Advances in Neural Information Processing Systems , volume=
-
[22]
arXiv preprint arXiv:2408.11620 , year=
Annealed sinkhorn for optimal transport: convergence, regularization path and debiasing , author=. arXiv preprint arXiv:2408.11620 , year=
-
[23]
Mathematical Programming , volume=
Sharper exponential convergence rates for Sinkhorn’s algorithm in continuous settings , author=. Mathematical Programming , volume=. 2026 , publisher=
2026
-
[24]
Mathematical Programming , volume=
Asymptotic analysis of the exponential penalty trajectory in linear programming , author=. Mathematical Programming , volume=. 1994 , publisher=
1994
-
[25]
Quantitative contraction rates for Sinkhorn's algorithm: beyond bounded costs and compact marginals
Quantitative contraction rates for Sinkhorn algorithm: beyond bounded costs and compact marginals , author=. arXiv preprint arXiv:2304.04451 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[26]
Weak semiconvexity estimates for Schr
Conforti, Giovanni , journal=. Weak semiconvexity estimates for Schr. 2024 , publisher=
2024
-
[27]
SIAM Review , volume=
Semidual regularized optimal transport , author=. SIAM Review , volume=. 2018 , publisher=
2018
-
[28]
Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm
Wasserstein mirror gradient flow as the limit of the Sinkhorn algorithm , author=. arXiv preprint arXiv:2307.16421 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[29]
International Conference on Artificial Intelligence and Statistics , pages=
Nearly tight convergence bounds for semi-discrete entropic optimal transport , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2022 , organization=
2022
-
[30]
International conference on machine learning , pages=
Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm , author=. International conference on machine learning , pages=. 2018 , organization=
2018
-
[31]
SIAM Journal on Mathematical Analysis , volume=
Quantitative stability of regularized optimal transport and convergence of sinkhorn's algorithm , author=. SIAM Journal on Mathematical Analysis , volume=. 2022 , publisher=
2022
-
[32]
arXiv preprint arXiv:2502.20264 , year=
Exponential convergence of general iterative proportional fitting procedures , author=. arXiv preprint arXiv:2502.20264 , year=
-
[33]
Mathematical Proceedings of the Cambridge Philosophical Society , volume=
An elementary proof of the Birkhoff-Hopf theorem , author=. Mathematical Proceedings of the Cambridge Philosophical Society , volume=. 1995 , organization=
1995
-
[34]
Linear Algebra and its applications , volume=
On the scaling of multidimensional matrices , author=. Linear Algebra and its applications , volume=. 1989 , publisher=
1989
-
[35]
Handbook of convergence theorems for (stochastic) gradient methods , author=. arXiv preprint arXiv:2301.11235 , year=
-
[36]
arXiv preprint arXiv:2510.27340 , year=
Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport , author=. arXiv preprint arXiv:2510.27340 , year=
-
[37]
Advances in neural information processing systems , volume=
Stochastic optimization for large-scale optimal transport , author=. Advances in neural information processing systems , volume=
-
[38]
International Conference on Artificial Intelligence and Statistics , pages=
Learning generative models with sinkhorn divergences , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2018 , organization=
2018
-
[39]
Mathematics of Operations Research , year=
On the convergence rate of Sinkhorn’s algorithm , author=. Mathematics of Operations Research , year=
-
[40]
, author=
Non-asymptotic convergence bounds for Sinkhorn iterates and their gradients: a coupling approach. , author=. The Thirty Sixth Annual Conference on Learning Theory , pages=. 2023 , organization=
2023
-
[41]
arXiv preprint arXiv:2504.11133 , year=
Hessian stability and convergence rates for entropic and Sinkhorn potentials via semiconcavity , author=. arXiv preprint arXiv:2504.11133 , year=
-
[42]
IEEE Transactions on Information Theory , volume=
A Bregman proximal perspective on classical and quantum Blahut-Arimoto algorithms , author=. IEEE Transactions on Information Theory , volume=. 2024 , publisher=
2024
-
[43]
Journal of machine learning research , volume=
On the convergence of projected alternating maximization for equitable and optimal transport , author=. Journal of machine learning research , volume=
-
[44]
A review of matrix scaling and Sinkhorn's normal form for matrices and positive maps
A review of matrix scaling and Sinkhorn's normal form for matrices and positive maps , author=. arXiv preprint arXiv:1609.06349 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[45]
Mathematical Programming , volume=
On the complexity of general matrix scaling and entropy minimization via the RAS algorithm , author=. Mathematical Programming , volume=. 2008 , publisher=
2008
-
[46]
International Conference on Artificial Intelligence and Statistics , pages=
Sinkhorn flow as mirror flow: A continuous-time framework for generalizing the sinkhorn algorithm , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=
2024
-
[47]
arXiv preprint arXiv:2307.08507 , year=
Efficient and accurate optimal transport with mirror descent and conjugate gradients , author=. arXiv preprint arXiv:2307.08507 , year=
-
[48]
Computer Methods in Applied Mechanics and Engineering , volume=
Efficient numerical strategies for entropy-regularized semi-discrete optimal transport , author=. Computer Methods in Applied Mechanics and Engineering , volume=. 2026 , publisher=
2026
-
[49]
arXiv preprint arXiv:2311.17296 , year=
Mirror duality in convex optimization , author=. arXiv preprint arXiv:2311.17296 , year=
-
[50]
SIAM Journal on Matrix Analysis and Applications , volume=
The Sinkhorn--Knopp algorithm: convergence and applications , author=. SIAM Journal on Matrix Analysis and Applications , volume=. 2008 , publisher=
2008
-
[51]
Neural networks , volume=
The invisible hand algorithm: Solving the assignment problem with statistical physics , author=. Neural networks , volume=. 1994 , publisher=
1994
-
[52]
arXiv preprint arXiv:2101.01704 , year=
The method of Bregman projections in deterministic and stochastic convex feasibility problems , author=. arXiv preprint arXiv:2101.01704 , year=
-
[53]
International Conference on Machine Learning , pages=
Batch Greenkhorn algorithm for entropic-regularized multimarginal optimal transport: Linear rate of convergence and iteration complexity , author=. International Conference on Machine Learning , pages=. 2022 , organization=
2022
-
[54]
Applied Mathematics & Optimization , volume=
A gradient descent perspective on Sinkhorn , author=. Applied Mathematics & Optimization , volume=. 2021 , publisher=
2021
-
[55]
arXiv preprint arXiv:2305.04917 , year=
Gradient descent with a general cost , author=. arXiv preprint arXiv:2305.04917 , year=
-
[56]
Optimization Letters , volume=
A note on overrelaxation in the Sinkhorn algorithm , author=. Optimization Letters , volume=. 2022 , publisher=
2022
-
[57]
2012 , publisher=
Nonlinear Perron-Frobenius Theory , author=. 2012 , publisher=
2012
-
[58]
SIAM Journal on Optimization , volume=
Fast computation of optimal transport via entropy-regularized extragradient methods , author=. SIAM Journal on Optimization , volume=. 2025 , publisher=
2025
-
[59]
arXiv preprint arXiv:2202.10042 , year=
Fast Sinkhorn I: An O (N) algorithm for the Wasserstein-1 metric , author=. arXiv preprint arXiv:2202.10042 , year=
-
[60]
Journal of Machine Learning Research , volume=
Importance sparsification for Sinkhorn algorithm , author=. Journal of Machine Learning Research , volume=
-
[61]
Journal of Machine Learning Research , volume=
On the efficiency of entropic regularized algorithms for optimal transport , author=. Journal of Machine Learning Research , volume=
-
[62]
Advances in neural information processing systems , volume=
Sinkhorn barycenters with free support via frank-wolfe algorithm , author=. Advances in neural information processing systems , volume=
-
[63]
International Conference on Artificial Intelligence and Statistics , pages=
Improved rate of first order algorithms for entropic optimal transport , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=
2023
-
[64]
arXiv preprint arXiv:2305.14939 , year=
Improved complexity analysis of the Sinkhorn and Greenkhorn algorithms for optimal transport , author=. arXiv preprint arXiv:2305.14939 , year=
-
[65]
An optimal transport approach for the Schr
Marino, Simone Di and Gerolin, Augusto , journal=. An optimal transport approach for the Schr. 2020 , publisher=
2020
-
[66]
arXiv preprint arXiv:2007.00976 , year=
Optimal transport losses and Sinkhorn algorithm with general convex regularization , author=. arXiv preprint arXiv:2007.00976 , year=
-
[67]
Information theory workshop , pages=
Information geometric formulation and interpretation of accelerated Blahut-Arimoto-type algorithms , author=. Information theory workshop , pages=. 2004 , organization=
2004
-
[68]
Advances in Neural Information Processing Systems , volume=
Online Sinkhorn: Optimal transport distances from sample streams , author=. Advances in Neural Information Processing Systems , volume=
-
[69]
arXiv preprint arXiv:1909.06918 , year=
Sinkhorn algorithm as a special case of stochastic mirror descent , author=. arXiv preprint arXiv:1909.06918 , year=
-
[70]
Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing , pages=
Geometrical interpretation and improvements of the Blahut-Arimoto's algorithm , author=. Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing , pages=
2009
-
[71]
Mathematics of Operations Research , volume=
Linear convergence of random dual coordinate descent on nonpolyhedral convex problems , author=. Mathematics of Operations Research , volume=. 2022 , publisher=
2022
-
[72]
arXiv preprint arXiv:2202.03618 , year=
On the convergence of gradient extrapolation methods for unbalanced optimal transport , author=. arXiv preprint arXiv:2202.03618 , year=
-
[73]
Streaming Sliced Optimal Transport
Streaming sliced optimal transport , author=. arXiv preprint arXiv:2505.06835 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[74]
Lecture notes, Columbia University , volume=
Introduction to entropic optimal transport , author=. Lecture notes, Columbia University , volume=
-
[75]
Foundations and Trends
Computational optimal transport with applications to data sciences , author=. Foundations and Trends. 2019 , publisher=
2019
-
[76]
arXiv preprint arXiv:2505.06589 , year=
Optimal transport for machine learners , author=. arXiv preprint arXiv:2505.06589 , year=
-
[77]
International Conference on Machine Learning , pages=
On unbalanced optimal transport: An analysis of Sinkhorn algorithm , author=. International Conference on Machine Learning , pages=. 2020 , organization=
2020
-
[78]
International Conference on Machine Learning , pages=
Minimax estimation of discontinuous optimal transport maps: The semi-discrete case , author=. International Conference on Machine Learning , pages=. 2023 , organization=
2023
-
[79]
International Conference on Machine Learning , pages=
Multisample Flow Matching: Straightening Flows with Minibatch Couplings , author=. International Conference on Machine Learning , pages=. 2023 , organization=
2023
-
[80]
Operations Research , year=
On Sinkhorn’s algorithm and choice modeling , author=. Operations Research , year=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.