Implementation of batched Sinkhorn iterations for entropy-regularized Wasserstein loss

Thomas Viehmann

arXiv: 1907.01729 · v2 · pith:PTG7HH3Xnew · submitted 2019-07-01 · stat.ML · cs.LG

Pith reviewed 2026-05-25 11:44 UTC · model grok-4.3

keywords: entropy-regularized Wasserstein, Sinkhorn iterations, PyTorch, optimal transport, batched computation, machine learning loss

The pith

A PyTorch implementation computes entropy-regularized Wasserstein loss via batched Sinkhorn iterations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the entropy-regularized Wasserstein distance introduced by Cuturi and supplies working PyTorch code for computing it. It focuses on translating the Sinkhorn iterations into a batched form suited to machine learning workloads. A sympathetic reader cares because Wasserstein-based losses appear in optimal transport tasks, yet practical code for them has been scattered. The report makes the method directly usable by providing the implementation in an accompanying notebook.

Core claim

The report reviews the calculation of entropy-regularised Wasserstein loss introduced by Cuturi and documents a practical implementation in PyTorch.

What carries the argument

Batched Sinkhorn iterations that solve the entropy-regularized optimal transport problem between pairs of distributions.
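
As a rough illustration of what such a batched iteration can look like, here is a minimal sketch. It is not the notebook's code. It uses NumPy rather than PyTorch so that it runs without a deep-learning framework, and the function name `batched_sinkhorn`, the argument layout, the cost matrix shared across the batch, and the fixed iteration count are all assumptions made for this example.

```python
import numpy as np


def batched_sinkhorn(a, b, C, eps=0.5, n_iters=200):
    """Linear-domain Sinkhorn for a batch of histogram pairs (illustrative only).

    a: (B, n) source histograms, strictly positive, each row sums to 1
    b: (B, m) target histograms, strictly positive, each row sums to 1
    C: (n, m) ground cost, assumed here to be shared by every pair in the batch
    Returns the transport cost <P, C> per pair, shape (B,), and the plans P, shape (B, n, m).
    """
    K = np.exp(-C / eps)          # Gibbs kernel, shape (n, m)
    u = np.ones_like(a)           # row scaling vectors, shape (B, n)
    v = np.ones_like(b)           # column scaling vectors, shape (B, m)
    for _ in range(n_iters):
        u = a / (v @ K.T)         # (K v)_i for every batch element at once
        v = b / (u @ K)           # (K^T u)_j for every batch element at once
    # P = diag(u) K diag(v), built by broadcasting over the batch axis
    P = u[:, :, None] * K[None, :, :] * v[:, None, :]
    cost = (P * C[None, :, :]).sum(axis=(1, 2))
    return cost, P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((3, 4)) + 0.1
    a /= a.sum(axis=1, keepdims=True)
    b = rng.random((3, 5)) + 0.1
    b /= b.sum(axis=1, keepdims=True)
    # Cost: distance between bin positions on [0, 1]
    C = np.abs(np.linspace(0, 1, 4)[:, None] - np.linspace(0, 1, 5)[None, :])
    cost, P = batched_sinkhorn(a, b, C)
    print("cost per pair:", cost)
    print("max row-marginal error:", np.abs(P.sum(axis=2) - a).max())
```

The two matrix products inside the loop are what make the iteration batched: every pair in the batch is updated by the same pair of operations. The same pattern carries over to PyTorch tensors, where automatic differentiation can then propagate gradients through the loop.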

If this is right

Multiple sample pairs can be processed in a single forward pass, reducing overhead in training loops.
The loss becomes available as a drop-in component inside existing PyTorch models for tasks such as generative modeling.
Users obtain a concrete reference point for verifying custom re-implementations of the same regularized distance.
The code supports direct experimentation with the entropy regularization parameter on real data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The notebook could serve as a starting template for porting the same algorithm to other automatic-differentiation frameworks.
Once integrated, the loss enables direct comparisons of transport-based objectives against standard divergence measures on the same datasets.
The implementation invites tests on whether the batched version preserves the same convergence behavior as the scalar version for large batch sizes.

Load-bearing premise

The original Sinkhorn iterations translate directly into stable batched PyTorch code without further numerical safeguards.
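
To make the risk behind this premise concrete, the following sketch shows how the Gibbs kernel exp(-C/ε) used by linear-domain Sinkhorn underflows to exact zeros as ε shrinks. The cost matrix, the ε values, and the helper names `gibbs_kernel` and `dead_rows` are hypothetical and chosen only for illustration; they do not come from the paper or its notebook.

```python
import math


def gibbs_kernel(C, eps):
    """Entrywise exp(-C / eps) for a cost matrix given as nested lists."""
    return [[math.exp(-c / eps) for c in row] for row in C]


def dead_rows(K):
    """Count rows whose entries have all underflowed to exactly 0.0."""
    return sum(1 for row in K if sum(row) == 0.0)


# Hypothetical cost matrix with no zero entries, so nothing anchors the kernel at 1.
C = [[1.0, 2.0], [2.0, 1.0]]

for eps in (1.0, 0.1, 0.001):
    K = gibbs_kernel(C, eps)
    smallest = min(min(row) for row in K)
    print(f"eps={eps}: smallest kernel entry {smallest:.3e}, all-zero rows: {dead_rows(K)}")
```

Once a kernel row is entirely zero, the update u = a / (K v) divides by zero for that row. Plain Python would raise an exception at that point, whereas floating-point arrays in NumPy or PyTorch silently produce inf or nan values that then spread through later iterations.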

What would settle it

Executing the notebook on standard uniform distributions and checking whether the returned loss values match those from an independent reference implementation of the same algorithm.
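
One form such a check could take is sketched below, as an assumption of this write-up rather than something the paper provides. For two uniform two-bin histograms with cost 0 on the diagonal and 1 off it, symmetry forces the optimal scalings to be constant, and the transport cost ⟨P, C⟩ works out by hand to 1 / (1 + exp(1/ε)). The function names are hypothetical, and whether the notebook reports this quantity or a variant that includes the entropy term is not stated here.

```python
import math


def sinkhorn_cost(a, b, C, eps, n_iters=100):
    """Plain (unbatched, linear-domain) Sinkhorn returning <P, C>; illustrative only."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))


def closed_form_cost(eps):
    """Hand-derived <P, C> for uniform 2-bin marginals and 0/1 cost."""
    return 1.0 / (1.0 + math.exp(1.0 / eps))


C = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]
b = [0.5, 0.5]

for eps in (1.0, 0.5, 0.1):
    numeric = sinkhorn_cost(a, b, C, eps)
    exact = closed_form_cost(eps)
    print(f"eps={eps}: sinkhorn={numeric:.12f} closed form={exact:.12f}")
```

A batched implementation fed several copies of this pair should return the same value for each copy, which also gives a cheap test that batching does not mix up entries across the batch axis.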

Original abstract

In this report, we review the calculation of entropy-regularised Wasserstein loss introduced by Cuturi and document a practical implementation in PyTorch. Code is available at https://github.com/t-vi/pytorch-tvmisc/blob/master/wasserstein-distance/Pytorch_Wasserstein.ipynb

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a code notebook reimplementing Cuturi's 2013 Sinkhorn method in PyTorch with no new results or analysis.

This report walks through the entropy-regularized Wasserstein loss from Cuturi and supplies a PyTorch notebook for batched Sinkhorn iterations. Nothing in it is new; the math and algorithm are taken directly from the 2013 paper and the contribution is the code link plus a short review of the steps. The implementation may be convenient for someone who needs this loss function in a PyTorch model and does not want to write the matrix multiplies themselves. The citation pattern is clean and points straight to the original source without unnecessary self-reference. The main limitation is that the description gives no sign of log-domain stabilization or other guards against overflow and underflow. Standard Sinkhorn scaling can produce NaNs or infinities for small epsilon or ill-conditioned cost matrices, and batching does not remove that risk. No numerical checks or edge-case tests are reported, so users would still need to verify stability themselves. The work is aimed at practitioners who want a ready-made implementation rather than at researchers looking for new methods or guarantees. It does not contain enough substance or novelty to justify sending it out for peer review.

Referee Report

1 major / 0 minor

Summary. The manuscript reviews the entropy-regularized Wasserstein loss introduced by Cuturi and documents a practical PyTorch implementation of batched Sinkhorn iterations, with code provided in an accompanying notebook.

Significance. A correct and numerically stable batched implementation would be useful for PyTorch users applying optimal transport losses in machine learning, as it directly translates a known algorithm into a common framework. The public code link is a strength for reproducibility.

major comments (1)

[Implementation section / notebook] The implementation description provides no indication of log-domain stabilization (e.g., log-sum-exp) or other guards against overflow/underflow in the u/v scaling vector updates. This is load-bearing for the central claim of a practical implementation, as standard primal Sinkhorn iterations are known to be unstable for small epsilon or large dynamic range in the cost matrix (see Cuturi 2013, §3).
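
For readers unfamiliar with the stabilization the comment refers to, the sketch below shows one common variant: iterating on dual potentials f and g in the log domain, with u = exp(f/ε) and v = exp(g/ε), and using log-sum-exp for the marginal sums. It is an illustrative NumPy example under assumed names (`batched_sinkhorn_log`, `_logsumexp`) and an assumed shared cost matrix, not the notebook's code and not the revision promised in the rebuttal.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(x - m).sum(axis=axis))


def batched_sinkhorn_log(a, b, C, eps, n_iters=200):
    """Log-domain Sinkhorn for a batch of histogram pairs (illustrative only).

    a: (B, n) and b: (B, m) strictly positive histograms; C: (n, m) shared cost.
    Returns the transport cost <P, C> per pair, shape (B,), and the plans P.
    """
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros_like(a)                     # dual potential, u = exp(f / eps)
    g = np.zeros_like(b)                     # dual potential, v = exp(g / eps)
    Cb = C[None, :, :]                       # add a batch axis for broadcasting
    for _ in range(n_iters):
        # Row update: enforce sum_j P_ij = a_i
        f = eps * (log_a - _logsumexp((g[:, None, :] - Cb) / eps, axis=2))
        # Column update: enforce sum_i P_ij = b_j
        g = eps * (log_b - _logsumexp((f[:, :, None] - Cb) / eps, axis=1))
    # P_ij = exp((f_i + g_j - C_ij) / eps); the exponent stays bounded above
    P = np.exp((f[:, :, None] + g[:, None, :] - Cb) / eps)
    return (P * Cb).sum(axis=(1, 2)), P


if __name__ == "__main__":
    # A small eps and a cost with no zero entries: exp(-C / eps) underflows
    # to all zeros here, so the linear-domain update would divide by zero.
    C = np.array([[1.0, 2.0], [2.0, 1.0]])
    a = np.array([[0.5, 0.5]])
    b = np.array([[0.5, 0.5]])
    cost, P = batched_sinkhorn_log(a, b, C, eps=1e-3, n_iters=50)
    print("cost:", cost, "finite plan:", bool(np.isfinite(P).all()))
```

The trade-off is cost per iteration: each update forms a (B, n, m) array and evaluates exponentials on it, whereas the linear-domain version needs only a matrix product against a kernel computed once.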

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and for highlighting an important aspect of numerical stability in the Sinkhorn algorithm. We address the single major comment below.

Referee: [Implementation section / notebook] The implementation description provides no indication of log-domain stabilization (e.g., log-sum-exp) or other guards against overflow/underflow in the u/v scaling vector updates. This is load-bearing for the central claim of a practical implementation, as standard primal Sinkhorn iterations are known to be unstable for small epsilon or large dynamic range in the cost matrix (see Cuturi 2013, §3).

Authors: We agree that the absence of any discussion of numerical stabilization weakens the manuscript's claim of documenting a 'practical' implementation. The notebook code performs the standard primal updates in the linear domain without explicit log-sum-exp or other guards, which can indeed lead to overflow for small epsilon. We will revise the manuscript to add a short subsection on numerical considerations (including the known limitations of the provided code and references to stabilized variants) and will update the notebook with an optional log-domain path. This constitutes a major revision to the text and code.

Revision: yes

Circularity Check

0 steps flagged

No circularity: direct implementation of external prior work (Cuturi)

The paper is an implementation report that reviews the entropy-regularized Wasserstein loss from Cuturi (external citation) and provides PyTorch code for batched Sinkhorn iterations. No new derivations, fitted parameters, self-citations, or ansatzes are introduced. The central content is a translation of published prior work into code, with no load-bearing steps that reduce to the paper's own inputs by construction. This matches the default non-circular case for implementation notes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new scientific claims, parameters, axioms, or entities are introduced; this is a documentation report of existing work.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 2 internal anchors

[1] M. Arjovsky et al., Wasserstein GAN, arXiv 1701.07875
[2] M. Cuturi, Sinkhorn Distances: Lightspeed Computation of Optimal Transport, NIPS 2013
[3] D. Daza, Approximating Wasserstein distances with PyTorch, blog entry at https://dfdazac.github.io/sinkhorn.html, 2019
[4] G. Peyré and M. Cuturi, Computational Optimal Transport, arXiv 1803.00567 (v3)
[5] J. Franklin and J. Lorenz, On the Scaling of Multidimensional Matrices, Linear Algebra and its Applications, 114/115 (1989)
[6] C. Frogner et al., Learning with a Wasserstein Loss, NIPS 2015
[7] I. Gulrajani et al., Improved Training of Wasserstein GANs, NIPS 2017
[8] G. Luise et al., Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance, NeurIPS 2018
[9] T. Miyato et al., Spectral Normalization for Generative Adversarial Networks, ICLR 2018
[10] Y. Rubner et al., The Earth Mover's Distance, Multi-Dimensional Scaling, and Color-Based Image Retrieval, Proceedings of the ARPA Image Understanding Workshop, 1997
[11] B. Schmitzer, Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems, arXiv 1610.06519
[12] T. Viehmann, Batch Sinkhorn Iteration Wasserstein Distance, PyTorch code and notebook, 2017, https://github.com/t-vi/pytorch-tvmisc/blob/ae4d94597751f98d4a0d7b10188dd02c13a0c6fd/wasserstein-distance/Pytorch_Wasserstein.ipynb
[13] C. Villani, Optimal Transport - Old and New, Springer, 2009
