arxiv: 1907.01729 · v2 · pith:PTG7HH3Xnew · submitted 2019-07-01 · 📊 stat.ML · cs.LG

Implementation of batched Sinkhorn iterations for entropy-regularized Wasserstein loss

Pith reviewed 2026-05-25 11:44 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords entropy-regularized Wasserstein · Sinkhorn iterations · PyTorch · optimal transport · batched computation · machine learning loss

The pith

A PyTorch implementation computes entropy-regularized Wasserstein loss via batched Sinkhorn iterations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the entropy-regularized Wasserstein distance introduced by Cuturi and supplies working PyTorch code for its computation. It focuses on translating the Sinkhorn iterations into a batched form suitable for machine learning workloads. A sympathetic reader cares because Wasserstein-based losses appear in optimal transport tasks, yet practical code for them has been scattered. The report makes the method directly usable by providing the implementation in an accompanying notebook.

Core claim

The report reviews the calculation of entropy-regularised Wasserstein loss introduced by Cuturi and documents a practical implementation in PyTorch.

What carries the argument

Batched Sinkhorn iterations that solve the entropy-regularized optimal transport problem between pairs of distributions.
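
As a rough illustration of what "batched" means here, the sketch below runs the standard scaling updates on several histogram pairs at once. It is not the notebook's code: the function name `batched_sinkhorn`, its arguments, the shared cost matrix, and the fixed iteration count are all assumptions made for this example, and it uses the plain linear-domain updates with no stabilization.

```python
import torch

def batched_sinkhorn(a, b, cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport cost for a batch of histogram pairs.

    a: (B, n) and b: (B, m) hold non-negative weights, each row summing to 1.
    cost: (n, m) ground-cost matrix shared by every pair in the batch.
    Returns a (B,) tensor with <P, cost> for each pair.
    """
    K = torch.exp(-cost / eps)      # Gibbs kernel, shape (n, m)
    u = torch.ones_like(a)          # row scalings, one vector per pair
    v = torch.ones_like(b)          # column scalings, one vector per pair
    for _ in range(n_iters):
        u = a / (v @ K.T)           # match row marginals:    u = a / (K v)
        v = b / (u @ K)             # match column marginals: v = b / (K^T u)
    plan = u.unsqueeze(2) * K * v.unsqueeze(1)   # (B, n, m) transport plans
    return (plan * cost).sum(dim=(1, 2))

# Two pairs on a 5-point grid: uniform vs. uniform, and uniform vs. a skewed target.
x = torch.linspace(0, 1, 5)
cost = (x[:, None] - x[None, :]).abs()
a = torch.full((2, 5), 0.2)
b = torch.tensor([[0.2, 0.2, 0.2, 0.2, 0.2],
                  [0.6, 0.1, 0.1, 0.1, 0.1]])
losses = batched_sinkhorn(a, b, cost)   # shape (2,), one value per pair
```

The only change from a single-pair version is that `u` and `v` carry a leading batch dimension, so each matrix-vector product becomes one matrix-matrix product over the whole batch.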

If this is right

  • Multiple sample pairs can be processed in a single forward pass, reducing overhead in training loops.
  • The loss becomes available as a drop-in component inside existing PyTorch models for tasks such as generative modeling.
  • Users obtain a concrete reference point for verifying custom re-implementations of the same regularized distance.
  • The code supports direct experimentation with the entropy regularization parameter on real data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The notebook could serve as a starting template for porting the same algorithm to other automatic-differentiation frameworks.
  • Once integrated, the loss enables direct comparisons of transport-based objectives against standard divergence measures on the same datasets.
  • The implementation invites tests on whether the batched version preserves the same convergence behavior as the scalar version for large batch sizes.

Load-bearing premise

The original Sinkhorn iterations translate directly into stable batched PyTorch code without further numerical safeguards.

What would settle it

Executing the notebook on standard uniform distributions and checking whether the returned loss values match those from an independent reference implementation of the same algorithm.
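
One possible shape for such an independent reference is sketched below: a plain-Python, single-pair Sinkhorn loop with no tensor library, so that it shares no code path with the PyTorch version. The function name `reference_sinkhorn`, the grid, and the parameter values are assumptions for this example; comparing its output with the notebook's still requires running the notebook on the same inputs.

```python
import math

def reference_sinkhorn(a, b, cost, eps=0.1, n_iters=500):
    """Plain-Python Sinkhorn for one pair of histograms given as lists.

    Returns (transport_cost, plan), where plan is a list of rows.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    value = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return value, plan

# Uniform-to-uniform test case on a 4-point grid with |i - j| cost scaled to [0, 1].
n = 4
uniform = [1.0 / n] * n
cost = [[abs(i - j) / (n - 1) for j in range(n)] for i in range(n)]
value, plan = reference_sinkhorn(uniform, uniform, cost)

# Sanity checks before comparing against another implementation:
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(n)) for j in range(n)]
```

If both marginals come back close to uniform, `value` is a usable reference number for the same `cost`, `eps`, and iteration count fed to the batched code.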

Original abstract

In this report, we review the calculation of entropy-regularised Wasserstein loss introduced by Cuturi and document a practical implementation in PyTorch. Code is available at https://github.com/t-vi/pytorch-tvmisc/blob/master/wasserstein-distance/Pytorch_Wasserstein.ipynb

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript reviews the entropy-regularized Wasserstein loss introduced by Cuturi and documents a practical PyTorch implementation of batched Sinkhorn iterations, with code provided in an accompanying notebook.

Significance. A correct and numerically stable batched implementation would be useful for PyTorch users applying optimal transport losses in machine learning, as it directly translates a known algorithm into a common framework. The public code link is a strength for reproducibility.

major comments (1)
  1. [Implementation section / notebook] The implementation description provides no indication of log-domain stabilization (e.g., log-sum-exp) or other guards against overflow/underflow in the u/v scaling vector updates. This is load-bearing for the central claim of a practical implementation, as standard primal Sinkhorn iterations are known to be unstable for small epsilon or large dynamic range in the cost matrix (see Cuturi 2013, §3).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and for highlighting an important aspect of numerical stability in the Sinkhorn algorithm. We address the single major comment below.

Point-by-point responses
  1. Referee: [Implementation section / notebook] The implementation description provides no indication of log-domain stabilization (e.g., log-sum-exp) or other guards against overflow/underflow in the u/v scaling vector updates. This is load-bearing for the central claim of a practical implementation, as standard primal Sinkhorn iterations are known to be unstable for small epsilon or large dynamic range in the cost matrix (see Cuturi 2013, §3).

    Authors: We agree that the absence of any discussion of numerical stabilization weakens the manuscript's claim of documenting a 'practical' implementation. The notebook code performs the standard primal updates in the linear domain without explicit log-sum-exp or other guards, which can indeed lead to overflow for small epsilon. We will revise the manuscript to add a short subsection on numerical considerations (including the known limitations of the provided code and references to stabilized variants) and will update the notebook with an optional log-domain path. This constitutes a major revision to the text and code.

    Revision: yes
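
For orientation, a log-domain path of the kind discussed in this exchange could look roughly like the sketch below. It is an assumption-laden illustration, not the authors' planned revision: the function name `batched_sinkhorn_log` and its arguments are hypothetical. It iterates on dual potentials `f` and `g`, with plan entries `exp((f_i + g_j - C_ij) / eps)`, and replaces each division by a kernel product with a `torch.logsumexp` reduction.

```python
import torch

def batched_sinkhorn_log(a, b, cost, eps=0.01, n_iters=200):
    """Log-domain Sinkhorn updates on dual potentials f and g.

    a: (B, n), b: (B, m) strictly positive rows summing to 1; cost: (n, m).
    Returns a (B,) tensor with <P, cost> for each pair.
    """
    log_a, log_b = a.log(), b.log()
    f = torch.zeros_like(a)     # f = eps * log(u) in the linear-domain notation
    g = torch.zeros_like(b)     # g = eps * log(v)
    C = cost.unsqueeze(0)       # (1, n, m), broadcast over the batch
    for _ in range(n_iters):
        # Row update: f_i = eps * (log a_i - logsumexp_j((g_j - C_ij) / eps))
        f = eps * (log_a - torch.logsumexp((g.unsqueeze(1) - C) / eps, dim=2))
        # Column update: g_j = eps * (log b_j - logsumexp_i((f_i - C_ij) / eps))
        g = eps * (log_b - torch.logsumexp((f.unsqueeze(2) - C) / eps, dim=1))
    log_plan = (f.unsqueeze(2) + g.unsqueeze(1) - C) / eps   # (B, n, m)
    return (log_plan.exp() * cost).sum(dim=(1, 2))

# Small eps: exp(-cost / eps) underflows to zero for the largest costs here,
# while the log-domain updates never form that kernel explicitly.
x = torch.linspace(0, 1, 6)
cost = (x[:, None] - x[None, :]) ** 2
a = torch.full((3, 6), 1 / 6)
b = torch.softmax(torch.arange(18.0).reshape(3, 6) / 6, dim=1)
losses = batched_sinkhorn_log(a, b, cost, eps=1e-3)
```

Starting from `f = g = 0` corresponds to `u = v = 1` in the linear-domain updates, so for moderate `eps` and the same iteration count the two variants should agree up to rounding. The trade-off is an extra `(B, n, m)` intermediate per update.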

Circularity Check

0 steps flagged

No circularity: direct implementation of external prior work (Cuturi)

Full rationale

The paper is an implementation report that reviews the entropy-regularized Wasserstein loss from Cuturi (external citation) and provides PyTorch code for batched Sinkhorn iterations. No new derivations, fitted parameters, self-citations, or ansatzes are introduced. The central content is a translation of published prior work into code, with no load-bearing steps that reduce to the paper's own inputs by construction. This matches the default non-circular case for implementation notes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new scientific claims, parameters, axioms, or entities are introduced; this is a documentation report of existing work.

pith-pipeline@v0.9.0 · 5555 in / 916 out tokens · 31464 ms · 2026-05-25T11:44:59.820245+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 2 internal anchors

  1. M. Arjovsky et al., Wasserstein GAN, arXiv 1701.07875
  2. M. Cuturi, Sinkhorn Distances: Lightspeed Computation of Optimal Transport, NIPS 2013
  3. D. Daza, Approximating Wasserstein distances with PyTorch, blog entry at https://dfdazac.github.io/sinkhorn.html, 2019
  4. G. Peyré and M. Cuturi, Computational Optimal Transport, arXiv 1803.00567 (v3)
  5. J. Franklin and J. Lorenz, On the Scaling of Multidimensional Matrices, Linear Algebra and its Applications, 114/115 (1989)
  6. C. Frogner et al., Learning with a Wasserstein Loss, NIPS 2015
  7. I. Gulrajani et al., Improved Training of Wasserstein GANs, NIPS 2017
  8. G. Luise et al., Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance, NeurIPS 2018
  9. T. Miyato et al., Spectral Normalization for Generative Adversarial Networks, ICLR 2018
  10. Y. Rubner et al., The Earth Mover’s Distance, Multi-Dimensional Scaling, and Color-Based Image Retrieval, Proceedings of the ARPA Image Understanding Workshop, 1997
  11. B. Schmitzer, Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems, arXiv 1610.06519
  12. T. Viehmann, Batch Sinkhorn Iteration Wasserstein Distance, PyTorch code and notebook, 2017, https://github.com/t-vi/pytorch-tvmisc/blob/ae4d94597751f98d4a0d7b10188dd02c13a0c6fd/wasserstein-distance/Pytorch_Wasserstein.ipynb
  13. C. Villani, Optimal Transport - Old and New, Springer, 2009