pith. sign in

arxiv: 2410.14104 · v2 · submitted 2024-10-18 · 🧮 math.OC · cs.NA· math.NA

Accelerating operator Sinkhorn iteration with overrelaxation

Pith reviewed 2026-05-23 18:59 UTC · model grok-4.3

classification 🧮 math.OC cs.NAmath.NA
keywords operator scalingSinkhorn iterationsuccessive overrelaxationHilbert metricpositive definite operatorsconvergence analysis
0
0 comments X

The pith

Successive overrelaxation accelerates the operator Sinkhorn iteration for scaling positive definite operators, with local rate optimization via linearization and global convergence via the Hilbert metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops accelerated versions of the operator Sinkhorn iteration by incorporating successive overrelaxation. Local convergence is analyzed by linearizing the iteration around its fixed point, which identifies the best relaxation parameter according to Young's theorem on successive overrelaxation. A geodesic form of the overrelaxed iteration is shown to converge globally on the cone of positive definite operators for relaxation parameters in a specific interval, using contraction properties of the Hilbert metric. These results extend analogous acceleration methods that were previously known only for ordinary matrix scaling. Numerical tests confirm faster performance than the standard iteration in selected applications.

Core claim

Applying successive overrelaxation to the operator Sinkhorn map yields accelerated iterations whose asymptotic convergence rate is governed by the spectral radius of the linearized map at the fixed point; the optimal parameter follows from Young's SOR theorem. In addition, a geodesic overrelaxation variant converges globally whenever the relaxation parameter lies between one and two, because the Hilbert metric contracts distances on the positive definite cone under that condition.

What carries the argument

The overrelaxed operator Sinkhorn iteration map, whose local behavior is captured by its Fréchet derivative at the fixed point and whose global behavior is controlled by the Hilbert metric on the positive definite cone.

Load-bearing premise

The linearization of the iteration map at the fixed point correctly predicts the long-term convergence speed, and the geodesic overrelaxation iteration remains contractive in the Hilbert metric for the chosen parameter values.

What would settle it

An explicit computation of the convergence rate for a low-dimensional operator scaling problem where the observed rate after applying the optimal relaxation parameter differs from the rate given by the spectral radius of the linearized map.

Figures

Figures reproduced from arXiv: 2410.14104 by Andr\'e Uschmajew, Tasuku Soma.

Figure 1
Figure 1. Figure 1: Experimental results for frame scaling. ( [PITH_FULL_IMAGE:figures/full_fig_p019_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Experimental results for ill-conditioned operators. ( [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗
read the original abstract

We propose accelerated versions of the operator Sinkhorn iteration for operator scaling using successive overrelaxation. We analyze the local convergence rates of these accelerated methods via linearization, which allows us to determine the asymptotically optimal relaxation parameter based on Young's SOR theorem. Using the Hilbert metric on positive definite cones, we also obtain a global convergence result for a geodesic version of overrelaxation in a specific range of relaxation parameters. These techniques generalize corresponding results obtained for matrix scaling by Thibault et al. (Algorithms, 14(5):143, 2021) and Lehmann et al. (Optim. Lett., 16(8):2209--2220, 2022). Numerical experiments demonstrate that the proposed methods outperform the original operator Sinkhorn iteration in certain applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes accelerated versions of the operator Sinkhorn iteration for operator scaling problems by incorporating successive overrelaxation (SOR). Local convergence rates of the accelerated iterations are analyzed by linearizing the nonlinear map at the fixed point and invoking Young's SOR theorem to identify the asymptotically optimal relaxation parameter. A global convergence guarantee is derived for a geodesic variant of overrelaxation via the Hilbert metric on the cone of positive definite operators, valid for a specific parameter range. The results generalize corresponding matrix-scaling accelerations from Thibault et al. (2021) and Lehmann et al. (2022). Numerical experiments indicate faster performance than the unaccelerated operator Sinkhorn iteration in selected applications.

Significance. If the central claims hold, the manuscript supplies a practical and theoretically supported acceleration technique for operator scaling, extending finite-matrix SOR results to the infinite-dimensional operator setting while retaining both local asymptotic rates and global convergence via the Hilbert metric. The explicit use of linearization to recover an optimal relaxation parameter and the geodesic formulation are technically natural extensions; the numerical validation provides empirical evidence of utility. Such accelerations could benefit applications involving positive semidefinite operators, such as quantum marginal problems or semidefinite programming.

major comments (1)
  1. [Local convergence analysis] Local convergence section: the derivation of the asymptotically optimal relaxation parameter rests on linearizing the operator Sinkhorn map to obtain a linear operator L and then directly applying Young's SOR theorem. The manuscript does not verify that L satisfies the structural hypotheses of the theorem (consistent ordering, property A, or equivalent spectral conditions) when L acts on the space of positive definite operators; these conditions hold automatically for finite matrices but require separate justification in the operator setting. Without this verification the claimed optimality of the derived ω* does not follow.
minor comments (2)
  1. [Abstract] The abstract states that the methods 'outperform the original operator Sinkhorn iteration in certain applications'; naming the concrete applications (e.g., quantum information, covariance estimation) would improve readability.
  2. [Introduction] Notation for the relaxation parameter ω and the geodesic step should be introduced with a short reminder of the matrix-case definitions before the operator generalization is stated.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the single major comment below.

read point-by-point responses
  1. Referee: [Local convergence analysis] Local convergence section: the derivation of the asymptotically optimal relaxation parameter rests on linearizing the operator Sinkhorn map to obtain a linear operator L and then directly applying Young's SOR theorem. The manuscript does not verify that L satisfies the structural hypotheses of the theorem (consistent ordering, property A, or equivalent spectral conditions) when L acts on the space of positive definite operators; these conditions hold automatically for finite matrices but require separate justification in the operator setting. Without this verification the claimed optimality of the derived ω* does not follow.

    Authors: We agree that the manuscript does not explicitly verify that the linearized operator L satisfies the structural hypotheses (property A, consistent ordering, or equivalent spectral conditions) required by Young's SOR theorem in the infinite-dimensional setting. This verification is indeed needed to rigorously justify the optimality of ω*. In the revised manuscript we will add a dedicated paragraph (or short appendix) establishing that L inherits these properties from the structure of the operator Sinkhorn map: the linearization is a positive linear operator on the tangent space that preserves the block-nonnegativity pattern induced by the scaling, allowing the same combinatorial arguments used in the matrix case to carry over. This will confirm applicability of the theorem without altering the stated convergence rates. revision: yes

Circularity Check

0 steps flagged

No circularity; relies on external theorems and independent prior matrix results

full rationale

The paper linearizes the operator Sinkhorn map to obtain a linear operator and invokes Young's SOR theorem (a classical external result) to select the optimal relaxation parameter; it also uses the Hilbert metric for a global convergence argument in a stated parameter range. These steps generalize independent results from Thibault et al. (2021) and Lehmann et al. (2022) on the matrix case without self-citation chains or load-bearing internal citations. No parameters are fitted to data and renamed as predictions, no quantities are defined in terms of themselves, and no ansatz is smuggled via self-citation. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The abstract invokes standard domain assumptions from optimization and geometry; no new entities are introduced and the relaxation parameter is derived from classical SOR theory rather than fitted ad hoc.

free parameters (1)
  • relaxation parameter
    Chosen via Young's SOR theorem applied to the linearized map; its specific value depends on spectral properties of the iteration.
axioms (1)
  • domain assumption The Hilbert metric on the positive definite cone supports a global convergence argument for geodesic overrelaxation within a stated parameter interval
    Invoked to obtain the global convergence result

pith-pipeline@v0.9.0 · 5660 in / 1209 out tokens · 30801 ms · 2026-05-23T18:59:13.389851+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 2 internal anchors

  1. [1]

    Allen-Zhu, A

    Z. Allen-Zhu, A. Garg, Y. Li, R. Oliveira, and A. Wigderson. Operator scaling via geodesically convex optimization, invariant theory and polynomial identity testing. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages 172--181. Association for Computing Machinery, New York, 2018

  2. [2]

    Ba c \'ak

    M. Ba c \'ak. Convex analysis and optimization in H adamard spaces . De Gruyter, Berlin, 2014

  3. [3]

    F. Barthe. On a reverse form of the B rascamp- L ieb inequality. Invent. Math. , 134(2):335--361, 1998

  4. [4]

    R. Bhatia. Positive definite matrices . Princeton University Press, Princeton, NJ, 2007

  5. [5]

    Burer and R

    S. Burer and R. D. C. Monteiro. Local minima and convergence in low-rank semidefinite programming. Math. Program. , 103(3, Ser. A):427--444, 2005

  6. [6]

    N. Boumal. An introduction to optimization on smooth manifolds . Cambridge University Press, Cambridge, 2023

  7. [7]

    Boyd and L

    S. Boyd and L. Vandenberghe. Convex optimization . Cambridge University Press, Cambridge, 2004

  8. [8]

    Drton, S

    M. Drton, S. Kuriki, and P. Hoff. Existence and uniqueness of the K ronecker covariance MLE . Ann. Statist. , 49(5):2721--2754, 2021

  9. [9]

    Geodesic Convexity and Regularized Scatter Estimators

    L. Duembgen and D. E. Tyler. Geodesic convexity and regularized scatter estimators. arxiv:1607.05455 , 2016

  10. [10]

    S. P. Eveson and R. D. Nussbaum. An elementary proof of the B irkhoff- H opf theorem. Math. Proc. Cambridge Philos. Soc. , 117(1):31--55, 1995

  11. [11]

    Franklin and J

    J. Franklin and J. Lorenz. On the scaling of multidimensional matrices. Linear Algebra Appl. , 114/115:717--735, 1989

  12. [12]

    W. C. Franks and A. Moitra. Rigorous guarantees for Tyler ’s M -estimator via quantum expansion. In Proceedings of the 33rd Conference on Learning Theory , pages 1601--1632. PMLR, 2020

  13. [13]

    A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson. Algorithmic and optimization aspects of B rascamp- L ieb inequalities, via operator scaling. Geom. Funct. Anal. , 28(1):100--145, 2018

  14. [14]

    A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson. Operator scaling: theory and applications. Found. Comput. Math. , 20(2):223--290, 2020

  15. [15]

    T. T. Georgiou and M. Pavon. Positive contraction mappings for classical and quantum S chr\"odinger systems. J. Math. Phys. , 56(3):033301, 24, 2015

  16. [16]

    L. Gurvits. Classical complexity and quantum entanglement. J. Comput. System Sci. , 69(3):448--484, 2004

  17. [17]

    Gunawardena and C

    J. Gunawardena and C. Walsh. Iterates of maps which are non-expansive in H ilbert's projective metric. Kybernetika (Prague) , 39(2):193--204, 2003

  18. [18]

    Hackbusch

    W. Hackbusch. Iterative solution of large sparse systems of equations . Springer, Cham, second edition, 2016

  19. [19]

    N. J. Higham. Functions of matrices . Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008

  20. [20]

    L. A. Hageman and T. A. Porsching. Aspects of nonlinear block successive overrelaxation. SIAM J. Numer. Anal. , 12:316--335, 1975

  21. [21]

    R. B. Holmes and V. I. Paulsen. Optimal frames for erasures. Linear Algebra Appl. , 377:31--51, 2004

  22. [22]

    M. Idel. A review of matrix scaling and S inkhorn's normal form for matrices and positive maps. arxiv:1609.06349 , 2016

  23. [23]

    T. C. Kwok, L. C. Lau, Y. T. Lee, and A. Ramachandran. The paulsen problem, continuous operator scaling, and smoothed analysis. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages 182--189. Association for Computing Machinery, New York, 2018

  24. [24]

    Lemmens and R

    B. Lemmens and R. Nussbaum. Nonlinear P erron- F robenius theory . Cambridge University Press, Cambridge, 2012

  25. [25]

    Lehmann, M.-K

    T. Lehmann, M.-K. von Renesse, A. Sambale, and A. Uschmajew. A note on overrelaxation in the S inkhorn algorithm. Optim. Lett. , 16(8):2209--2220, 2022

  26. [26]

    J. M. Ortega and W. C. Rheinboldt. Iterative solution of nonlinear equations in several variables . Academic Press, New York-London, 1970

  27. [27]

    I. V. Oseledets, M. V. Rakhuba, and A. Uschmajew. Local convergence of alternating low-rank optimization methods with overrelaxation. Numer. Linear Algebra Appl. , 30(3):e2459, 15, 2023

  28. [28]

    Peyr\'e, L

    G. Peyr\'e, L. Chizat, F.-X. Vialard, and J. Solomon. Quantum entropic regularization of matrix-valued optimal transport. European J. Appl. Math. , 30(6):1079--1102, 2019

  29. [29]

    Schechter

    S. Schechter. Iteration methods for nonlinear problems. Trans. Amer. Math. Soc. , 104:179--189, 1962

  30. [30]

    Sra and R

    S. Sra and R. Hosseini. Geometric optimisation on positive definite matrices for elliptically contoured distributions. In Advances in Neural Information Processing Systems 26 , pages 2562--2570. Curran Associates, Inc., 2013

  31. [31]

    Sra and R

    S. Sra and R. Hosseini. Conic geometric optimization on the manifold of positive definite matrices. SIAM Journal on Optimization , 25(1):713--739, 2015

  32. [32]

    Sinkhorn and P

    R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pacific J. Math. , 21:343--348, 1967

  33. [33]

    S. Sra, N. K. Vishnoi, and O. Yildiz. On geodesically convex formulations for the Brascamp-Lieb constant. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018) , pages 25:1--25:15. Schloss Dagstuhl -- Leibniz-Zentrum f \"u r Informatik, Dagstuhl, 2018

  34. [34]

    Thibault, L

    A. Thibault, L. Chizat, C. Dossal, and N. Papadakis. Overrelaxed S inkhorn- K nopp algorithm for regularized optimal transport. Algorithms (Basel) , 14(5):Paper No. 143, 16, 2021

  35. [35]

    D. E. Tyler. A distribution-free M -estimator of multivariate scatter. Ann. Statist. , 15(1):234--251, 1987

  36. [36]

    Udri s te

    C. Udri s te. Convex functions and optimization methods on R iemannian manifolds . Kluwer Academic Publishers Group, Dordrecht, 1994

  37. [37]

    Weber and S

    M. Weber and S. Sra. Global optimality for E uclidean CCCP under R iemannian convexity. In Proceedings of the 40th International Conference on Machine Learning , pages 36790--36803. PMLR, 2023