Minkowski-Type Wasserstein Metrics and Barycenters for Location-Scale Mixtures with Application to Domain Adaptation

Songyan Luo; Yunxin Zhang

arxiv: 2606.26482 · v1 · pith:GBGZT5LSnew · submitted 2026-06-25 · 🧮 math.OC

Pith reviewed 2026-06-26 04:33 UTC · model grok-4.3

classification 🧮 math.OC

keywords optimal transport · location-scale mixtures · Wasserstein distance · mixture models · domain adaptation · barycenters · computational efficiency · multimarginal transport

The pith

Optimal transport between location-scale mixture models reduces to discrete transport over components with linear scaling in sample size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework for optimal transport between finite location-scale mixture models. It defines Minkowski-type Wasserstein metrics through a function class grounded in generalized Minkowski inequalities, characterizes the optimal transport maps between multivariate location-scale families, and extends Wasserstein-type metrics and barycenters to mixtures of these families under the assumption that the mixtures are identifiable. The central technical step shows that restricting the joint couplings to a specific mixture structure converts the continuous multimarginal optimal transport problem into a discrete transport problem over the mixture components. As a result, computational cost scales linearly with the number of samples, whereas pointwise empirical optimal transport scales at least quadratically. On the VisDA-C domain adaptation benchmark, the method achieves accuracy competitive with standard empirical optimal transport at substantially lower computational cost and memory footprint.

Core claim

By restricting the joint couplings to a specific mixture structure, the continuous multimarginal optimal transport problem between finite location-scale mixture models reduces to a discrete transport problem over the mixture components. This reduction, combined with the characterization of transport maps via generalized Minkowski inequalities, allows the definition of Wasserstein-type metrics and barycenters for these models.
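The reduction described above can be illustrated with a minimal sketch, assuming Gaussian components (one special case of a location-scale family) and the squared 2-Wasserstein distance between components as the ground cost. The function names `gaussian_w2_sq` and `component_transport` are hypothetical, and the paper's Minkowski-type cost may differ from the one used here. The point is only that the linear program has one variable per pair of components, so its size does not depend on the number of samples.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog


def gaussian_w2_sq(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    root = np.real(sqrtm(S1))
    cross = np.real(sqrtm(root @ S2 @ root))
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))


def component_transport(w1, comps1, w2, comps2):
    """Discrete OT between mixture components.

    w1, w2: component weights (each summing to 1).
    comps1, comps2: lists of (mean, covariance) pairs.
    Returns the optimal cost and the K1 x K2 transport plan.
    """
    K1, K2 = len(w1), len(w2)
    # Ground cost between every pair of components (assumed: Gaussian W2^2).
    cost = np.array([[gaussian_w2_sq(*a, *b) for b in comps2] for a in comps1])

    # Marginal constraints on the flattened plan (row-major order).
    A_eq = np.zeros((K1 + K2, K1 * K2))
    for i in range(K1):
        A_eq[i, i * K2:(i + 1) * K2] = 1.0   # row sums equal w1
    for j in range(K2):
        A_eq[K1 + j, j::K2] = 1.0            # column sums equal w2
    b_eq = np.concatenate([w1, w2])

    # The last constraint is implied by the others when both weight
    # vectors sum to 1, so it is dropped to avoid a redundant row.
    res = linprog(cost.ravel(), A_eq=A_eq[:-1], b_eq=b_eq[:-1],
                  bounds=(0, None), method="highs")
    return float(res.fun), res.x.reshape(K1, K2)
```

For two mixtures with K1 and K2 components the plan has K1 × K2 entries, however many samples were used to fit the mixtures. Fitting the mixtures themselves is a separate step that this sketch does not cover.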

What carries the argument

Minkowski-type Wasserstein metrics defined on the class of functions satisfying generalized Minkowski inequalities, which enable the extension of optimal transport to identifiable finite location-scale mixture models.
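For orientation, the classical weighted Minkowski inequality is stated below. The symbols w_i, a_i, b_i, and p are introduced here for illustration only; the exact function class that the paper uses to generalize this inequality is not given on this page.

```latex
% Classical weighted Minkowski inequality, for p >= 1,
% nonnegative weights w_i, and real numbers a_i, b_i:
\left( \sum_{i} w_i \, |a_i + b_i|^{p} \right)^{1/p}
\;\le\;
\left( \sum_{i} w_i \, |a_i|^{p} \right)^{1/p}
+
\left( \sum_{i} w_i \, |b_i|^{p} \right)^{1/p}
```

With p-th-power costs, this is the inequality that gives the triangle inequality for transport-type distances of the form (min over plans of the weighted sum of d^p)^{1/p}. How the generalized version enters the paper's proofs is not described here and should be checked against the paper itself.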

If this is right

Transport plans are computed between mixture components instead of individual data points.
Computational complexity of optimal transport scales linearly with sample size.
Wasserstein barycenters can be defined and computed for location-scale mixture models.
The approach applies to domain adaptation with reduced memory and time requirements while maintaining accuracy.
Multimarginal optimal transport problems become tractable under the mixture coupling restriction.
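As an illustration of the barycenter consequence listed above, here is a minimal sketch for two one-dimensional mixtures. It assumes that all components share one standardized base distribution, that a component-level plan is already available, and that the interpolated mixture is built pair by pair in the way Delon and Desolneux do for Gaussian mixtures; the paper's own barycenter definition may differ. The function name `interpolate_mixtures_1d` is hypothetical. The per-pair step uses the standard one-dimensional fact that the 2-Wasserstein interpolation between two members of a location-scale family interpolates location and scale linearly.

```python
def interpolate_mixtures_1d(plan, comps1, comps2, t=0.5):
    """Interpolated mixture at time t in [0, 1] between two 1D mixtures.

    plan[i][j]: mass sent from component i of the first mixture to
                component j of the second (assumed given).
    comps1, comps2: lists of (location, scale) pairs with scale > 0.
    Returns a list of (weight, location, scale) triples.
    """
    out = []
    for i, (m1, s1) in enumerate(comps1):
        for j, (m2, s2) in enumerate(comps2):
            mass = plan[i][j]
            if mass > 0.0:
                # Location and scale interpolate linearly in one dimension.
                loc = (1 - t) * m1 + t * m2
                scale = (1 - t) * s1 + t * s2
                out.append((mass, loc, scale))
    return out
```

Setting t = 0.5 gives the equal-weight barycenter of the two mixtures under these assumptions. The result has at most one component per pair with positive mass in the plan.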

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar reductions might apply to other parametric families if appropriate coupling structures can be identified.
The linear scaling opens the possibility of using these metrics in large-scale machine learning pipelines where quadratic costs were prohibitive.
Domain adaptation benefits suggest potential use in other distribution-matching tasks such as style transfer or data augmentation.
Verification of identifiability in practice may require additional regularization or model selection steps.

Load-bearing premise

The finite location-scale mixture models are identifiable.

What would settle it

Exhibit two distinct parameterizations within the assumed mixture family that produce the same distribution yet yield different values under the proposed Minkowski-type Wasserstein metric, which would show that the metric is not well defined where identifiability fails, or show that the restricted couplings fail to approximate the true optimal transport cost.

Original abstract

Discrete optimal transport (OT) typically relies on pointwise matching between empirical measures, incurring computational costs that scale at least quadratically with the sample size. To circumvent this limitation, we introduce a mathematical framework for OT between finite location-scale mixture models. By defining a specific function class grounded in generalized Minkowski inequalities and characterizing OT maps between multivariate location-scale families, we extend Wasserstein-type metrics and barycenters to these mixture models under the assumption of identifiability. Furthermore, we prove that restricting joint couplings to a specific mixture structure reduces the continuous multimarginal OT problem to a discrete transport problem over mixture components. Computing transport plans between these components rather than individual samples reduces the computational complexity to linear scaling with respect to the sample size. Empirical evaluations on the VisDA-C benchmark confirm that this strategy achieves competitive accuracy compared to existing empirical OT approaches, while substantially reducing the computational cost and memory footprint.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a structured reduction that turns multimarginal OT on location-scale mixtures into discrete component transport for linear scaling, but everything rests on an identifiability assumption whose sufficient conditions are not supplied.

Two things to know up front. First, the authors define Minkowski-type Wasserstein metrics and barycenters for finite location-scale mixtures by using generalized Minkowski inequalities and characterizing the optimal transport maps between those families. Second, they claim that restricting couplings to a mixture structure converts the continuous multimarginal problem into ordinary discrete transport over the components, which lowers the cost from at least quadratic to linear in sample size.

The reduction itself is the part that could matter. If the math works, it gives a practical route for domain adaptation tasks where the data are already modeled as mixtures. The VisDA-C experiments show competitive accuracy with lower memory and runtime than standard empirical OT, which is concrete evidence that the approach is at least worth testing.

The soft spot is identifiability. The abstract conditions the whole framework and the OT-map uniqueness on it, yet supplies no theorem stating when finite multivariate location-scale mixtures remain identifiable. Without separation conditions on means or bounds on covariances, the restricted couplings may not recover the true optimum and the claimed equivalence may fail. That gap is load-bearing.

The paper is for researchers who already work with mixture models inside optimal transport or domain adaptation pipelines. A reader who needs faster structured OT would get direct value from the complexity claim and the benchmark numbers, provided the proofs hold.

I would send this to peer review. The claims are specific enough that referees can check the derivations and the identifiability gap directly.

Referee Report

2 major / 1 minor

Summary. The paper introduces a framework for optimal transport (OT) between finite location-scale mixture models. It defines Minkowski-type Wasserstein metrics and barycenters by characterizing OT maps between multivariate location-scale families via generalized Minkowski inequalities, under an identifiability assumption. The central technical claim is that restricting joint couplings to a specific mixture structure reduces the continuous multimarginal OT problem to a discrete transport problem over mixture components, yielding linear scaling in sample size. The approach is applied to domain adaptation and evaluated empirically on the VisDA-C benchmark, reporting competitive accuracy with reduced computational cost and memory usage compared to empirical OT methods.

Significance. If the reduction and metric extensions hold under the stated assumptions, the work provides a structured way to achieve linear-complexity OT for mixture models, which could meaningfully advance scalable OT methods in high-dimensional settings such as domain adaptation. The explicit use of mixture structure to discretize the multimarginal problem is a concrete contribution, and the empirical demonstration of efficiency gains on a standard benchmark adds practical value. No machine-checked proofs or fully parameter-free derivations are described in the available material.

major comments (2)

[Abstract] The reduction of the multimarginal OT problem to discrete transport over mixture components (with claimed linear scaling in sample size) and the extension of Wasserstein metrics both explicitly require identifiability of the finite location-scale mixtures to ensure unique OT maps between components. No theorem or set of sufficient conditions (e.g., minimum mean separation, covariance bounds, or dimension-dependent criteria) is provided to guarantee identifiability in the multivariate setting used for the domain-adaptation application.
[Abstract] Without verified identifiability, the restricted couplings may fail to achieve the true multimarginal optimum, undermining the equivalence between the continuous OT problem and the discrete component-wise transport. This assumption is load-bearing for both the theoretical claims and the complexity reduction.

minor comments (1)

The abstract mentions 'generalized Minkowski inequalities' and 'characterizing OT maps' but does not indicate where the full statements of these results appear or whether they include explicit proofs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for emphasizing the role of the identifiability assumption. We address the two major comments point by point below.

Referee: [Abstract] The reduction of the multimarginal OT problem to discrete transport over mixture components (with claimed linear scaling in sample size) and the extension of Wasserstein metrics both explicitly require identifiability of the finite location-scale mixtures to ensure unique OT maps between components. No theorem or set of sufficient conditions (e.g., minimum mean separation, covariance bounds, or dimension-dependent criteria) is provided to guarantee identifiability in the multivariate setting used for the domain-adaptation application.

Authors: We agree that the manuscript assumes identifiability without supplying explicit sufficient conditions for the multivariate location-scale case. This is a fair observation. In the revised version we will add a short subsection (or appendix remark) that states verifiable sufficient conditions, for example requiring the component means to satisfy a minimum separation ‖μ_i - μ_j‖ ≥ C·max{‖Σ_i‖, ‖Σ_j‖} for a dimension-dependent constant C, together with a brief reference to existing identifiability results for Gaussian mixtures that extend directly to the location-scale family. revision: yes

Referee: [Abstract] Without verified identifiability, the restricted couplings may fail to achieve the true multimarginal optimum, undermining the equivalence between the continuous OT problem and the discrete component-wise transport. This assumption is load-bearing for both the theoretical claims and the complexity reduction.

Authors: The referee is correct that identifiability is essential for the claimed equivalence and for the linear-complexity reduction to be exact. The manuscript already qualifies all statements with the phrase “under the assumption of identifiability.” Adding the sufficient conditions mentioned above will allow readers to check when the assumption holds in practice. In the domain-adaptation experiments the fitted mixtures satisfy clear mean separation, so the numerical results remain valid; we will also insert a sentence clarifying that the complexity claim is conditional on identifiability. revision: yes
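
The separation inequality proposed in the first response can be checked mechanically on a fitted mixture. The sketch below is illustrative only: the function name `satisfies_separation` is hypothetical, the constant C is not specified anywhere in the source and must be supplied by the caller, and the covariance norms are assumed to be precomputed (for example as spectral norms). Passing this check says nothing more than that the stated inequality holds for every pair of components; it is not by itself a proof of identifiability.

```python
import math


def satisfies_separation(means, cov_norms, C):
    """Check ||mu_i - mu_j|| >= C * max(||Sigma_i||, ||Sigma_j||) for all i != j.

    means: list of equal-length coordinate sequences (component means).
    cov_norms: list of precomputed covariance norms, one per component.
    C: separation constant (placeholder; not given in the source).
    """
    K = len(means)
    for i in range(K):
        for j in range(i + 1, K):
            gap = math.dist(means[i], means[j])  # Euclidean distance
            if gap < C * max(cov_norms[i], cov_norms[j]):
                return False
    return True
```

A check of this kind could be run after mixture fitting and before the component-level transport step, flagging fitted models whose components are too close for the proposed condition.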

Circularity Check

0 steps flagged

No circularity: framework relies on explicit identifiability assumption and Minkowski-based definitions without self-referential reduction

full rationale

The derivation introduces Wasserstein-type metrics via generalized Minkowski inequalities and proves the coupling restriction reduces multimarginal OT to discrete component transport under an explicit identifiability assumption. No quoted step shows a quantity defined in terms of itself, a fitted parameter renamed as prediction, or a load-bearing claim justified solely by self-citation. The identifiability condition is stated as a prerequisite rather than derived from the result, and the computational reduction is presented as a theorem under that assumption. This matches the default expectation of a self-contained mathematical construction without circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption of identifiability for the location-scale mixtures and on the characterization of OT maps between multivariate location-scale families; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Identifiability of the finite location-scale mixture models
Explicitly invoked in the abstract to extend the metrics and barycenters and to reduce the multimarginal problem.

