Recognition: 2 theorem links · Lean Theorem
Convergence Rates for Latent Mixing Measures in Infinite Homoscedastic Location-Scale Mixture Models
Pith reviewed 2026-05-11 00:57 UTC · model grok-4.3
The pith
Lower bounds link mixture density distances to Wasserstein and scale discrepancies, yielding contraction rates for latent mixing measures in infinite location-scale models with unknown shared scale.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Novel lower bounds are derived that connect the L1 distance between mixture densities to discrepancies, based on the Wasserstein distances and the operator norm, between the underlying mixing measures and scale matrices. The approach combines the dual formulation of the W1 distance with functional-analytic approximation techniques. General inequalities result whose strength is set by the smoothness of the mixture kernel, via the rate of decay of its characteristic function, and by a key lower bound on the L1 metric involving the operator-norm discrepancy between scale parameters. A novel PDE inversion condition yields a sharper inequality for important ordinary-smooth cases. These bounds are specialized to multivariate Gaussian, Cauchy, and Laplace kernels, yielding first-of-their-kind contraction rates for Dirichlet process mixtures with an unknown scale parameter shared across components.
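In schematic form, the inequalities bound the density distance from below by the parameter discrepancies. The notation below is illustrative (ours, not the paper's), with rate functions Ψ and Φ standing in for the kernel-dependent moduli:

```latex
% Schematic shape of the lower bounds (illustrative notation, not the paper's):
% p_{G,\Sigma} denotes the mixture density with mixing measure G and shared scale \Sigma.
\[
\bigl\| p_{G,\Sigma} - p_{G',\Sigma'} \bigr\|_{L^1}
\;\gtrsim\;
\Psi\!\bigl( W_1(G, G') \bigr) \;+\; \Phi\!\bigl( \| \Sigma - \Sigma' \|_{\mathrm{op}} \bigr)
\]
% Contraction of the posterior density in L^1 then forces contraction of the
% location mixing measure in W_1 and of the shared scale in operator norm,
% at rates set by how fast \Psi and \Phi vanish near zero.
```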
What carries the argument
Lower bounds relating L1 density distance to Wasserstein-1 distance on mixing measures and operator-norm discrepancy on scales, obtained from the W1 dual and a PDE inversion condition.
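The dual formulation invoked here is the standard Kantorovich-Rubinstein representation of the Wasserstein-1 distance:

```latex
% Kantorovich-Rubinstein duality for W_1 (standard; see e.g. Villani, 2003):
\[
W_1(G, G') \;=\; \sup_{f \,:\, \mathrm{Lip}(f) \le 1} \int f \, \mathrm{d}(G - G')
\]
% The supremum runs over 1-Lipschitz test functions f. Lower-bounding the L^1
% density distance then reduces to approximating each such f through the
% smoothing action of the mixture kernel, which is where the decay of the
% kernel's characteristic function enters.
```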
If this is right
- Contraction rates are obtained for Dirichlet process mixtures with unknown shared scale parameter.
- Location mixing measure and scale parameter can converge at different rates depending on the kernel.
- The bounds specialize to multivariate Gaussian, Cauchy, and Laplace kernels.
- General inequalities hold for kernels whose characteristic function decay determines the rate.
Where Pith is reading between the lines
- The separation of location and scale rates could guide the choice of priors that balance estimation of both in practice.
- Similar bounding techniques might apply to finite mixtures or to models with component-specific scales if the operator-norm control extends.
- The PDE inversion condition may be checkable numerically for other kernels to obtain explicit rates without new proofs.
Load-bearing premise
The kernel must be sufficiently smooth for its characteristic function to decay fast enough that the L1 density distance controls the mixing-measure discrepancies, and the PDE inversion condition must hold to obtain the sharp rates.
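For reference, the two decay regimes at play are the standard ones from the deconvolution literature. The definitions and the specific kernel facts below are standard background, not taken from the paper:

```latex
% Ordinary-smooth vs. supersmooth kernels (standard deconvolution terminology):
\[
\text{ordinary smooth of order } \beta:\quad
|\widehat{K}(t)| \asymp |t|^{-\beta} \ \text{ as } |t| \to \infty
\]
\[
\text{supersmooth of order } \gamma:\quad
|\widehat{K}(t)| \asymp e^{-c|t|^{\gamma}} \ \text{ as } |t| \to \infty
\]
% E.g. the univariate Laplace kernel has \widehat{K}(t) = (1 + \sigma^2 t^2)^{-1},
% ordinary smooth with \beta = 2, while the Gaussian kernel has
% \widehat{K}(t) = e^{-\sigma^2 t^2/2}, supersmooth with \gamma = 2.
% Faster characteristic-function decay (heavier smoothing) generally yields
% slower rates for recovering the mixing measure.
```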
What would settle it
A concrete kernel and prior for which the posterior of the mixing measure fails to contract at the derived rate while the density still contracts in L1.
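The gap such a counterexample would exploit can be seen numerically: a smooth kernel blurs differences between mixing measures, so the density distance understates the mixing-measure distance. A minimal sketch (locations, weights, and the shared scale are made up for illustration; the scale is treated as known here, unlike in the paper's setting):

```python
# Illustrative sketch (not from the paper): compare the L1 distance between two
# Gaussian location-mixture densities with the W1 distance between their mixing
# measures, for hypothetical mixing measures with slightly shifted atoms.
import numpy as np
from scipy.stats import norm, wasserstein_distance

sigma = 1.0  # shared kernel scale, assumed known for this sketch

# Two discrete mixing measures: equal weights, atoms shifted by 0.1.
locs_a, w_a = np.array([-0.5, 0.5]), np.array([0.5, 0.5])
locs_b, w_b = np.array([-0.6, 0.6]), np.array([0.5, 0.5])

def mixture_pdf(x, locs, weights):
    """Density of a location mixture with a shared Gaussian kernel."""
    return np.sum(weights[:, None] * norm.pdf(x[None, :], loc=locs[:, None], scale=sigma), axis=0)

# Riemann-sum approximation of the L1 distance between the two densities.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
l1 = np.sum(np.abs(mixture_pdf(x, locs_a, w_a) - mixture_pdf(x, locs_b, w_b))) * dx

# Exact W1 distance between the two discrete mixing measures.
w1 = wasserstein_distance(locs_a, locs_b, w_a, w_b)

print(f"L1 distance between mixture densities: {l1:.4f}")
print(f"W1 distance between mixing measures:   {w1:.4f}")
```

Here the W1 distance between the mixing measures is exactly 0.1, while the smoothed densities sit strictly closer in L1; the paper's lower bounds quantify how much of this gap a given kernel can open up.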
read the original abstract
We study posterior contraction rates for mixing measures in homoscedastic location-scale mixture models with infinitely many components. While posterior convergence at the level of densities is well understood, ensuring convergence of the latent mixing measure is more challenging and has remained an open problem in settings where both location and scale parameters are unknown. We address this by deriving novel lower-bounds that connect the $L^1$ distance between mixture densities to discrepancies, based on the Wasserstein distances and the operator norm, between the underlying mixing measures and scale matrices. Our approach combines the dual formulation of the $W_1$ distance with functional-analytic approximation techniques. This leads to general inequalities, whose strength is determined (i) by the smoothness of the mixture kernel via the rate of decay of its characteristic function, and (ii) by a key lower-bound on the $L^1$ metric involving the operator norm discrepancy between scale parameters. Moreover, a novel PDE inversion condition yields a sharper inequality for important ordinary-smooth cases. We specialize these bounds to popular mixtures based on multivariate Gaussian, Cauchy, and Laplace kernels. As a consequence, we obtain first-of-their-kind contraction rates in the context of Dirichlet process mixtures with an unknown scale parameter shared across components. As a byproduct of our inequalities, we can distinguish the convergence behavior of the location mixing measure from that of the scale parameter across a range of kernel choices, leading to nuanced insights into their respective rates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives novel lower bounds relating the L1 distance between mixture densities to Wasserstein-1 and operator-norm discrepancies between the underlying mixing measures (locations and shared scale) in infinite homoscedastic location-scale mixture models. These inequalities are controlled by the decay rate of the kernel's characteristic function, with a novel PDE inversion condition providing sharper bounds in ordinary-smooth cases. Specializing to multivariate Gaussian, Cauchy, and Laplace kernels yields the first posterior contraction rates for the latent mixing measure in Dirichlet process mixtures with unknown shared scale, while distinguishing the rates for the location component versus the scale parameter.
Significance. If the derived bounds and the PDE inversion condition hold, the results close a longstanding gap in Bayesian nonparametric theory by establishing contraction for the mixing measure (rather than just the density) when both location and scale are unknown. The ability to separate location and scale convergence rates across kernels is a useful byproduct. The approach via dual formulations of W1 combined with functional-analytic techniques is technically sound and extends prior work on density contraction; the provision of explicit rates for standard kernels strengthens applicability.
major comments (2)
- [PDE inversion condition and its application to ordinary-smooth kernels] The PDE inversion condition is load-bearing for the sharper ordinary-smooth rates claimed for Cauchy and Laplace kernels (see the statement following the general L1-to-discrepancy inequality and its application in the specialization section). The manuscript must explicitly verify that this condition holds for these kernels under the paper's assumptions on the support of the mixing measure; without such verification, the claimed distinction between location and scale rates does not follow at the advertised speed.
- [General inequalities and specialization to Gaussian/Cauchy/Laplace] The transfer from density contraction to mixing-measure contraction relies on the L1-to-W1/operator-norm inequalities being sufficiently strong. The abstract indicates these are controlled by characteristic-function decay plus the PDE step; the paper should include a self-contained check that the resulting rates remain valid when the mixing measure has unbounded support (common in DP mixtures), as this affects the operator-norm term.
minor comments (2)
- [Notation and setup] Notation for the shared scale parameter (treated as a matrix in the operator-norm discrepancy) should be clarified early, especially when specializing from univariate to multivariate kernels.
- [Introduction] The abstract mentions 'scale matrices' but the model is described as homoscedastic with a shared scalar scale; a brief remark reconciling these would aid readability.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for the constructive comments, which help clarify the presentation of our results on contraction rates for mixing measures in homoscedastic location-scale mixtures. We address each major comment below.
read point-by-point responses
Referee: [PDE inversion condition and its application to ordinary-smooth kernels] The PDE inversion condition is load-bearing for the sharper ordinary-smooth rates claimed for Cauchy and Laplace kernels (see the statement following the general L1-to-discrepancy inequality and its application in the specialization section). The manuscript must explicitly verify that this condition holds for these kernels under the paper's assumptions on the support of the mixing measure; without such verification, the claimed distinction between location and scale rates does not follow at the advertised speed.
Authors: We agree that an explicit verification of the PDE inversion condition for the Cauchy and Laplace kernels is necessary to fully substantiate the sharper rates and the resulting distinction between location and scale convergence. Our general framework derives the condition from the decay properties of the characteristic function, but the manuscript does not contain a dedicated check tailored to these kernels under the stated support assumptions. In the revised version we will add a self-contained verification (as a new lemma or appendix subsection) confirming that the condition holds for the multivariate Cauchy and Laplace kernels when the mixing measure satisfies the paper's assumptions, thereby justifying the advertised rates.
Revision: yes
Referee: [General inequalities and specialization to Gaussian/Cauchy/Laplace] The transfer from density contraction to mixing-measure contraction relies on the L1-to-W1/operator-norm inequalities being sufficiently strong. The abstract indicates these are controlled by characteristic-function decay plus the PDE step; the paper should include a self-contained check that the resulting rates remain valid when the mixing measure has unbounded support (common in DP mixtures), as this affects the operator-norm term.
Authors: We acknowledge the need for an explicit check on unbounded support, which is standard for Dirichlet process mixtures. The general L1-to-discrepancy inequalities are formulated via the dual representation of W1 and functional-analytic arguments that do not require bounded support a priori; however, the operator-norm term does require care when moments are controlled only through the prior. In the revision we will insert a self-contained remark (immediately after the statement of the main inequalities) verifying that the rates remain valid for unbounded mixing measures under the mild integrability conditions already implicit in the DP prior and the kernel assumptions, with particular attention to the operator-norm contribution.
Revision: yes
Circularity Check
No significant circularity; derivations rely on independent functional-analytic bounds
full rationale
The paper's core contribution consists of new inequalities linking L1 distances between mixture densities to Wasserstein and operator-norm discrepancies on mixing measures, obtained via the dual formulation of W1 combined with functional approximation techniques. The novel PDE inversion condition is presented as an original lower-bound tool derived for sharpening ordinary-smooth cases and is then applied to specific kernels; nothing in the abstract or described chain indicates that this condition is smuggled in via self-citation, defined circularly in terms of the target rates, or obtained by fitting parameters to the data being predicted. Specialization to Gaussian, Cauchy, and Laplace kernels follows directly from the general bounds once the decay rates of their characteristic functions are inserted. No self-definitional steps, fitted-input predictions, or renaming of known empirical patterns appear. The derivation chain is therefore self-contained against external analytic tools and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The mixture kernel has a characteristic function whose rate of decay determines its smoothness.
- ad hoc to paper: A PDE inversion condition holds for the ordinary-smooth cases.
Reference graph
Works this paper leans on
- [1] Ascolani, F., Lijoi, A., Rebaudo, G., and Zanella, G. (2023). Clustering consistency with Dirichlet process mixtures. Biometrika, 110(2):551--558.
- [2]
- [3] Barron, A., Schervish, M., and Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems. The Annals of Statistics, 27:536--561.
- [4] Butzer, P. L. and Nessel, R. J. (1971). Fourier Analysis and Approximation, Vol. 1, volume 7. Birkhäuser Basel.
- [5] Caillerie, C., Chazal, F., Dedecker, J., and Michel, B. (2011). Deconvolution for the Wasserstein metric and geometric inference. Electronic Journal of Statistics, 5:1394--1423.
- [6] Canale, A. and De Blasi, P. (2017). Posterior asymptotics of nonparametric location-scale mixtures for multivariate density estimation. Bernoulli, 23(1):379--404.
- [7] Castillo, I. (2024). Bayesian Nonparametric Statistics. École d'Été de Probabilités de Saint-Flour LI - 2023. Springer Nature.
- [8] Catalano, M. and Lavenant, H. (2025). Measures of dependence based on Wasserstein distances. arXiv preprint arXiv:2510.06034.
- [9] Catalano, M., Lavenant, H., Lijoi, A., and Prünster, I. (2024). A Wasserstein index of dependence for random measures. Journal of the American Statistical Association, 119(547):2396--2406.
- [10] Catalano, M., Lijoi, A., and Prünster, I. (2021). Measuring dependence in the Wasserstein distance for Bayesian nonparametric models. The Annals of Statistics, 49(5):2916--2947.
- [11] Chae, M. and Walker, S. G. (2017). A novel approach to Bayesian consistency. Electronic Journal of Statistics, 11(2):4723--4745.
- [12] De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prünster, I., and Ruggiero, M. (2015). Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):212--229.
- [13] Dedecker, J., Fischer, A., and Michel, B. (2015). Improved rates for Wasserstein deconvolution with ordinary smooth error in dimension one. Electronic Journal of Statistics, 9(1):234--265.
- [14] Dedecker, J. and Michel, B. (2013). Minimax rates of convergence for Wasserstein deconvolution with supersmooth errors in any dimension. Journal of Multivariate Analysis, 122:278--291.
- [15] DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation, volume 303. Springer Science & Business Media.
- [16] Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90(430):577--588.
- [17] Ferguson, T. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1:209--230.
- [18] Ferguson, T. S. (1983). Bayesian density estimation by mixtures of normal distributions. In Recent Advances in Statistics, pages 287--302. Elsevier.
- [19] Folland, G. B. (1999). Real Analysis: Modern Techniques and Their Applications. John Wiley & Sons.
- [20] Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458):611--631.
- [22] Gao, F. and van der Vaart, A. (2016). Posterior contraction rates for deconvolution of Dirichlet-Laplace mixtures. Electronic Journal of Statistics, 10(1):608--627.
- [23] Genovese, C. R. and Wasserman, L. (2000). Rates of convergence for the Gaussian mixture sieve. The Annals of Statistics, 28(4):1105--1127.
- [24] Ghosal, S., Ghosh, J. K., and Ramamoorthi, R. V. (1999). Posterior consistency of Dirichlet mixtures in density estimation. The Annals of Statistics, 27:143--158.
- [25] Ghosal, S., Ghosh, J. K., and van der Vaart, A. (2000). Convergence rates of posterior distributions. The Annals of Statistics, 28:500--531.
- [26] Ghosal, S. and van der Vaart, A. (2007a). Convergence rates of posterior distributions for noniid observations. The Annals of Statistics, 35(1):192--223.
- [27] Ghosal, S. and van der Vaart, A. (2007b). Posterior convergence rates of Dirichlet mixtures at smooth densities. The Annals of Statistics, 35:697--723.
- [28] Ghosal, S. and van der Vaart, A. (2017). Fundamentals of Nonparametric Bayesian Inference, volume 44. Cambridge University Press.
- [29] Ghosal, S. and van der Vaart, A. W. (2001). Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. The Annals of Statistics, 29(5):1233--1263.
- [30] Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in L^r-metrics, 1 ≤ r ≤ ∞. The Annals of Statistics, 39(6):2883--2911.
- [31] Gnedin, A. and Pitman, J. (2006). Exchangeable Gibbs partitions and Stirling triangles. Journal of Mathematical Sciences, 138(3):5674--5685.
- [32] Guha, A., Ho, N., and Nguyen, X. (2021). On posterior contraction of parameters and interpretability in Bayesian mixture modeling. Bernoulli, 27(4):2159--2188.
- [33] Heinonen, J. (2001). Lectures on Analysis on Metric Spaces. Springer Science & Business Media.
- [34] Ho, N. and Nguyen, X. (2016a). Convergence rates of parameter estimation for some weakly identifiable finite mixtures. The Annals of Statistics, 44:2726--2755.
- [35] Ho, N. and Nguyen, X. (2016b). On strong identifiability and convergence rates of parameter estimation in finite mixtures. Electronic Journal of Statistics, 10:271--307.
- [36] Kingman, J. F. (1978). The representation of partition structures. Journal of the London Mathematical Society, 2(2):374--380.
- [37] Kingman, J. F. C. (1967). Completely random measures. Pacific Journal of Mathematics, 21(1):59--78.
- [38] Lee, J. M. (2013). Introduction to Smooth Manifolds, volume 218 of Graduate Texts in Mathematics. Springer, New York, 2nd edition.
- [39] Lijoi, A., Mena, R. H., and Prünster, I. (2005a). Hierarchical mixture modeling with normalized inverse-Gaussian priors. Journal of the American Statistical Association, 100(472):1278--1291.
- [40] Lijoi, A., Nipoti, B., and Prünster, I. (2014). Bayesian inference with dependent normalized completely random measures. Bernoulli, pages 1260--1291.
- [41] Lijoi, A. and Prünster, I. (2010). Models beyond the Dirichlet process. In Hjort, N. L., Holmes, C., Müller, P., and Walker, S. G., editors, Bayesian Nonparametrics, pages 80--136. Cambridge University Press.
- [42] Lijoi, A., Prünster, I., and Walker, S. G. (2005b). On consistency of nonparametric normal mixtures for Bayesian density estimation. Journal of the American Statistical Association, 100(472):1292--1296.
- [43] Lo, A. Y. (1984). On a class of Bayesian nonparametric estimates: I. Density estimates. The Annals of Statistics, 12(1):351--357.
- [44] MacEachern, S. N. and Müller, P. (1998). Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics, 7(2):223--238.
- [45] Majumdar, S. (1992). On topological support of Dirichlet prior. Statistics & Probability Letters, 15(5):381--384.
- [46] McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. John Wiley & Sons.
- [47] McShane, E. J. (1934). Extension of range of functions. Bulletin of the American Mathematical Society, 40:837--842.
- [48] Miller, J. W. and Harrison, M. T. (2013). A simple example of Dirichlet process mixture inconsistency for the number of components. In Advances in Neural Information Processing Systems, volume 26.
- [49] Miller, J. W. and Harrison, M. T. (2014). Inconsistency of Pitman--Yor process mixtures for the number of components. Journal of Machine Learning Research, 15:3333--3370.
- [50] Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249--265.
- [51] Nguyen, X. (2013). Convergence of latent mixing measures in finite and infinite mixture models. The Annals of Statistics, 41(1):370--400.
- [52] Nieto-Barajas, L. E., Prünster, I., and Walker, S. G. (2004). Normalized random measures driven by increasing additive processes. The Annals of Statistics, 32(6):2343--2360.
- [53] Perman, M., Pitman, J., and Yor, M. (1992). Size-biased sampling of Poisson point processes and excursions. Probability Theory and Related Fields, 92(1):21--39.
- [54] Petralia, F., Rao, V., and Dunson, D. (2012). Repulsive mixtures. In Advances in Neural Information Processing Systems, volume 25.
- [55] Pitman, J. (1996). Some developments of the Blackwell-MacQueen urn scheme. In Statistics, Probability and Game Theory: Papers in Honor of David Blackwell, volume 30, pages 245--267. Institute of Mathematical Statistics.
- [56] Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. The Annals of Probability, 25(2):855--900.
- [57] Regazzini, E., Lijoi, A., and Prünster, I. (2003). Distributional results for means of normalized random measures with independent increments. The Annals of Statistics, 31(2):560--585.
- [58] Rodriguez, A., Dunson, D. B., and Gelfand, A. E. (2008). The nested Dirichlet process. Journal of the American Statistical Association, 103(483):1131--1154.
- [59] Rousseau, J. and Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution in overfitted mixture models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 73(5):689--710.
- [60] Rousseau, J. and Scricciolo, C. (2024). Wasserstein convergence in Bayesian and frequentist deconvolution models. The Annals of Statistics, 52(4):1691--1715.
- [61] Rudin, W. (1991). Functional Analysis. McGraw-Hill, New York, second edition.
- [62] Schwartz, L. (1966). Théorie des distributions. Hermann, Paris.
- [63] Scricciolo, C. (2011). Posterior rates of convergence for Dirichlet mixtures of exponential power densities. Electronic Journal of Statistics, 5:270--308.
- [64] Scricciolo, C. (2014). Adaptive Bayesian density estimation in Lp-metrics with Pitman-Yor or normalized inverse-Gaussian process kernel mixtures. Bayesian Analysis, 9(2):475--520.
- [65] Scricciolo, C. (2018). Bayes and maximum likelihood for L1-Wasserstein deconvolution of Laplace mixtures. Statistical Methods & Applications, 27(2):333--362.
- [66] Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4:639--650.
- [67] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. The Annals of Statistics, 29(3):687--714.
- [68] Teh, Y., Jordan, M., Beal, M., and Blei, D. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566--1581.
- [69] Tokdar, S. T. (2006). Posterior consistency of Dirichlet location-scale mixture of normals in density estimation and regression. Sankhyā: The Indian Journal of Statistics, 67(4):90--110.
- [70] Tokdar, S. T. (2013). Adaptive Bayesian multivariate density estimation with Dirichlet mixtures. Biometrika, 100(3):623--640.
- [71] Villani, C. (2003). Topics in Optimal Transportation. American Mathematical Society.
- [72] Villani, C. (2008). Optimal Transport: Old and New. Springer.
- [73] Wade, S. (2023). Bayesian cluster analysis. Philosophical Transactions of the Royal Society A, 381(2247):20220149.
- [74] Walker, S. G. (2004). New approaches to Bayesian consistency. The Annals of Statistics, 32(5):2028--2043.
- [75] Walker, S. G. (2007). Sampling the Dirichlet mixture model with slices. Communications in Statistics - Simulation and Computation, 36(1):45--54.
- [76] Walker, S. G. and Hjort, N. L. (2001). On Bayesian consistency. Journal of the Royal Statistical Society, Series B, 63:811--821.
- [77] Walker, S. G., Lijoi, A., and Prünster, I. (2007). On rates of convergence for posterior distributions in infinite-dimensional models. The Annals of Statistics, 35(2):738--746.
- [78] Whitney, H. (1934). Analytic extensions of differentiable functions defined in closed sets. Transactions of the American Mathematical Society, 36:63--89.
- [79] Wu, Y. and Ghosal, S. (2010). The L1-consistency of Dirichlet mixtures in multivariate Bayesian density estimation. Journal of Multivariate Analysis, 101(10):2411--2419.
- [80] Xu, Y., Müller, P., and Telesca, D. (2016). Bayesian inference for latent biologic structure with determinantal point processes (DPP). Biometrics, 72(3):955--964.
- [81] Do, D., Nguyen, H., Nguyen, K., and Ho, N. Minimax Optimal Rate for Parameter Estimation in Multivariate Deviated Models.
discussion (0)